Problem with high latencies after going distributed
Frost, Mark {PBG}
mark.frost1 at pepsi.com
Wed Jan 23 03:13:43 CET 2008
>-----Original Message-----
>From: Steve Shipway [mailto:s.shipway at auckland.ac.nz]
>Sent: Tuesday, January 22, 2008 8:45 PM
>To: Frost, Mark {PBG}; Nagios Users
>Subject: RE: [Nagios-users] Problem with high latencies after going distributed
>
>> As I'd mentioned in a previous message, I'm in the process of
>> converting from a centralized Nagios 2.10 setup all running on a
>> single host to a distributed setup running on at least 3 hosts
>> (3 to start anyway). The centralized setup has 572 hosts and 2900
>> services, 99.9% of which are active checks.
>...
>> Active Service Latency: 0.000 / 7267.198 / 4241.019 sec
>
>This isn't much help, but...
>
>We've just done exactly the same (Nagios 2.9), and we have a comparable
>size of system (actually a bit larger - 713 hosts, 5834 services).
>After going distributed, we too have this insanely high latency on the
>satellites.
>
>The only possible cause is the OCSP command slowing things down
>somehow. This is using the supplied send_nsca call to send the status
>off to the central server...
>
>define command {
> command_name relay
> command_line $USER1$/submit_check_result "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATEID$" "$SERVICEOUTPUT$"
>}
>
>So it should work. I guess things would be better if it packaged the
>updates up into batches, although it can't do that normally.
>
>I think it might be better to make the OCSP command just dump the
>status to a file, and then have a cronjob every 60 seconds that reads
>the file and sends the statuses off as a batch. I will try this here,
>when I get the chance.
>
>Steve
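
For reference, the spool-and-batch idea described above could be sketched
roughly as follows. This is only an illustration under stated assumptions,
not something either of us has tested against a live Nagios: the spool
path, the function names and the send_cmd argument are all hypothetical.
The one thing taken from send_nsca itself is its input format, which is
one tab-delimited line per result (host, service description, return
code, plugin output) on standard input.

```python
import os
import subprocess

# Hypothetical spool location; any path writable by the nagios user would do.
SPOOL = "/var/spool/nagios/ocsp.spool"


def queue_result(spool, host, service, state_id, output):
    """OCSP side: append one result as a tab-delimited line and return at once."""
    # Newlines in plugin output would break the one-line-per-result format.
    line = "\t".join([host, service, str(state_id), output.replace("\n", " ")])
    with open(spool, "a") as f:
        f.write(line + "\n")


def flush_batch(spool, send_cmd):
    """Cron side: claim the spool file, then pipe all of it to send_cmd once.

    send_cmd is the argument list of whatever forwards results (assumed
    here to be a send_nsca invocation). Returns the number of lines sent.
    """
    batch = spool + ".sending"
    # A leftover batch from a failed run is retried before claiming new data.
    if not os.path.exists(batch):
        try:
            os.rename(spool, batch)  # atomic on the same filesystem
        except FileNotFoundError:
            return 0  # nothing queued since the last run
    with open(batch, "rb") as f:
        data = f.read()
    # check=True raises on a non-zero exit, leaving the batch file for a retry.
    subprocess.run(send_cmd, input=data, check=True)
    os.remove(batch)
    return data.count(b"\n")
```

The OCSP command would then be a tiny wrapper that calls queue_result,
and a cron job would call flush_batch(SPOOL, [...]) with the site's own
send_nsca command line once a minute. A result written between the rename
and the read can still be lost in this sketch, so it trades a small race
for one send_nsca connection per minute instead of one per check.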
But if submit_check_result is running slowly, that would only affect
the service execution time, wouldn't it? My understanding of check
latency is that it's the difference between the time Nagios schedules a
check to run and the time the check actually starts to execute.

But maybe I'm misunderstanding something here. When it comes to working
with Nagios, I tend to learn the most when I have the biggest problems :-).
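
To make the question concrete, here is a toy model of how per-result
work could show up as latency rather than execution time. It assumes
(and this is an assumption about Nagios internals, not something
established in this thread) that the OCSP command runs serially in the
same loop that starts checks, so the loop cannot start the next check
while it is busy. The function and parameter names are made up for the
illustration.

```python
def simulate(n_checks, interval, blocking_per_result):
    """Return the latency of each check in a strictly serial scheduling loop.

    Check i is due at i * interval seconds. After starting a check, the
    loop is assumed to be busy for blocking_per_result seconds (for
    example, running an OCSP command) before it can start another one.
    Latency is the actual start time minus the scheduled time.
    """
    clock = 0.0  # time at which the loop is next free
    latencies = []
    for i in range(n_checks):
        scheduled = i * interval
        start = max(clock, scheduled)  # cannot start before the loop is free
        latencies.append(start - scheduled)
        clock = start + blocking_per_result
    return latencies


# Blocking work shorter than the interval: the loop always keeps up.
print(simulate(4, 1.0, 0.5))  # [0.0, 0.0, 0.0, 0.0]

# Blocking work longer than the interval: latency grows with every check.
print(simulate(4, 1.0, 3.0))  # [0.0, 2.0, 4.0, 6.0]
```

If that assumption holds, a slow submit_check_result would not change
any single check's execution time much, but the backlog it creates would
push every later check's start further past its scheduled time.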
Do you do the same thing I mentioned, where you define all the checks
on both distributed nodes but disable complementary halves of those
checks on each node?
Thanks
Mark