How to reduce a very high latency number
Tedman Eng
teng at dataway.com
Thu May 18 20:25:06 CEST 2006
Try tuning the intercheck_delay_method setting. This setting determines how
checks are initially spread out in the queue when Nagios starts fresh.
Nagios tries to do a good job of this, but if some of your checks run at
vastly different intervals, they skew the "flat average" formula used to
calculate the smart setting.
Simple example:
Check_A - every 1 minute
Check_B - every 5 minutes
Total Checks: 2
Nagios would pick an intercheck delay of 1.5 minutes: it averages the check
intervals, then divides by the total number of checks.
(average check time) / (total checks)
((1+5)/2) / 2 = 1.5
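The arithmetic above, sketched in Python. This follows the averaging described in this post (average interval divided by the number of checks), not Nagios's own source code; the check names and minute intervals are the ones from the example.

```python
# Sketch of the "smart" inter-check delay as described in this post:
# (average check interval) / (total number of checks).
# Intervals are in minutes.

def smart_intercheck_delay(intervals):
    """Return the average interval divided by the number of checks."""
    if not intervals:
        raise ValueError("need at least one check")
    average = sum(intervals) / len(intervals)
    return average / len(intervals)


checks = {"Check_A": 1, "Check_B": 5}
delay = smart_intercheck_delay(list(checks.values()))
print(f"smart delay: {delay} minutes")  # smart delay: 1.5 minutes
```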
However, once every 5 minutes you actually need to run Check_A and Check_B
during the same minute, but Nagios waits 1.5 minutes between checks,
resulting in 0.5 minutes of latency for Check_A at best and 2 minutes at
worst.
To solve this, calculate the intercheck delay manually, dividing the
shortest check interval (instead of the average) by the total number of
checks.
(shortest check) / (total checks)
1 / 2 = .5
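The manual alternative, sketched the same way. The conversion to seconds is an assumption: it presumes your Nagios version expects a user-supplied inter-check delay in seconds, so check the main config file documentation for your version before pasting a number in.

```python
# Manual inter-check delay as suggested in this post:
# (shortest check interval) / (total number of checks).

def manual_intercheck_delay(intervals):
    """Return the shortest interval divided by the number of checks."""
    if not intervals:
        raise ValueError("need at least one check")
    return min(intervals) / len(intervals)


intervals = [1, 5]  # Check_A and Check_B, in minutes
delay_min = manual_intercheck_delay(intervals)
# Assumption: a user-specified delay is entered in seconds.
delay_sec = delay_min * 60
print(f"manual delay: {delay_min} minutes ({delay_sec:.0f} seconds)")
```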
Think of the intercheck delay as the "gap" that Nagios leaves between checks
as they are added to the queue. It won't schedule a check before it is due,
so Check_B will still wait 5 minutes before being put into the check queue.
The only difference is that there will be just a 0.5-minute "gap" before
Check_A executes afterwards.
NOTE: If you have some extremely short-interval checks, they can skew the
average in the other direction, so if you use this technique, be aware of
the CPU load it puts on your monitoring server.
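To get a feel for the CPU-load caveat, here is a rough sketch with a hypothetical check mix: 3000 five-minute checks (roughly the size mentioned in the quoted thread below) plus one 30-second check. It assumes consecutive initial launches are exactly one delay apart, which simplifies what the scheduler really does. Under that assumption, launching every check once takes n * delay, which equals the average interval for the smart formula but only the shortest interval for the manual one.

```python
# Rough illustration with a HYPOTHETICAL check mix. Assumes launches are
# exactly `delay` minutes apart during the initial spread (a simplification).

def smart_delay(intervals):
    """Average interval divided by number of checks."""
    n = len(intervals)
    return (sum(intervals) / n) / n


def manual_delay(intervals):
    """Shortest interval divided by number of checks."""
    return min(intervals) / len(intervals)


# Hypothetical: 3000 checks every 5 minutes plus one check every 30 seconds.
intervals = [5.0] * 3000 + [0.5]
n = len(intervals)

for name, fn in (("smart", smart_delay), ("manual", manual_delay)):
    delay = fn(intervals)   # minutes between consecutive launches
    window = n * delay      # minutes needed to launch every check once
    rate = 1 / delay        # launches per minute while spreading
    print(f"{name:6s} delay={delay * 60:.3f}s "
          f"window={window:.2f}min rate={rate:.0f}/min")
```

With the manual formula, all 3001 checks get their first run within about 30 seconds instead of about 5 minutes, roughly ten times the launch rate.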
> -----Original Message-----
> From: Trask [mailto:trasko at gmail.com]
> Sent: Wednesday, May 17, 2006 5:26 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] How to reduce a very high latency number
>
>
> > I've noticed we get this problem when there are more than one or two hosts
> > down. Because Nagios (we use 1.2) does host checks first, and sequentially,
> > a host check timing out can hold up everything else (we have >3000 checks to
> > run every 5 minutes).
> >
>
> I have no hosts down 95% of the time, including now. I could see how
> that would be an issue, though.
>
> I have turned off all logging, state retention, performance data
> handling and backed off all timing parameters to their defaults (or
> even less aggressive timings). In a separate test, I changed only the
> command_check_interval from -1 (check as often as possible) to 10
> seconds. Both have had seemingly no effect. At this point, the 2
> main servers I am looking at have been running for 30 minutes and
> latencies are up to 540 seconds for the "bad" one and 48 sec for the
> other one.
>
>
> My next step will be to recompile with the latest nagios and try that.
> If that doesn't show an improvement, I'll try w/o perlcache. Lastly,
> I'll try without the embedded perl interpreter at all.
>
>
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null