Nagios service latency

Andreas Ericsson ae at op5.se
Mon Nov 5 10:44:44 CET 2007


Thomas Guyot-Sionnest wrote:
> On 04/11/07 12:55 PM, Andreas Ericsson wrote:
>> That's not strictly true. Each check run by Nagios needs three fork()'s
>> and two exec()'s. For your case, you can achieve exactly the same thing,
>> but with 6 fork()'s and 4 exec()'s fewer, by using
>>
>> check_host -H $HOSTADDRESS$ -w <warn> -c <crit> -n 3
>>
>> except that check_host will be a lot faster for the cases where the host
>> has turned unreachable due to routing problems.
>>
>> OTOH, it'll be less work for you *not* to change it, so stick with it
>> so long as that works.
> 
> I agree, but with Nagios, check_latency is usually *not* caused by
> system resource usage but rather by time spent waiting for host checks.
> 
> In other words, I do *not* care about system resource utilization if
> Nagios can't use the resources up, which is currently the case. The way
> I do host checks is IMHO the fastest one from Nagios' point of view,
> despite the fact that it can take more CPU cycles to perform. My check
> sends only one ICMP ping in most cases. If it isn't answered, it will
> retry up to two more times (3 * 500 ms timeout max). That is tunable, of
> course. Also, I believe the packet interval in your test actually makes
> it take longer for the same number of unanswered packets with the same
> timeout.
> 

Set it lower then ;-)

> For Nagios 3 with host check caching your method is probably the best
> one though.
> 
> 
> BTW, this is probably tunable, but for the record:
> 
> $ time bash -c './check_host -H 1.1.1.1 -w 300.00,80% -c 500.00,100% -n 3'
> CRITICAL - 1.1.1.1: rta nan, lost 100%|rta=0.000ms;300.000;500.000;0; pl=100%;80;100;;
> 
> real    0m4.517s
> user    0m0.004s
> sys     0m0.004s
> 
> $ time bash -c 'for i in 1 2 3; do ./check_icmp -H 1.1.1.1 -w 300.00,80% -c 500.00,100% -n 1 -t 10; done'
> CRITICAL - 1.1.1.1: rta nan, lost 100%|rta=0.000ms;300.000;500.000;0; pl=100%;80;100;;
> CRITICAL - 1.1.1.1: rta nan, lost 100%|rta=0.000ms;300.000;500.000;0; pl=100%;80;100;;
> CRITICAL - 1.1.1.1: rta nan, lost 100%|rta=0.000ms;300.000;500.000;0; pl=100%;80;100;;
> 
> real    0m2.830s
> user    0m0.008s
> sys     0m0.000s
> 

Add another ~0.5 seconds on top of that for Nagios to set up macros, build
the command line, fork(), run the actual checks and reap the results.

> Overall it's still faster to run 3 of my commands than one of yours on
> an unreachable host.
> 

time /opt/plugins/check_host -H 1.1.1.1 -w 300.0,80% -c 500.0,100% -n 3 -i 50ms
CRITICAL - 1.1.1.1: rta nan, lost 100%|rta=0.000ms;300.000;500.000;0; pl=100%;80;100;; 

real    0m2.154s
user    0m0.001s
sys     0m0.000s

I win ;-)
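
For what it's worth, here is a rough sketch of the arithmetic behind that,
using only the timings measured in this thread. It assumes the ~0.5 s of
Nagios overhead is paid once for every check Nagios launches, which is a
guess rather than a measurement. The name `cost` is made up for this
example. The fork()/exec() counts use the three-and-two-per-check figure
quoted earlier.

```shell
#!/bin/sh
# Measured wall-clock times from this thread, in seconds.
three_icmp=2.830   # three sequential check_icmp runs against 1.1.1.1
one_host=2.154     # one check_host run with -n 3 -i 50ms

# ASSUMPTION: ~0.5 s of Nagios overhead per launched check (not measured).
overhead=0.5

# cost <measured seconds> <number of checks Nagios launches>
# Prints the estimated total time plus the fork()/exec() count,
# at three fork()s and two exec()s per check.
cost() {
    awk -v t="$1" -v n="$2" -v o="$overhead" \
        'BEGIN { printf "%.3f s, %d fork(), %d exec()\n", t + n * o, 3 * n, 2 * n }'
}

echo "3 x check_icmp: $(cost "$three_icmp" 3)"
echo "1 x check_host: $(cost "$one_host" 1)"
```

If the ~0.5 s is instead the total for all three checks, the gap shrinks
but the ordering stays the same.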

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users