How to reduce a very high latency number
Miroslaw Horbal
miroslaw at gmail.com
Wed Dec 10 18:10:09 CET 2008
Have you tried modifying the OCSP command? More specifically, have you
optimized the command to have the lowest possible runtime? I noticed
that only one instance of the OCSP command is executed at a time, and
this can lead to very high latencies when a large number of check
results are queued for submission.
In our environment the OCSP command took approximately 0.7 seconds to
run, so with 100 checks there would be approximately 70 seconds during
which Nagios is submitting results. It's pretty easy to see how this
can lead to performance issues as the number of checks increases.
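The arithmetic above can be sketched as a few lines of shell. This is only an illustration using the figures quoted in this message; the variable names are made up for the example.

```shell
#!/bin/sh
# Estimate how long a serialized OCSP queue takes to drain.
# Both values are the figures quoted above, not measurements of your site.
checks=100          # number of queued check results
runtime_ms=700      # runtime of one OCSP command, in milliseconds

# Only one OCSP command runs at a time, so the total is a simple product.
total_ms=$((checks * runtime_ms))
total_s=$((total_ms / 1000))
echo "draining $checks results takes about $total_s seconds"
```

Plugging in your own check count and measured OCSP runtime gives a rough lower bound on how long Nagios spends submitting results.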
Our OCSP command, submit_check_result, is very similar to the one
shown in the Nagios docs
(http://nagios.sourceforge.net/docs/3_0/distributed.html).
Notice the last line:
'''
/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" |
    /usr/local/nagios/bin/send_nsca -H central_server \
    -c /usr/local/nagios/etc/send_nsca.cfg
'''
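The line above relies on $return_code, which the script derives from the service state string passed in $3. A minimal sketch of that mapping, loosely modeled on the script in the Nagios docs, might look like the following. The function name state_to_code is hypothetical, and the -1 default for unrecognized states is an assumption for this example; check it against your own script.

```shell
#!/bin/sh
# state_to_code is a hypothetical helper for illustration only.
# It converts a Nagios state string into a numeric return code.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo -1 ;;   # assumption: anything else is treated as unknown
    esac
}

return_code=$(state_to_code "CRITICAL")
echo "return_code=$return_code"
```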
In this version, the submit_check_result script has to wait for the
send_nsca command to finish before the script can exit. If send_nsca
takes 0.6 seconds to send the check results and receive confirmation
that the packet has been sent, then the submit_check_result script has
to wait at least 0.6 seconds before exiting.
Now consider this modified line:
'''
/bin/printf "%s\t%s\t%s\t%s\n" "$1" "$2" "$return_code" "$4" |
    /usr/local/nagios/bin/send_nsca -H central_server \
    -c /usr/local/nagios/etc/send_nsca.cfg > /dev/null &
'''
In this version, submit_check_result runs send_nsca in the
background (similar to forking the process). The script starts
send_nsca but does not wait for it to finish running. This effectively
reduces the runtime of submit_check_result to about 0.02 seconds, which
is a huge improvement compared to 0.7 seconds.
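The difference between the two versions can be seen in a small timing sketch. Here slow_send is a hypothetical stand-in for send_nsca that just discards its input and sleeps for 2 seconds, so the numbers are for illustration only and say nothing about real NSCA timings.

```shell
#!/bin/sh
# slow_send is a hypothetical stand-in for send_nsca: it reads stdin,
# throws it away, and takes about 2 seconds to "send" it.
slow_send() {
    cat > /dev/null
    sleep 2
}

# Foreground: the caller blocks until slow_send returns.
start=$(date +%s)
printf 'host\tservice\t0\tOK\n' | slow_send
fg=$(( $(date +%s) - start ))

# Background: the pipeline is started with & and the caller moves on.
start=$(date +%s)
printf 'host\tservice\t0\tOK\n' | slow_send > /dev/null 2>&1 &
bg=$(( $(date +%s) - start ))

echo "foreground: ${fg}s, background: ${bg}s"
wait   # only so this demo exits cleanly; the OCSP script would not wait
```

The foreground case costs the caller the full send time, while the background case returns almost immediately. The send itself still takes just as long; it simply no longer holds up the caller.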
Now let's look at the numbers:
Before I modified submit_check_result, our average service check
latency was 160 seconds. After modifying the script, latency dropped
to 1 second.
Hope this helps,
beegie_b