nagios latency
Marcello Russo
markel at tin.it
Thu Jul 28 09:49:51 CEST 2005
Hi,
we work for a service provider in Italy, and we use Nagios to
monitor many platforms in our data centers (CEDs).
Last week we had a problem: an unexpected blackout during which (!)
the emergency generator failed to start, leaving 600 servers
down!
Nagios performance degraded drastically during the blackout:
by the end of the day the check latency had reached 4 hours!
This week we looked at the code and saw that service checks and
host checks, even though they sit in the same lists (low and high
priority), are executed by separate methods:
services can run in parallel, but hosts cannot.
In the file checks.c, the function run_service_check permits
multiple scheduled service checks to execute at once, which is not
implemented in run_host_check.
Why do you use a serial method for host checks?
If it is possible and you are interested, we can work together on a
solution.
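
To make the serial versus parallel difference concrete, here is a
minimal sketch. It is not Nagios code: Nagios is written in C, and
this uses Python threads only to show the effect. The names
fake_host_check, run_serial and run_parallel are hypothetical, and
the timeout is scaled down so the demo finishes quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-check timeout in seconds, scaled down for the demo.
CHECK_TIMEOUT = 0.05


def fake_host_check(host):
    """Stand-in for a check against a dead host: blocks until the timeout."""
    time.sleep(CHECK_TIMEOUT)
    return (host, "DOWN")


def run_serial(hosts):
    """One check at a time; total time grows as len(hosts) * timeout."""
    return [fake_host_check(h) for h in hosts]


def run_parallel(hosts, workers=20):
    """Up to `workers` checks in flight at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_host_check, hosts))


def timed(fn, hosts):
    """Run fn(hosts) and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = fn(hosts)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    hosts = [f"host{i:03d}" for i in range(40)]
    _, serial_s = timed(run_serial, hosts)
    _, parallel_s = timed(run_parallel, hosts)
    print(f"serial:   {serial_s:.2f} s")
    print(f"parallel: {parallel_s:.2f} s")
```

When every host is down, each serial check waits for its full
timeout before the next one starts, so the serial time should come
out far larger than the parallel one.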
p.s.
We now use check_fping instead of check_ping (we even tried these
settings: warning 0.400 sec, critical 0.800 sec, timeout 1, 2 packets,
with poor performance) to maximize performance (in case of
problems...), but this is not a definitive solution, because when all
the servers are down, the latency quickly reaches 40 minutes!
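
A rough back-of-the-envelope sketch of why serial checks pile up,
using the 600 servers and the timeout of 1 from above. It assumes
the timeout is in seconds and that every check waits for its full
timeout. The concurrency of 50 is a hypothetical figure, and the
function names are ours. Retries, service checks and scheduler
overhead are ignored, so this does not reproduce the latencies we
measured; it only shows how the cost scales.

```python
def serial_pass_seconds(down_hosts, timeout_s):
    """Worst-case time for one serial pass when every check times out."""
    return down_hosts * timeout_s


def parallel_pass_seconds(down_hosts, timeout_s, concurrency):
    """Same pass with `concurrency` checks in flight at once."""
    batches = -(-down_hosts // concurrency)  # ceiling division
    return batches * timeout_s


hosts = 600   # servers down during the blackout
timeout = 1   # assumed to be seconds

# Minutes for one serial pass over all hosts.
print(serial_pass_seconds(hosts, timeout) / 60)        # 10.0
# Minutes for the same pass with a hypothetical 50 checks at once.
print(parallel_pass_seconds(hosts, timeout, 50) / 60)  # 0.2
```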
Thanks
Marcello, Andrea, Lorenzo.