Passive services going stale on a Nagios restart
Ton Voon
ton.voon at altinity.com
Mon Feb 13 16:05:36 CET 2006
Hi Ethan,
I'm running a distributed monitoring setup with freshness checking on
the master server for passive checks. If the master is stopped for a
long time and then restarted, the passive checks go stale at the next
freshness cycle because there is not enough time for the slaves to
send results back.
In base/checks.c, there is some code that takes program_start into
account, but it only applies to services with active checks enabled.
I've removed the active check (checks_enabled) condition and this
works for me now.
This is the patch:
--- checks.c.2.0 2006-02-13 11:57:09.181245510 +0000
+++ checks.c 2006-02-13 12:00:02.726750637 +0000
@@ -1758,7 +1758,9 @@
 		/* calculate expiration time */
 		/* CHANGED 11/10/05 EG - program start is only used in expiration time calculation if > last check AND active checks are enabled, so active checks can become stale immediately upon program startup */
-		if(temp_service->has_been_checked==FALSE || (temp_service->checks_enabled==TRUE && program_start>temp_service->last_check))
+		/* if(temp_service->has_been_checked==FALSE || (temp_service->checks_enabled==TRUE && program_start>temp_service->last_check)) */
+		/* Passive checks immediately go stale, so ignore the checks_enabled setting */
+		if(temp_service->has_been_checked==FALSE || program_start>temp_service->last_check)
 			expiration_time=(time_t)(program_start+freshness_threshold);
 		else
 			expiration_time=(time_t)(temp_service->last_check+freshness_threshold);
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon