passive check expire race condition
Michelle Craft
craft at cs.wisc.edu
Tue Jul 31 17:53:34 CEST 2007
[1185891648] SERVICE ALERT: emperor20.cs.wisc.edu;what;OK;HARD;1;OK: Script ran.
[1185895333] Warning: The results of service 'what' on host 'emperor20.cs.wisc.edu' are stale by 10 seconds (threshold=3700 seconds). I'm forcing an immediate check of the service.
[1185895335] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;emperor20.cs.wisc.edu;what;0;OK: Script ran.
[1185895343] SERVICE ALERT: emperor20.cs.wisc.edu;what;CRITICAL;HARD;1;CRITICAL: Test failed. Passive check didn't send info.
It looks like, once the stale condition is noticed, it takes about 10
seconds to run the alternate active/fail check. If a passive check comes
through in that time and sets the state to OK, the fail check overrides it.
Is there a way to make the forced check verify that a check hasn't come
through in the meantime? Or to put a semaphore on the check so that the
new passive check isn't processed until the forced check completes?
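
To make the first idea concrete, here is a rough sketch of the kind of
guard I have in mind. It is not Nagios code: Service, submit_passive,
start_forced_check and finish_forced_check are hypothetical names made up
for illustration, and the timestamps are the ones from the log above. The
forced check remembers when it was launched, and its result is dropped if
any other result was accepted after that point.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a service record (not a Nagios structure)."""
    state: str = "OK"
    last_result_time: int = 0  # epoch seconds of the last accepted result


def submit_passive(svc: Service, state: str, now: int) -> None:
    """Accept a passive check result and record when it arrived."""
    svc.state = state
    svc.last_result_time = now


def start_forced_check(svc: Service, now: int) -> int:
    """Launch the freshness (forced) check; return its launch time."""
    return now


def finish_forced_check(svc: Service, started_at: int, state: str, now: int) -> bool:
    """Apply the forced check result only if nothing newer has arrived.

    Returns True if the result was applied, False if it was discarded.
    """
    if svc.last_result_time >= started_at:
        # A result came in while the forced check was running; keep it.
        return False
    svc.state = state
    svc.last_result_time = now
    return True


# Replay of the sequence in the log excerpt.
svc = Service(state="OK", last_result_time=1185891648)
started = start_forced_check(svc, 1185895333)      # stale condition noticed
submit_passive(svc, "OK", 1185895335)              # passive result arrives
applied = finish_forced_check(svc, started, "CRITICAL", 1185895343)
print(applied, svc.state)
```

With this guard the late CRITICAL result is discarded and the service
stays OK. A lock around the forced check would also close the window, but
it would delay the passive result for the full 10 seconds.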
--
Michelle