Race condition in freshness checking
Ton Voon
ton.voon at altinity.com
Mon Sep 24 19:56:36 CEST 2007
Hi!
We found a bug in the calculation of the latency for a passive check.
This has highlighted a possible race condition re: freshness
checking. We wanted to get some ideas on what is the best approach to
fix this.
Background:
We have a master/slave arrangement, with freshness checking
(freshness_threshold=0) of slave services on the master.
Looking in the NDO db, we realised that the latency values for
passive results were incorrectly calculated - sometimes the latency
values could be out by a factor of 1000. The patch is attached.
However, since using this patch, we've seen occasional race conditions.
Problem:
Within checks.c:check_service_result_freshness, if a service has passed
its expiration_time, it is marked as is_being_freshened and a forced
service check is scheduled. However, if a passive result for this
service is processed before this forced check is run, the forced
check still runs afterwards, so the service is marked as stale and
the state is inconsistent between master and slave.
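To make the ordering concrete, here is a minimal sketch of the sequence
described above. It is a simplified model, not the Nagios code: the
demo_service struct, the forced_check_pending flag and all demo_*
functions are hypothetical names, and an explicit freshness threshold
is assumed rather than freshness_threshold=0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for the Nagios service struct. */
typedef struct {
    time_t last_check;          /* time of the most recent result */
    int freshness_threshold;    /* seconds a result stays fresh (assumed explicit) */
    bool is_being_freshened;    /* set once a forced check has been queued */
    bool forced_check_pending;  /* models the forced check sitting in the schedule */
} demo_service;

/* Freshness pass: queue a forced check once the last result has expired. */
void demo_check_freshness(demo_service *svc, time_t now) {
    assert(svc != NULL);
    time_t expiration_time = svc->last_check + svc->freshness_threshold;
    if (!svc->is_being_freshened && expiration_time < now) {
        svc->is_being_freshened = true;
        svc->forced_check_pending = true;
    }
}

/* A passive result arrives. It records a newer check time, but nothing
 * in this model removes the forced check that is already queued. */
void demo_process_passive_result(demo_service *svc, time_t now) {
    assert(svc != NULL);
    svc->last_check = now;
}

/* True if the queued forced check would still run, i.e. the service
 * would be reported stale despite the newer passive result. */
bool demo_forced_check_would_fire(const demo_service *svc) {
    assert(svc != NULL);
    return svc->forced_check_pending;
}
```

In this model, calling demo_check_freshness() on an expired service and
then demo_process_passive_result() leaves forced_check_pending set,
which is the window in which the master and slave can disagree.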
Possible solutions:
- If a check result is processed with is_being_freshened set for
the service, then remove the forced check from the schedule if it
exists.
- Change is_being_freshened to stale_time (0 if not stale). On
running the forced check, if stale_time is less than last_check_time
(+ latency?), break out of running the forced check.
Neither of these sounds particularly appealing to us. Are there other
possible solutions? Any opinions?
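For illustration only, the two options could be sketched roughly as
follows. This is a toy model under stated assumptions, not a patch
against checks.c: sketch_service, forced_check_pending and every
sketch_* function are hypothetical names, and the "(+ latency?)"
question is left out of the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified service record. */
typedef struct {
    time_t last_check;          /* time of the most recent result */
    time_t stale_time;          /* when the service was found stale, 0 if not stale */
    bool forced_check_pending;  /* models the forced check in the schedule */
} sketch_service;

/* Freshness pass: record when the service went stale and queue the check. */
void sketch_mark_stale(sketch_service *svc, time_t now) {
    assert(svc != NULL);
    svc->stale_time = now;
    svc->forced_check_pending = true;
}

/* Option 1: processing a result removes a pending forced check. */
void sketch_process_result_cancelling(sketch_service *svc, time_t now) {
    assert(svc != NULL);
    svc->last_check = now;
    if (svc->stale_time != 0) {
        svc->forced_check_pending = false;
        svc->stale_time = 0;
    }
}

/* Option 2: processing a result only records the new check time. */
void sketch_process_result(sketch_service *svc, time_t now) {
    assert(svc != NULL);
    svc->last_check = now;
}

/* Option 2: the forced check decides at run time whether it is still
 * needed. Returns true if the check should go ahead, false if a newer
 * result has superseded it (latency is ignored in this sketch). */
bool sketch_run_forced_check(sketch_service *svc) {
    assert(svc != NULL);
    bool superseded = svc->stale_time < svc->last_check;
    svc->forced_check_pending = false;
    svc->stale_time = 0;
    return !superseded;
}
```

The first option touches the schedule from the result-processing path;
the second leaves the schedule alone and makes the forced check a no-op
when a newer result exists, at the cost of a timestamp comparison whose
tolerance (the latency question above) would still need deciding.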
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios_corrected_latency_for_passive_results.patch
Type: application/octet-stream
Size: 838 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20070924/1d1579c6/attachment.obj>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel