Fix for host dependency checks
Holger Weiss
holger at CIS.FU-Berlin.DE
Wed Mar 22 03:30:35 CET 2006
* Ethan Galstad <nagios at nagios.org> [2006-03-21 19:17]:
> On 22 Mar 2006 at 1:48, Holger Weiss wrote:
> > * Ethan Galstad <nagios at nagios.org> [2006-03-21 12:50]:
> > > I'll keep this on the TODO list for Nagios 3.x, but I think it might
> > > require some more thought. The last hard state of the host should
> > > only be used in the dependency logic if a state change occurred
> > > relatively recently. If, for example, the last hard state change
> > > occurred two days ago, you don't want that value used in the logic.
> >
> > Okay, but the current Nagios code uses _only_ the last hard state (no
> > matter how "old" it is), which is the reason why I've encountered the
> > problem in the first place. I thought about checking the freshness of
> > the last hard state myself (the information is available in the host
> > struct already, so this would be easy), but then I left it out, since
> > letting the dependency fail if either the current or the last hard
> > state matches the criteria seemed sufficiently safe to me. This way,
> > "false alarms" for the (dependent) host B should reliably be
> > prevented, while the risk of suppressing legitimate notifications for
> > B because the dependency fails due to an outdated last hard state of A
> > is the same as with the current Nagios code. I believe that in
> > practice this risk is very low: I expect that in almost all cases,
> > the configured dependency criteria will be a down and/or unreachable
> > state. So the risk would be that an outdated down or unreachable
> > state lets the dependency fail, but down and unreachable states should
> > normally be more or less up-to-date.
>
> Aha - I think we're using different terms. :-) The nagios 2.x code
> uses host->current_state in the dependency logic, but that's not
> necessarily "current" in terms of time.
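The freshness idea raised above is to trust a state only if it was determined recently, using timestamps the host struct already keeps. It can be sketched as a simple age test. This is a minimal Python sketch, not Nagios code. The Host class is a hypothetical stand-in that mirrors a few host struct fields, and the max_age threshold and the helper names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List

# Mirrors the Nagios host state values HOST_UP, HOST_DOWN, HOST_UNREACHABLE.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


@dataclass
class Host:
    """Hypothetical model of a few fields of the Nagios host struct."""
    current_state: int
    last_check: int              # time of the most recent check (epoch seconds)
    last_hard_state: int
    last_hard_state_change: int  # time of the most recent hard state change


def is_fresh(timestamp: int, now: int, max_age: int) -> bool:
    """True if `timestamp` is at most `max_age` seconds before `now`."""
    return now - timestamp <= max_age


def usable_states(host: Host, now: int, max_age: int) -> List[int]:
    """Return only those states of `host` recent enough for the dependency logic."""
    states = []
    if is_fresh(host.last_check, now, max_age):
        states.append(host.current_state)
    if is_fresh(host.last_hard_state_change, now, max_age):
        states.append(host.last_hard_state)
    return states
```

Consider a host whose last hard state change happened two days ago and whose last check ran a minute ago. With max_age=300, only its current_state contributes to the dependency decision. If no check has run within max_age either, nothing is left to evaluate, and the caller has to decide what that means (for example, forcing a new check).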
Yes, that's what I meant. The 2.x code simply uses host->current_state.
My patch forces a new check of host A during the dependency check for B.
Once this new host check has been performed, the host->current_state value
used by the 2.x code is available as host->last_hard_state. My patch then
checks this host->last_hard_state value (the value the 2.x code looks at)
and additionally checks the now-updated host->current_state.
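The difference between the 2.x logic and the patch described above can be sketched as follows. This is a simplified Python model, not the actual patch. Host, probe, force_check and the two dependency_fails_* functions are hypothetical names. force_check simply moves the old state into last_hard_state, as the message describes, and ignores soft/hard state handling.

```python
from dataclasses import dataclass
from typing import Callable, Set

# Mirrors the Nagios host state values HOST_UP, HOST_DOWN, HOST_UNREACHABLE.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


@dataclass
class Host:
    """Hypothetical model of the host fields the dependency logic reads."""
    name: str
    current_state: int
    last_hard_state: int
    probe: Callable[[], int]  # hypothetical stand-in for running the check command

    def force_check(self) -> None:
        # Simplification of the behaviour described in the message: the state
        # known before the check ends up in last_hard_state, and the fresh
        # result becomes current_state.
        self.last_hard_state = self.current_state
        self.current_state = self.probe()


def dependency_fails_2x(master: Host, failure_states: Set[int]) -> bool:
    """2.x behaviour as described: test only the (possibly stale) current_state."""
    return master.current_state in failure_states


def dependency_fails_patched(master: Host, failure_states: Set[int]) -> bool:
    """Patched behaviour: force a check of the master, then fail the dependency
    if either the previous state or the fresh state matches the criteria."""
    master.force_check()
    return (master.last_hard_state in failure_states
            or master.current_state in failure_states)
```

A master host that was UP at its last check but is DOWN now fails the dependency only under the patched logic. The trade-off discussed earlier in the thread still applies: a stale DOWN that has since recovered also fails the dependency, because the old state is tested as well.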
> I made some major overhauls to the host check logic in the Nagios 3.x
> CVS code.
Ah, sorry, I must admit that I haven't found the time to look at the new
code yet; I'll do that really soon now[tm]! :-) Okay, forget about my
patch (except perhaps as a bugfix for the 2.x branch) ;-)
> Those changes include parallel host checks and "predictive dependency
> checks". The predictive checks idea came from your earlier suggestion
> that all hosts that are depended upon for notification be checked
> before the notification gets sent out.
>
> Here's how the Nagios 3.x code does this... On the second-to-last
> host check attempt, Nagios will execute a parallel check of all
> hosts that are being depended upon. In Nagios 3.x, host checks are
> no longer performed back to back, but at a retry_interval, just as
> services are re-checked. That means that
> theoretically all hosts that are being depended upon will have been
> checked before the dependency logic is tested and a decision to
> notify is made.
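The 3.x sequence described above can be modelled roughly as follows. This is a minimal Python sketch, not the Nagios 3.x implementation. Several details are assumptions for illustration: the function names, the use of a thread pool for the parallel checks, and measuring retry_interval in seconds.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

# Mirrors the Nagios host state values HOST_UP, HOST_DOWN, HOST_UNREACHABLE.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2


def check_masters_in_parallel(masters: Dict[str, Callable[[], int]]) -> Dict[str, int]:
    """Run the (hypothetical) check callables of all depended-upon hosts concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(masters))) as pool:
        futures = {name: pool.submit(check) for name, check in masters.items()}
        return {name: fut.result() for name, fut in futures.items()}


def simulate_failing_host(max_attempts: int, retry_interval: int,
                          masters: Dict[str, Callable[[], int]],
                          start: int = 0) -> Tuple[List[int], Dict[str, int]]:
    """Walk the retries of a host whose own check keeps failing.

    Attempts are spaced retry_interval seconds apart rather than run back to
    back. On the second-to-last attempt the depended-upon hosts are checked
    in parallel, so their results exist before the final attempt decides
    whether to notify.
    """
    attempt_times: List[int] = []
    master_states: Dict[str, int] = {}
    for attempt in range(1, max_attempts + 1):
        attempt_times.append(start + (attempt - 1) * retry_interval)
        if attempt == max_attempts - 1:
            master_states = check_masters_in_parallel(masters)
    return attempt_times, master_states


def should_notify(master_states: Dict[str, int], failure_states: Set[int]) -> bool:
    """Notify about the dependent host only if no master is in a failure state."""
    return not any(state in failure_states for state in master_states.values())
```

With max_attempts=3 and retry_interval=60, the masters are checked at the t=60 attempt, so the final attempt at t=120 can use their results. A DOWN router then suppresses the notification for the dependent host.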
Having a retry_interval and parallel host checks sounds very, very nice!
I'm looking forward to testing the new code.
Thanks a lot, Holger
--
PGP fingerprint: F1F0 9071 8084 A426 DD59 9839 59D3 F3A1 B8B5 D3DE
More information about the Developers
mailing list