Parent/child behaviour, WAS:Re: check_ping vs. check_icmp?
Andreas Ericsson
ae at op5.se
Mon Oct 17 09:47:15 CEST 2005
Greg Vickers wrote:
> Andreas,
>
> Andreas Ericsson wrote:
>
>> Andrew Laden wrote:
>>
>>> How does using check_icmp compare to using check_fping?
>>>
>>> It seems that check_fping will return a down answer much faster.
>>> Since host checks are most often run when the host is down, that
>>> seems to be the performance that we are concerned with.
>>
>>
>> This might seem to be the case, but it actually isn't. A hostcheck is
>> run each time a service changes from whatever to any non-OK state. In
>> a (somewhat) healthy network, hostchecks are run against hosts that
>> are up more often than against hosts that are down. The opposite is of
>> course true if hosts are down for a long time or if a whole segment of
>> the network goes to lunch,
>
>
> I thought that if parents were set up correctly that Nagios would not
> run any service or host checks on hosts that are children of the
> blocking outage? So there would be a delay while Nagios figures out
> which is the parent host that is down (i.e. the service checks failing
> 'up' the parent dependencies and the subsequent delays on the host
> checks until the 'top' parent host is checked) but once the top-most
> parent is host checked, no host or service checks will be run on the
> children until that parent becomes good. Subsequently you would only see
> a delay in check scheduling/processing when the host check is run on
> that 'top' parent host.
>
> Is this the expected and correct behavior or is it too early on Monday
> morning for me?
>
> <snipity-snip-snip>
>
> Ah-ha - RTFM prior to inserting foot in mouth. The networkoutages.html
> states:
>
> "If all of the immediate child hosts of one of these flagged hosts is
> DOWN or UNREACHABLE and has no immediate parent host that is up, the
> flagged host is the cause of a network outage. If even one of the
> immediate children of a flagged host does not pass this test, then the
> flagged host is not the cause of a network outage."
>
> So from this statement, I understand that all children will be host
> checked to determine fully which host is the cause of a network outage,
> and that could cause a large delay if there are a lot of hosts to check.
> However I don't understand the statement "... has no immediate parent
> host that is up..." Shouldn't that read "... has a parent host up..."
> otherwise how would Nagios reach that blocking host to test it???
>
It probably should read "has a parent host up". Whichever way you look
at it, lots of hostchecks are going to be run when a large number of
hosts are in any state other than OK, but most of the time hostchecks
are run against hosts that are up.
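
For what it's worth, the test quoted from networkoutages.html can be
sketched in a few lines of Python. This is only an illustration of one
possible reading of the quoted text, not how Nagios implements it: it
assumes the "has no immediate parent host that is up" clause applies to
each child host rather than to the flagged host. The Host class and the
helpers link(), child_passes() and is_outage_cause() are names made up
for this example.

```python
from dataclasses import dataclass, field

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


@dataclass(eq=False)
class Host:
    """Hypothetical host record; not a Nagios data structure."""
    name: str
    state: str
    parents: list = field(default_factory=list)
    children: list = field(default_factory=list)


def link(parent, child):
    """Record an immediate parent/child relationship in both directions."""
    parent.children.append(child)
    child.parents.append(parent)


def child_passes(child):
    """A child passes if it is DOWN or UNREACHABLE and none of its
    immediate parents is UP (assumed reading of the quoted text)."""
    if child.state not in (DOWN, UNREACHABLE):
        return False
    return not any(p.state == UP for p in child.parents)


def is_outage_cause(flagged):
    """The flagged host is the cause only if every immediate child passes.
    Note: in this sketch a host with no children passes trivially."""
    return all(child_passes(c) for c in flagged.children)
```

Under that reading, a down router whose children are all unreachable
counts as the cause of the outage, but stops counting as soon as one of
those children also has a second parent that is up.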
> It really could be too early...
>
It always is. :)
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users