Parent/child behaviour, WAS:Re: check_ping vs. check_icmp?

Andreas Ericsson ae at op5.se
Mon Oct 17 09:47:15 CEST 2005


Greg Vickers wrote:
> Andreas,
> 
> Andreas Ericsson wrote:
> 
>> Andrew Laden wrote:
>>
>>> How does using check_icmp compare to using check_fping?
>>>
>>> It seems that check_fping will return a down answer much faster. 
>>> Since host checks are most often run when the host is down, that 
>>> seems to be the performance that we are concerned with.
>>
>>
>> This might seem to be the case, but it actually isn't. A hostcheck is 
>> run each time a service changes from any state to a non-OK state. In 
>> a (somewhat) healthy network, hostchecks are run against hosts that 
>> are up more often than against hosts that are down. The opposite is 
>> of course true if hosts are down for a long time or if a whole 
>> segment of the network goes to lunch.
> 
> 
> I thought that if parents were set up correctly that Nagios would not 
> run any service or host checks on hosts that are children of the 
> blocking outage? So there would be a delay while Nagios figures out 
> which is the parent host that is down (i.e. the service checks failing 
> 'up' the parent dependencies and the subsequent delays on the host 
> checks until the 'top' parent host is checked) but once the top-most 
> parent is host checked, no host or service checks will be run on the 
> children until that parent becomes good. Subsequently you would only see 
> a delay in check scheduling/processing when the host check is run on 
> that 'top' parent host.
> 
> Is this the expected and correct behavior or is it too early on Monday 
> morning for me?
> 
> <snipity-snip-snip>
> 
> Ah-ha - RTFM prior to inserting foot in mouth. The networkoutages.html 
> states:
> 
> "If all of the immediate child hosts of one of these flagged hosts is 
> DOWN or UNREACHABLE and has no immediate parent host that is up, the 
> flagged host is the cause of a network outage. If even one of the 
> immediate children of a flagged host does not pass this test, then the 
> flagged host is not the cause of a network outage."
> 
> So from this statement, I understand that all children will be host 
> checked to determine fully which host is the cause of a network outage, 
> and that could cause a large delay if there are a lot of hosts to check.
> However I don't understand the statement "... has no immediate parent 
> host that is up..." Shouldn't that read "... has a parent host up..." 
> otherwise how would Nagios reach that blocking host to test it???
> 


It probably should read "has a parent host up". Whichever way you look 
at it, lots of hostchecks are going to be run when a large number of 
hosts are in any state other than UP, but most of the time hostchecks 
are run against hosts that are up.
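
For what it's worth, here is a minimal sketch of the quoted rule, read 
literally as the documentation words it. This is not how Nagios 
implements it; the function names, the dict layout and the example 
hosts are all hypothetical. It also assumes that a "flagged" host is 
one that is itself DOWN or UNREACHABLE and has at least one child, 
which the quoted passage does not spell out.

```python
# Hypothetical sketch of the rule quoted from networkoutages.html.
# Names and data layout are assumptions, not Nagios internals.
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def children_of(parents):
    """Invert a child -> [parents] mapping into parent -> [children]."""
    children = {}
    for child, plist in parents.items():
        for parent in plist:
            children.setdefault(parent, []).append(child)
    return children


def is_outage_cause(host, state, parents):
    """Apply the quoted rule, read literally, to one candidate host."""
    if state[host] == UP:
        return False  # assumption: only DOWN/UNREACHABLE hosts are flagged
    kids = children_of(parents).get(host, [])
    if not kids:
        return False  # assumption: a host without children blocks nobody
    for child in kids:
        if state[child] == UP:
            return False  # this child fails the test
        if any(state[p] == UP for p in parents.get(child, [])):
            return False  # child has an immediate parent that is up
    return True


# Example 1: switch is down and is the only parent of both web hosts.
parents = {"switch": ["router"], "web1": ["switch"], "web2": ["switch"]}
state = {"router": UP, "switch": DOWN,
         "web1": UNREACHABLE, "web2": UNREACHABLE}
blocking = is_outage_cause("switch", state, parents)

# Example 2: web2 gains a second parent that is up, so switch no
# longer passes the test under the literal reading.
parents2 = dict(parents, web2=["switch", "backup"])
state2 = dict(state, backup=UP, web2=DOWN)
blocking2 = is_outage_cause("switch", state2, parents2)
```

Under these assumptions the first example flags the switch as the 
cause and the second does not. Either way, the test needs a known 
state for every immediate child and for each of that child's parents, 
which is where the extra hostchecks come from.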


> It really could be too early...
> 

It always is. :)

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

