Parent/child behaviour, WAS:Re: check_ping vs. check_icmp?
Greg Vickers
g.vickers at qut.edu.au
Mon Oct 17 01:58:53 CEST 2005
Andreas,
Andreas Ericsson wrote:
> Andrew Laden wrote:
>
>> How does using check_icmp compare to using check_fping?
>>
>> It seems that check_fping will return a down answer much faster. Since
>> host checks are most often run when the host is down, that seems to be the
>> performance that we are concerned with.
>
> This might seem to be the case, but it actually isn't. A hostcheck is
> run each time a service changes from whatever to any non-OK state. In a
> (somewhat) healthy network hostchecks are being run when the host is up
> more often than when they're down. The opposite is of course true if
> there are hosts being down for a long time or if a whole segment of the
> network goes to lunch,
I thought that if parents were set up correctly, Nagios would not
run any service or host checks on hosts that are children of the
blocking outage. So there would be a delay while Nagios figures out
which parent host is down (i.e. the service checks failing their way
'up' the parent dependencies, and the subsequent delays on the host
checks until the 'top' parent host is checked), but once the top-most
parent has been host checked, no host or service checks will be run on
the children until that parent recovers. You would therefore only see
a delay in check scheduling/processing when the host check is run on
that 'top' parent host.
Is this the expected and correct behavior or is it too early on Monday
morning for me?
<snipity-snip-snip>
Ah-ha - RTFM prior to inserting foot in mouth. The networkoutages.html
page states:
"If all of the immediate child hosts of one of these flagged hosts is
DOWN or UNREACHABLE and has no immediate parent host that is up, the
flagged host is the cause of a network outage. If even one of the
immediate children of a flagged host does not pass this test, then the
flagged host is not the cause of a network outage."
So from this statement, I understand that all children will be host
checked to determine exactly which host is the cause of a network outage,
and that this could cause a large delay if there are a lot of hosts to check.
However, I don't understand the phrase "... has no immediate parent
host that is up ...". Shouldn't that read "... has a parent host up ...",
since otherwise how would Nagios reach that blocking host to test it?
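One possible reading of the quoted rule is that "has no immediate parent
host that is up" applies to each child, not to the flagged host itself:
a child only counts against the flagged host if it is down or unreachable
and none of its own parents is up. The Python sketch below models that
reading only. It is not Nagios code; the function name, the data layout
and the host names are hypothetical, every host is assumed to appear in
the states map, and the definition of a "flagged" host (not UP, with at
least one child) is an assumption, since the quote does not spell it out.

```python
from typing import Dict, List

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def outage_causes(states: Dict[str, str], parents: Dict[str, List[str]]) -> List[str]:
    """Return the hosts that satisfy the quoted rule, sorted by name."""
    # Invert the parent map to get each host's immediate children.
    children: Dict[str, List[str]] = {host: [] for host in states}
    for child, host_parents in parents.items():
        for parent in host_parents:
            children[parent].append(child)

    def child_is_cut_off(child: str) -> bool:
        # The child must be non-UP and none of its immediate parents may be UP.
        if states[child] == UP:
            return False
        return all(states[p] != UP for p in parents.get(child, []))

    causes = []
    for host in sorted(states):
        # Assumption: a "flagged" host is non-UP and has at least one child.
        if states[host] == UP or not children[host]:
            continue
        if all(child_is_cut_off(c) for c in children[host]):
            causes.append(host)
    return causes


# Hypothetical topology: router -> switch -> {web, db}
states = {"router": UP, "switch": DOWN, "web": UNREACHABLE, "db": UNREACHABLE}
parents = {"switch": ["router"], "web": ["switch"], "db": ["switch"]}
print(outage_causes(states, parents))  # prints ['switch']
```

Under this reading, if "web" also had a second parent that was up, it
would fail the test and "switch" would no longer be reported as the
cause, because "web" being down could not be blamed on "switch" alone.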
It really could be too early...
Thanks,
--
Greg Vickers
Project Manager, IT Security
Information Technology Services
Queensland University of Technology
L12, 126 Margaret St, Brisbane
Phone: (07) 3864 9536
Email: g.vickers at qut.edu.au
IT Security web site: http://www.its.qut.edu.au/itsecurity/
CRICOS No. 00213J