parent/child setup not working (Resolved)
David Miller
nagios at d.sparks.net
Tue Jan 9 16:09:11 CET 2007
Hendrik Bäcker wrote:
I finally got this figured out, and thought I'd send a summary for the
archives. Hopefully it will help others with the search engines and
archives.
Keywords: nagios debian parent child network outage unwanted notifications
> Your wanted thing is called "Network Outage". This is done by setting
> parents and grandparents in the kind of the way that your nagios can
> reach each host. I think, you are getting this right but:
>
> Nagios 2 only checks services, let us say, every 5 minutes. If only
> one of 'x' services on a host returns with a non-OK state nagios will
> try to check if the host is reachable via the host-check-command, if
> given. If you don't set a host-check-command, nagios will never try
> this and you are getting a service alert.
>
>
> Network Outage detection relies on host checks.
> So: no host checks, no outage detection.
>
>> and
>>
>> service checks are performed as long as the host is known or presumed to
>> be up.
>>
> Service checks are performed as long as your nagios process is running
> and the service check timeperiod is active.
> AFAIK nagios tries to check a service even if it knows that the host
> is down, the only difference between host is up or down is, that you
> will receive x service alerts if x services are non-OK or just one
> host alert.
>
> I think your logic failure is the "way" that you are thinking how
> nagios works.
>
> Don't think on a checking way like this:
>
> Parent Host --> Host --> Service
>
> It is more like this:
>
> Service --> Host --> Parent
>
> Nagios intelligence is that it suppress notifications not the service
> checks.
>
This is pretty much the key. Perhaps the documentation could be
clearer; it's possible to read the documentation the right way, but a
lot of people thought my configs looked right, so it's easy to read it
the wrong way as well.
The way I thought it would work is that one could define service checks
for host "web", define a parent of "pix" for web, and that if host pix
was down, service checks for web would stop, or at the very least not
be reported on, since web certainly couldn't be functional if its
parent was down.
However, that's not the way it works. Nagios 2.x tries hard not to do
host checks; it only performs a host check when a service check fails. It
also doesn't walk up the tree to see whether a parent's host check fails
when a service check fails, which is the behavior I intuitively expected.
The solution requires that a host check be specified for every host with a
service check. If the service check fails, nagios will perform that host
check and determine that the host is unreachable. If a parent host (pix) is
defined for the unreachable host (web), notifications will be
suppressed. So if you don't want notifications about hosts beyond a
gateway, you have to define a host check on each of those hosts, not just
on the gateway parent.
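For the archives, here is a rough sketch of what that looks like in the
object config. The host names web and pix are the ones from my example;
the addresses, the "admins" contact group, the "24x7" timeperiod, the
intervals and the check-host-alive command name (as found in the sample
config) are assumptions, so substitute whatever your site defines.

```
# Gateway: has its own host check so nagios can tell whether it is down.
define host{
        host_name               pix
        alias                   PIX firewall (gateway)
        address                 192.0.2.1
        check_command           check-host-alive
        max_check_attempts      3
        notification_interval   60
        notification_period     24x7
        notification_options    d,u,r
        contact_groups          admins
        }

# Host behind the gateway: needs BOTH a parent and its own host check.
# Without check_command here, a failed service check on web is never
# followed by a host check, and the service alerts go out anyway.
# 'u' is left out of notification_options (my assumption about what is
# wanted) so that an UNREACHABLE web does not send a host notification.
define host{
        host_name               web
        alias                   Web server behind pix
        address                 192.0.2.10
        parents                 pix
        check_command           check-host-alive
        max_check_attempts      3
        notification_interval   60
        notification_period     24x7
        notification_options    d,r
        contact_groups          admins
        }
```

The point of the sketch is only that check_command appears on web as well
as on pix; the parents line alone is not enough.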
As an aside, the documentation alludes to performance issues from host
checks, most of which are based on pings. Is this due to the nastiness
of handling icmp packets on their return? I.e., when an icmp packet is
received, the kernel hands a copy of it to every process listening for a
reply; this gets ugly quickly if too many processes are pinging at
once. If that's the case, I have some code I'd be happy to donate that
solves that particular problem.
Thanks to all the list members who helped!
--- David