'extra' notifications when dealing with parent-child relationships
Eric Young
ericryoung at yahoo.com
Tue Oct 22 16:34:27 CEST 2002
I'm posting this here first, assuming that I have made
a mistake (since I'm new to Nagios and all), rather than
posting it to devel, where bugs normally go (which I
think this may be).
I'm still dealing with the same 'setup' as in my other
posts (140 or so nodes, 135 with a single parent,
faking that parent going down by using ipchains rules
to block ALL ICMP). I'd love to get this working as
I'm liking nagios except in this darn 'huge network
failure' state that I'm simulating.
Here's my next question:
Is there some way to tell a 'child' that it should not
send out 'down'/'up' notifications if its parent has
'recently' returned to service?
In my situation, when I deactivate my ipchains rule
such that ICMP should go through successfully, I do
get a successful up notification for the parent
router. HOWEVER, I will often receive a notification
that one of the children is 'down' pretty much
immediately followed by an 'up' notification for that
same node. My attempts to test this show that it is
likely a timing issue, in that the node for which I
receive this notification was in the midst of a 'ping'
test when I brought ICMP access back up.
I would guess (from skimming the code) that the 'ping'
already in progress runs and fails, and then the
service check code checks that the parent is up
BEFORE sending a notification (which I would
expect). In this small timeframe, the parent has just
come up but the ping to the child has failed. Therefore,
we get an extra notification that doesn't accurately
reflect the state of the network it is reporting on.
So, am I missing something or is there NOT a way to
say 'if my parent just returned to service 10 seconds
ago, try that check one more time'? I think it would
be useful (and not too much more code).
It would seem you could add a host or service config
item something like: parent_up_for: # of seconds. I
think you might then have to add a field to the host
structure that would track the 'last down time' (in
UNIX seconds) such that a child could check their
parent.
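
To make the idea concrete, here is a rough sketch in C of the check described above. It is not taken from the Nagios source: the struct, its field names (last_down_time, parent_up_for) and the function name are all hypothetical, and it assumes the parent's 'last down time' is simply the last time a check saw it down.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified host record. This is NOT the real Nagios
 * host structure; every field here is an assumption for illustration. */
struct host_sketch {
    const char *name;
    bool is_up;                        /* current state as last checked */
    time_t last_down_time;             /* proposed field: last time seen down (0 = never) */
    long parent_up_for;                /* proposed config item, in seconds */
    const struct host_sketch *parent;  /* single parent, NULL if none */
};

/* Returns true if a failed check of 'child' should be retried once more
 * instead of triggering a notification, because the parent came back
 * less than child->parent_up_for seconds ago. */
bool should_retry_child_check(const struct host_sketch *child, time_t now)
{
    const struct host_sketch *parent = child->parent;

    /* No parent, or parent still down: the grace window does not apply. */
    if (parent == NULL || !parent->is_up)
        return false;

    /* Parent has never been seen down, so it did not just recover. */
    if (parent->last_down_time == 0)
        return false;

    /* Parent is up, but was down within the last parent_up_for seconds. */
    return difftime(now, parent->last_down_time) < (double)child->parent_up_for;
}
```

With parent_up_for set to 10, a child whose check fails 5 seconds after the parent was last seen down would be rechecked, while the same failure 20 seconds later would go on to notify as usual.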
Any idears/flames?