Children "unreachable" on soft down?
Israel Brewster
israel at frontierflying.com
Tue Mar 31 18:17:28 CEST 2009
On Mar 31, 2009, at 1:09 AM, Andreas Ericsson wrote:
> Israel Brewster wrote:
>> Does nagios (3.0.3) mark a child host as unreachable when its
>> parent enters a soft down state? I am finding myself getting
>> repeated down messages for a host (which is, in fact, down), even
>> though I have notifications set to only send a single message.
>> Looking at the logs, it would appear that what is happening is
>> that the host is flipping between "down" (which notifies me) and
>> "unreachable" (which does not). The parent host, however, never
>> enters a hard down state. Looking at the logs, what I see is that
>> one ICMP check fails, throwing the host into a soft down state,
>> but the next one works just fine, bringing it back to an up state.
>> The logic works fine for the parent host- since it never hits a
>> hard down state, it doesn't alert, and everyone is happy. But
>> apparently with the child host every time this happens, it
>> switches from critical to unreachable and back again, triggering a
>> notification. Is there any way to keep this from happening? Thanks.
>
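To make the sequence in the logs concrete, here is a minimal Python
sketch of what I suspect is going on. It is a guess at the logic, not
Nagios source: classify_child, child_states and transitions are names
made up for illustration, and the key assumption is that a failing
child's state is derived from the parent's current state, whether that
state is soft or hard.

```python
# Hypothetical model, not Nagios code. Assumption: a failing child is
# reported UNREACHABLE whenever its parent is currently not UP, even if
# the parent's problem is only a soft state.

def classify_child(child_check_ok, parent_check_ok):
    """Return the state the child would be given for one check cycle."""
    if child_check_ok:
        return "UP"
    return "DOWN" if parent_check_ok else "UNREACHABLE"


def child_states(parent_results, child_ok=False):
    """Child state for each cycle, given the parent's check results."""
    return [classify_child(child_ok, ok) for ok in parent_results]


def transitions(states):
    """Count how many times the state changes along the sequence."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


# Parent misses a single ICMP check (soft down), then recovers.
parent = [True, True, False, True, True]
states = child_states(parent)
print(states)
print(transitions(states))
```

Under that assumption, one missed check on the parent is enough to
move the child out of "down" and back again, and the return to "down"
is what sends the repeat notification.
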
> Doesn't flapping detection do what you want? You'd get a few
> notifications, but they'd stop after the 3rd flip or something, I
> think.
Flap detection helps, but doesn't solve the problem. For one thing, as
you mentioned, you still get at least a couple of notifications before
it kicks in. For another, this happens roughly once an hour (not
consistently): the host flips from down to unreachable and back again,
triggering an e-mail, perhaps does it a second time, and then sits in
the correct "down" state for the next 50 checks or so (thus canceling
any flap detection) before repeating the process. It's not that I'm
getting messages every five minutes; it's that I'm getting repeated
down messages every hour or two for hosts that have been down the
whole time and haven't actually changed state.
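
As a rough illustration of why flap detection never kicks in for this
pattern, below is a simplified Python sketch. It assumes a window of
the last 21 states and a 20% high threshold, and it counts every
transition equally; real Nagios weights recent transitions more
heavily, so the numbers are only indicative. The function names are
made up for the example.

```python
# Simplified, unweighted model of flap detection. Assumptions: a window
# of 21 states (20 possible transitions) and a 20% high threshold.

WINDOW = 21
HIGH_THRESHOLD = 20.0


def percent_state_change(states):
    """Percentage of transitions in the last WINDOW states that changed."""
    recent = list(states)[-WINDOW:]
    changes = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    return 100.0 * changes / (WINDOW - 1)


def is_flapping(states):
    """True once the percent state change reaches the high threshold."""
    return percent_state_change(states) >= HIGH_THRESHOLD


# One flip to UNREACHABLE and back, surrounded by steady DOWN results.
history = ["DOWN"] * 18 + ["UNREACHABLE", "DOWN", "DOWN"]
print(percent_state_change(history), is_flapping(history))

# Fifty more DOWN results push the flip out of the window entirely.
history += ["DOWN"] * 50
print(percent_state_change(history), is_flapping(history))
```

In this model a single flip gives two transitions out of twenty, which
stays under the threshold, and the long run of "down" results that
follows clears the window before the next flip arrives.
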
I could, of course, schedule downtime, except that I want to be
notified if/when the people at the remote station get their act
together and get the machine(s) in question back online. Scheduled
downtime is also only partially effective for machines that have been
sent in for repair, because I don't know when they will be back and so
can't say when the downtime should end. They are down, I know they are
down, I just don't want to be told about it every few hours :-)
-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------
>
>
> --
> Andreas Ericsson andreas.ericsson at op5.se
> OP5 AB www.op5.se
> Tel: +46 8-230225 Fax: +46 8-230231
>
> Considering the successes of the wars on alcohol, poverty, drugs and
> terror, I think we should give some serious thought to declaring war
> on peace.