Children "unreachable" on soft down?

Israel Brewster israel at frontierflying.com
Tue Mar 31 18:17:28 CEST 2009


On Mar 31, 2009, at 1:09 AM, Andreas Ericsson wrote:

> Israel Brewster wrote:
>> Does Nagios (3.0.3) mark a child host as unreachable when its
>> parent enters a soft down state? I am finding myself getting
>> repeated down messages for a host (which is, in fact, down), even
>> though I have notifications set to only send a single message.
>> Looking at the logs, it would appear that what is happening is
>> that the host is flipping between "down" (which notifies me) and
>> "unreachable" (which does not). The parent host, however, never
>> enters a hard down state. Looking at the logs, what I see is that
>> one ICMP check fails, throwing the host into a soft down state,
>> but the next one works just fine, bringing it back to an up state.
>> The logic works fine for the parent host: since it never hits a
>> hard down state, it doesn't alert, and everyone is happy. But
>> apparently with the child host every time this happens, it
>> switches from critical to unreachable and back again, triggering a
>> notification. Is there any way to keep this from happening? Thanks.
>
> Doesn't flapping detection do what you want? You'd get a few
> notifications, but they'd stop after the 3rd flip or something, I  
> think.

Flap detection helps, but doesn't solve the problem. For one thing, as
you mentioned, you still get at least a couple of notifications before
it kicks in. For another, this happens roughly once an hour (not
consistently): the host will flip from down to unreachable and back
again, triggering an e-mail, perhaps do it a second time, and then sit
in the correct "down" state for the next 50 checks or so (thus
canceling any flap detection) before repeating the process. It's not
like I'm getting messages every five minutes or anything; it's just
that I'm getting repeated down messages every hour or two for hosts
that have been down and haven't actually changed state.
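
To illustrate why a long quiet stretch cancels flap detection, here is
a rough Python sketch of the arithmetic as I understand it. It is a
simplification, not Nagios's actual code: the 21-check history window
and the 20% high threshold are assumptions on my part, the function
names are made up, and real Nagios also weights recent state changes
more heavily than older ones.

```python
# Simplified, unweighted sketch of flap-detection arithmetic.
# Assumptions (not taken from this thread): a 21-check history window
# and a 20% high threshold. All names here are hypothetical.

HISTORY_LEN = 21        # assumed number of remembered check results
HIGH_THRESHOLD = 20.0   # assumed high flap threshold, in percent


def percent_state_change(history):
    """Percentage of adjacent check pairs in the window whose state differs."""
    recent = history[-HISTORY_LEN:]
    if len(recent) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(recent, recent[1:]) if a != b)
    return 100.0 * changes / (len(recent) - 1)


def is_flapping(history, threshold=HIGH_THRESHOLD):
    """True once the state-change percentage reaches the assumed threshold."""
    return percent_state_change(history) >= threshold


# One down -> unreachable -> down blip inside the window: 2 changes out of 20.
blip = ["DOWN"] * 10 + ["UNREACHABLE"] + ["DOWN"] * 10
print(percent_state_change(blip))   # 10.0
print(is_flapping(blip))            # False

# After 50 more steady "down" checks the blip has left the window entirely.
later = blip + ["DOWN"] * 50
print(percent_state_change(later))  # 0.0
```

Under those assumptions a single blip never reaches the threshold, and
by the time the next one arrives the history is clean again, so each
blip is treated as a fresh state change.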

I could, of course, schedule downtime, except that I want to be
notified if/when the people at the remote station get their act
together and get the machine(s) in question back online. Scheduled
downtime is also only partially effective for machines that have been
sent in for repair, because I don't really know when the downtime will
be over. They are down, I know they are down, I just don't want to be
told about it every few hours :-)

-----------------------------------------------
Israel Brewster
Computer Support Technician II
Frontier Flying Service Inc.
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7250 x293
-----------------------------------------------


