Notifications or host checks stopped working

Andreas Ericsson ae at op5.se
Tue Oct 18 23:43:19 CEST 2005


Andrew Laden wrote:
> Ran a few more tests, and it seems that the escalation configuration was
> the cause of the notification issue.
> 
> If you use escalations, and you configure such that you do not have any
> escalations in the 1st notification interval, nagios assumes there are no
> notifications to be sent, and never increments the Notification Number, and
> never runs through the rest of the notifications. I didn't test further, but
> I suspect if you ever have a level with no notifications, it will not
> continue. 
> 
> I had one user left in the 1st notification interval, and he was removed
> this morning. To workaround, I created a dummy user, with a no-op
> notification command, and put him(her?it?) in the 1st round. Host
> notifications immediately started working again.
> 
> I'd consider this a design bug; I can see many uses for notification
> intervals with no notification.
> 

It's more likely just a common everyday kind of bug. I don't really see 
any uses for notification intervals with no notifications though, unless 
you're talking about any notification but the first.
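
To make the reported situation concrete, here is a rough Python sketch of a checker that lists the notification numbers for which nobody would be notified. It is a toy model under stated assumptions, not Nagios code: the `Escalation` class and the `recipients` and `empty_notifications` helpers are hypothetical names, and the resolution rule (use the contacts of every matching escalation, otherwise fall back to the host's own contacts) is a simplification. The only detail taken from Nagios itself is that `last_notification` set to 0 means "no upper bound".

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Escalation:
    """Hypothetical stand-in for one hostescalation definition."""
    first_notification: int
    last_notification: int  # 0 means "no upper bound"
    contacts: Set[str] = field(default_factory=set)

    def covers(self, number: int) -> bool:
        """Return True if this escalation applies to notification `number`."""
        if number < self.first_notification:
            return False
        return self.last_notification == 0 or number <= self.last_notification


def recipients(number: int,
               default_contacts: Iterable[str],
               escalations: Iterable[Escalation]) -> Set[str]:
    """Assumed rule: matching escalations replace the default contacts."""
    matching = [e for e in escalations if e.covers(number)]
    if not matching:
        return set(default_contacts)
    result: Set[str] = set()
    for escalation in matching:
        result |= escalation.contacts
    return result


def empty_notifications(default_contacts: Iterable[str],
                        escalations: Iterable[Escalation],
                        up_to: int = 10) -> List[int]:
    """List notification numbers (1..up_to) that would reach nobody."""
    escalations = list(escalations)
    return [n for n in range(1, up_to + 1)
            if not recipients(n, default_contacts, escalations)]


if __name__ == "__main__":
    escalations = [
        Escalation(2, 4, {"oncall"}),
        Escalation(5, 0, {"manager"}),
    ]
    # Nobody left in the first round: notification 1 reaches no one.
    print(empty_notifications(set(), escalations, up_to=6))
    # With a dummy contact in the first round, there is no gap.
    print(empty_notifications({"dummy"}, escalations, up_to=6))
```

Running something like this over a configuration before reloading would flag an empty first round, which is the condition the dummy contact with a no-op notification command works around.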

> Still have the issue with an unreachable host being marked as down, but as
> that was caused by a buggy service check reporting OK for an unreachable
> host, I am not going to spend a lot of time on that.
> 

Hosts with OK services are never unreachable, insofar as Nagios is 
concerned. I remember a discussion about that exact thing quite some 
time ago.

> 
> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se] 
> Sent: Tuesday, October 18, 2005 2:30 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Notifications or host checks stopped working
> 
> Andrew Laden wrote:
> 
>>I just recently upgraded to 2.0b4.
> 
> 
> From?
> 
> 
>>Notifications were working ok when I
>>first upgraded. 
>>
> 
> 
> Not from 1.x then, since the macros have changed between the versions.
> 
> 
>>Our company is having a DR test. So we shut down the routers 
>>connecting one of our sites.
>>
>>The GUI shows mostly correct. The two routers are listed in Network 
>>outages, and it seems that the hosts that are children of those 
>>routers are all being marked as unreachable instead of down.
>>
>>But I am seeing some oddities. It looks like host checks are no longer 
>>being scheduled at all. I have host escalations in place, and there 
>>are no notifications going out on the two down routers. Current 
>>Notification Number isn't increasing. They are in a Down Hard state, 
>>but current attempt is stuck at a 1/5 count.
>>
> 
> 
> Are they behind the outage, or are they the ones causing the outage?
> 
> 
>>So, questions
>>Is there a way to tell if host checks are being run?
> 
> 
> Yes. By the status data age on the host detail view.
> 
> 
>>They aren't in the scheduled queue. I set one of the down routers to up
>>using a passive check.
>>And it looks like even when the service for it went down, the host 
>>check never ran. Though when I forced the check, it ran ok.
>>
> 
> 
> This is weird. I expect you've double-checked check_period for the host
> definitions?
> 
> 
>>I had a host that was in an unreachable state. I ran a service check 
>>for that host that succeeded. The host went into a down state. But 
>>again, no further host checks seem to have been run. And no 
>>notifications have been sent out.
>>
>>Any ideas where I can look for problems?
>>
> 
> 
> You could try re-compiling Nagios with debug-output enabled (./configure
> --help to know which debug-options to enable) and then run the same scenario
> while running nagios in the foreground. This will produce quite a bit of
> output, so you'll likely want to pipe it through tee for later perusal as
> well.
> 
> Please don't post the debug output to the list though. If you need help with
> viewing it you can put it on a web-page somewhere and then submit a link.
> SourceForge is quite busy enough without hauling 5 MB files to 6000
> subscribers.
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

