Notifications or host checks stopped working
Andreas Ericsson
ae at op5.se
Tue Oct 18 23:43:19 CEST 2005
Andrew Laden wrote:
> Ran a few more tests, and it seems that the notification issue with
> escalations was the issue.
>
> If you use escalations, and you configure such that you do not have any
> escalations in the 1st notification interval, nagios assumes there are no
> notifications to be sent, and never increments the Notification Number, and
> never runs through the rest of the notifications. I didn't test further, but
> I suspect if you ever have a level with no notifications, it will not
> continue.
>
> I had one user left in the 1st notification interval, and he was removed
> this morning. As a workaround, I created a dummy user with a no-op
> notification command and put him (her? it?) in the 1st round. Host
> notifications immediately started working again.
>
> I'd consider this a design bug, I can see many uses for notification
> intervals with no notification.
>
It's more likely just a common everyday kind of bug. I don't really see
any uses for notification intervals with no notifications though, unless
you're talking about any notification but the first.
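
For anyone hitting the same thing, the workaround described above might look
roughly like the fragment below. This is only a sketch in Nagios 2.x object
syntax, not Andrew's actual configuration: the names noop-notify, dummy-noop,
noop-group, router1 and admins are all made up for illustration, the path
/bin/true and the 24x7 timeperiod are assumed to exist on the system, and the
interval value is arbitrary.

```
# No-op notification command (hypothetical name; /bin/true path is assumed).
define command{
        command_name    noop-notify
        command_line    /bin/true
        }

# Dummy contact whose "notifications" go nowhere (hypothetical name).
define contact{
        contact_name                    dummy-noop
        alias                           Dummy no-op contact
        host_notification_period        24x7      ; assumed to be defined elsewhere
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      noop-notify
        service_notification_commands   noop-notify
        }

define contactgroup{
        contactgroup_name       noop-group
        alias                   Placeholder group for the first notification
        members                 dummy-noop
        }

# First notification goes only to the dummy group, so the notification
# number still gets incremented.
define hostescalation{
        host_name               router1           ; hypothetical host
        contact_groups          noop-group
        first_notification      1
        last_notification       1
        notification_interval   30                ; arbitrary example value
        }

# Real contacts take over from the second notification onwards.
define hostescalation{
        host_name               router1
        contact_groups          admins            ; hypothetical real group
        first_notification      2
        last_notification       0                 ; 0 = keep using this escalation
        notification_interval   30
        }
```

The point of the placeholder group is only that the first notification has at
least one contact attached; whether later empty levels need the same treatment
was not tested in this thread.
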
> Still have the issue with an unreachable host being marked as down, but as
> that was caused by a buggy service check reporting OK for an unreachable
> host, I am not going to spend a lot of time on that.
>
Hosts with OK services are never unreachable, insofar as Nagios is
concerned. I remember a discussion about that exact thing quite some
time ago.
>
> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se]
> Sent: Tuesday, October 18, 2005 2:30 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Notifications or host checks stopped working
>
> Andrew Laden wrote:
>
>>I just recently upgraded to 2.0b4.
>
>
> From?
>
>
>>Notifications were working ok when I
>>first upgraded.
>>
>
>
> Not from 1.x then, since the macros have changed between the versions.
>
>
>>Our company is having a DR test. So we shut down the routers
>>connecting one of our sites.
>>
>>The GUI looks mostly correct. The two routers are listed in Network
>>outages, and it seems that the hosts that are children of those
>>routers are all being marked as unreachable instead of down.
>>
>>But I am seeing some oddities. It looks like host checks are no longer
>>being scheduled at all. I have host escalations in place, and there
>>are no notifications going out on the two down routers. Current
>>Notification Number isn't increasing. They are in a Down Hard state,
>>but current attempt is stuck at a 1/5 count.
>>
>
>
> Are they behind the outage, or are they the ones causing the outage?
>
>
>>So, questions
>>Is there a way to tell if host checks are being run?
>
>
> Yes. By the status data age on the host detail view.
>
>
>>They aren't in the
>>scheduled queue. I set one of the down routers to up using a passive
>>check.
>>And it looks like even when the service for it went down, the host
>>check never ran. Though when I forced the check, it ran ok.
>>
>
>
> This is weird. I expect you've double-checked check_period for the host
> definitions?
>
>
>>I had a host that was in an unreachable state. I ran a service check
>>for that host that succeeded. The host went into a down state. But
>>again, no further host checks seem to have been run. And no
>>notifications have been sent out.
>>
>>Any ideas where I can look for problems?
>>
>
>
> You could try re-compiling Nagios with debug-output enabled (./configure
> --help to know which debug-options to enable) and then run the same scenario
> while running nagios in the foreground. This will produce quite a bit of
> output, so you'll likely want to pipe it through tee for later perusal as
> well.
>
> Please don't post the debug output to the list though. If you need help with
> viewing it you can put it on a web-page somewhere and then submit a link.
> Sourceforge is quite busy enough without hauling 5mb files to 6000
> subscribers.
>
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231