Nagios occasionally does not send notifications when a service goes down
Andy Barker
andrew.barker at nottingham.ac.uk
Tue Feb 22 10:42:19 CET 2005
My *guess* would be that your contacts (in contacts.cfg) only receive
notifications during a certain time period, usually "workhours" (which
is initially 9am-5pm on weekdays). Time periods are configured in
timeperiods.cfg.
Either that, or the host in question uses the workhours time period in
its definition.
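As a quick sanity check of that guess, here is a minimal Python sketch
(not anything Nagios itself provides) that tests whether an alert
timestamp from the log falls inside a Monday-to-Friday 09:00-17:00
period. The period values, the WORKHOURS name and the in_timeperiod
helper are my assumptions for illustration; the real values are
whatever your timeperiods.cfg says.

```python
from datetime import datetime, time

# Assumed "workhours" period: 09:00-17:00, Monday to Friday.
# These values are illustrative; check your own timeperiods.cfg.
WORKHOURS = {day: (time(9, 0), time(17, 0)) for day in range(5)}  # 0 = Monday


def in_timeperiod(stamp, period=WORKHOURS):
    """Return True if a Nagios log timestamp falls inside the period."""
    # Log timestamps look like "02-20-2005 18:08:43" (month-day-year).
    when = datetime.strptime(stamp, "%m-%d-%Y %H:%M:%S")
    window = period.get(when.weekday())
    if window is None:
        # No entry for this weekday means the whole day is excluded.
        return False
    start, end = window
    return start <= when.time() < end


if __name__ == "__main__":
    # Timestamp of the HARD DOWN alert from the quoted log.
    print(in_timeperiod("02-20-2005 18:08:43"))
```

For the HARD DOWN alert at 02-20-2005 18:08:43 (a Sunday evening) this
prints False, i.e. the alert falls outside such a period.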
Andy
On Mon, 2005-02-21 at 11:53 -0600, Toby Kraft wrote:
>
> Hi all,
>
> I've been using Nagios 1.2 (and Netsaint before) with some clients for
> a while. One installation (on Fedora Core 2) has an issue where a
> service will go down, but Nagios does not send any notification.
>
> The service check is a simple TCP port check; the host_alive_check is
> the default (ping), and the host can be pinged. This host has one and only
> one service. It's a pretty vanilla install and everything works fine
> most of the time.
>
> This past weekend, a host went down. No notifications were sent.
> Monday morning the staff came in, saw the host was down and restarted
> it. After they restarted the target host, Nagios then sent out a
> bunch of Host Down alerts followed by a Host Up alert. Notifications
> for this server or host were NOT disabled (nagios.log archives show
> they were enabled on 2/9/05).
>
> Okay now you're saying - it's your mail server. But Nagios did not
> log any notifications at the time of the problem!
>
> The Host Alert History shows:
> Sun Feb 20 00:00:00 CST 2005 to Mon Feb 21 00:00:00 CST 2005
>
> [02-20-2005 18:08:43] SERVICE ALERT: ucisvr5.champlabs.com;Sandbox - DB;CRITICAL;HARD;1;Connection refused or timed out
> [02-20-2005 18:08:43] HOST ALERT: ucisvr5.champlabs.com;DOWN;HARD;3;/bin/ping -n -U -c 1 ucisvr5.champlabs.com
> [02-20-2005 18:08:40] HOST ALERT: ucisvr5.champlabs.com;DOWN;SOFT;2;/bin/ping -n -U -c 1 ucisvr5.champlabs.com
> [02-20-2005 18:08:37] HOST ALERT: ucisvr5.champlabs.com;DOWN;SOFT;1;/bin/ping -n -U -c 1 ucisvr5.champlabs.com
>
> The Host Notification History shows:
> Sun Feb 20 00:00:00 CST 2005 to Mon Feb 21 00:00:00 CST 2005
> No notifications have been recorded for this host in this archived log
> file
>
> The Service Alert History shows:
> Sun Feb 20 00:00:00 CST 2005 to Mon Feb 21 00:00:00 CST 2005
> [02-20-2005 18:08:43] SERVICE ALERT: ucisvr5.champlabs.com;Sandbox - DB;CRITICAL;HARD;1;Connection refused or timed out
>
> The Service Notification History shows:
> Sun Feb 20 00:00:00 CST 2005 to Mon Feb 21 00:00:00 CST 2005
> No notifications have been recorded for this service in this archived
> log file
>
> It seems that this occurs after Nagios has been up and running for a
> while. The system and Nagios have been up for 11 days, which doesn't
> seem like a long time.
>
> Mainly just fishing for any ideas on what could cause this or how to
> troubleshoot the problem. It would be nice if Nagios logged some info
> when it processes an event and then decides NOT to send a
> notification, like "Notification for event xxxx suppressed because
> yyyyy" or some such.
>
> Thanks for listening. I'll check into any debug and/or logging
> options.
>
> Toby
>