Notifications not reaching me
Gary Lawrence Murphy
garym at canada.com
Mon Dec 6 21:00:43 CET 2004
I'm having some trouble understanding the notifications system; it's
probably just a misunderstanding about terminology, but the two sections
on notifications and notification escalations haven't offered any clues
as to why the following service does not escalate:
First, here's the service. I know it works because I can trigger it
from a command line and it shows up correctly in the Nagios web display:
define service {
    use                     net-service
    host_name               f1
    service_description     Log Output
    contact_groups          audit-trail
    register                0
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    notification_period     24x7
    normal_check_interval   2
    notification_options    w,u,c,r
    check_command           check_nrpe!check_log
}
"check_log" is a simple plug-in that tests the mtime on the file, and
it's unimportant here because I have several other services that are
giving me similar notification problems.
Here are the escalation rules:
define serviceescalation {
    host_name               f1
    service_description     Log Output
    first_notification      6
    last_notification       9
    contact_groups          code-blue
    notification_interval   0
}

define serviceescalation {
    host_name               f1
    service_description     Log Output
    first_notification      10
    last_notification       19
    contact_groups          code-yellow
    notification_interval   0
}
What I /expect/ to happen is for the normal check interval to be 5
minutes, so the first 3 checks (15 min) will be under the threshold to
trigger the first notification. On the third failure, at 15 minutes, I
expect a notice to audit-trail, and I get one.
On the /sixth/ interval, i.e. at 30 minutes, I expect one and only one
notice to be sent to code-blue, and on the 10th, 50 minutes past the
first detected failure, I expect another single notice to go out to
code-yellow.
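
To make that expectation concrete, here is a small Python sketch of the arithmetic. It assumes a fixed 5-minute check interval and that the escalation numbers count failed checks; both are assumptions of mine, not confirmed Nagios behaviour, and the names (EXPECTED_NOTICES, expected_timeline) are made up for illustration.

```python
# Sketch of the timeline I expect. Assumptions (not confirmed behaviour):
# one check every 5 minutes, and escalation numbers count failed checks.
CHECK_INTERVAL_MIN = 5

# (failure count, contact group) pairs taken from the definitions above
EXPECTED_NOTICES = [
    (3, "audit-trail"),   # max_check_attempts on the service
    (6, "code-blue"),     # first_notification of the first escalation
    (10, "code-yellow"),  # first_notification of the second escalation
]


def expected_timeline(interval_min=CHECK_INTERVAL_MIN):
    """Return (minutes, contact_group) for each notice I expect to see."""
    return [(count * interval_min, group) for count, group in EXPECTED_NOTICES]


for minutes, group in expected_timeline():
    print(f"{minutes:3d} min -> {group}")
```

If the escalation numbers count something other than failed checks, the minute values here would not apply.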
But what I get (on the Extended display) is this:
  Current Status:               CRITICAL
  Status Information:           CRITICAL - File age 10h 48m
  Current Attempt:              3/3
  State Type:                   HARD
  Last Check Type:              ACTIVE
  Last Check Time:              12-06-2004 14:51:52
  Status Data Age:              0d 0h 1m 12s
  Next Scheduled Active Check:  12-06-2004 14:53:52
  Latency:                      < 1 second
  Check Duration:               < 1 second
  Last State Change:            12-06-2004 13:14:23
  Current State Duration:       0d 1h 38m 41s
  Last Service Notification:    12-06-2004 13:14:23
  Current Notification Number:  1
  Is This Service Flapping?     N/A
  Percent State Change:         N/A
  In Scheduled Downtime?        NO
  Last Update:                  12-06-2004 14:53:03
It's an hour and a half past the last change in state, there have been
no further notices, and the current notification number is still listed
as "1". I'm assuming the current notification number is the count of
notifications that have been sent, which agrees with observation since
there have been no escalations.
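
As a sanity check on that assumption, here is a short Python sketch that compares a notification number against the two escalation ranges defined above. It assumes the first_notification/last_notification range is matched against the notification counter shown on the status page; that is my reading, not something I have confirmed, and groups_for is a hypothetical helper.

```python
# The two serviceescalation ranges from the definitions above.
ESCALATIONS = [
    {"first": 6, "last": 9, "contact_group": "code-blue"},
    {"first": 10, "last": 19, "contact_group": "code-yellow"},
]


def groups_for(notification_number, escalations=ESCALATIONS):
    """Return the contact groups whose range contains the given number.

    Assumption: ranges are compared with the notification counter,
    not with the number of failed checks.
    """
    return [
        e["contact_group"]
        for e in escalations
        if e["first"] <= notification_number <= e["last"]
    ]


print(groups_for(1))   # counter as shown on the status page
print(groups_for(6))
print(groups_for(10))
```

Under that reading, a counter stuck at 1 falls inside neither range, which would match the lack of escalations I'm seeing.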
Am I using the zero notification interval inappropriately? Has
something else gone wrong with my configuration? I know that
notifications do work because we have some services where the alerts
are sent, and there is no obvious difference between their definitions.
Also, I'm still trying to reproduce this in a test case, but we had
our notification intervals set to 60, intending only one message per
hour to be sent while the service check interval was only 5 minutes.
Again, instead of an alert after 30 minutes, we received the first
alert after more than 300 minutes.
--
Gary Lawrence Murphy <garym at teledyn.com> ==============================
www.teledyn.com - blog.teledyn.com - irish.teledyn.com - sbp.teledyn.com
====================== The present moment is a powerful goddess (Goethe)