Notifications not reaching me

Gary Lawrence Murphy garym at canada.com
Mon Dec 6 21:00:43 CET 2004
I'm having some trouble understanding the notifications system. It's
probably just a misunderstanding about terminology, but the two
documentation sections on notifications and notification escalations
haven't offered any clues as to why the following service does not
escalate:

First, here's the service. I know it works, because I can trigger it
from the command line and it shows up correctly in the Nagios web display:

define service {
        use                             net-service
        host_name                       f1
        service_description             Log Output
        contact_groups                  audit-trail
        register                        0
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_period             24x7
        normal_check_interval           2
        notification_options            w,u,c,r
        check_command                   check_nrpe!check_log
}
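Note that this definition sets normal_check_interval twice, first to 5 and later to 2. Which value Nagios keeps is not stated here, although the two-minute gap between Last Check Time and Next Scheduled Active Check in the status display further down suggests the later one. A quick way to catch this kind of slip is to scan a definition for repeated directives. The helper below is hypothetical (it is not part of Nagios) and only understands the simple one-directive-per-line layout used in this post.

```python
from collections import defaultdict

# The service definition from the post, copied verbatim.
SERVICE_BLOCK = """\
define service {
        use                             net-service
        host_name                       f1
        service_description             Log Output
        contact_groups                  audit-trail
        register                        0
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_period             24x7
        normal_check_interval           2
        notification_options            w,u,c,r
        check_command                   check_nrpe!check_log
}
"""


def find_duplicate_directives(block):
    """Return {directive: [values, ...]} for every directive set more than once.

    Hypothetical helper: handles only 'directive value' lines, one object
    per call, with no comments or line continuations.
    """
    seen = defaultdict(list)
    for raw in block.splitlines():
        line = raw.strip()
        # Skip blank lines and the 'define service {' / '}' delimiters.
        if not line or line.startswith("define") or line == "}":
            continue
        parts = line.split(None, 1)
        directive = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        seen[directive].append(value)
    return {d: vals for d, vals in seen.items() if len(vals) > 1}


print(find_duplicate_directives(SERVICE_BLOCK))
```

For the block above this reports only normal_check_interval, with the values 5 and 2 in the order they appear.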

"check_log" is a simple plug-in that tests the mtime of the file. Its
details are unimportant here, because several other services are giving
me similar notification problems.

Here are the escalation rules:

define serviceescalation {
        host_name                       f1
        service_description             Log Output
        first_notification              6
        last_notification               9
        contact_groups                  code-blue
        notification_interval           0
}

define serviceescalation {
        host_name                       f1
        service_description             Log Output
        first_notification              10
        last_notification               19
        contact_groups                  code-yellow
        notification_interval           0
}
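Both escalations are keyed on first_notification and last_notification, i.e. on the notification number (the counter the status page reports as Current Notification Number), not on the number of failed checks. A minimal sketch of how these two ranges map notification numbers to contact groups; the class and function names are hypothetical, and the merging and fallback rules are assumptions spelled out in the docstring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    first_notification: int
    last_notification: int
    contact_groups: tuple[str, ...]


# The service's own contact group and the two escalations from the post.
BASE_CONTACT_GROUPS = ("audit-trail",)
ESCALATIONS = (
    Escalation(6, 9, ("code-blue",)),
    Escalation(10, 19, ("code-yellow",)),
)


def groups_for_notification(n, escalations=ESCALATIONS, base=BASE_CONTACT_GROUPS):
    """Contact groups for notification number n (the first notification is 1).

    Assumptions of this sketch: an escalation applies when
    first_notification <= n <= last_notification; the groups of all
    applicable escalations are merged and replace the service's own
    contact_groups; the service's groups are used when none applies.
    """
    matched = []
    for esc in escalations:
        if esc.first_notification <= n <= esc.last_notification:
            for group in esc.contact_groups:
                if group not in matched:
                    matched.append(group)
    return tuple(matched) if matched else base


for n in (1, 5, 6, 9, 10, 19, 20):
    print(n, groups_for_notification(n))
```

Under this mapping, notification number 1 goes to audit-trail, and code-blue is reached only once the counter itself reaches 6.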

What I /expect/ to happen is that, with a normal check interval of 5
minutes, the first 3 failed checks (15 min) reach the threshold
(max_check_attempts 3) that triggers the first notification. On the
third failure, at 15 minutes, I expect a notice to audit-trail, and I
get one.

On the /sixth/ interval, i.e. at 30 minutes, I expect to get one and
only one notice sent to code-blue, and at the 10th, 50 minutes after
the first detected failure, I expect to see another single notice go
out to code-yellow.
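The expectation above reduces to simple arithmetic. This sketch writes it out exactly as the post reasons about it (failed check n at n × 5 minutes, and check n lining up with notification n). It models that assumption, not necessarily how Nagios counts, and it ignores retry_check_interval (set to 1 in the service definition), which a real scheduler may use between soft failures. The names are hypothetical.

```python
# Timeline the post expects, using its own arithmetic: failed check n
# happens n * CHECK_INTERVAL minutes after the problem starts, and the
# post assumes check n lines up with notification n.
CHECK_INTERVAL = 5  # minutes; the intended normal_check_interval

EXPECTED = {
    3: "audit-trail",   # max_check_attempts = 3 -> first notification
    6: "code-blue",     # first escalation, first_notification = 6
    10: "code-yellow",  # second escalation, first_notification = 10
}


def minutes_at_check(n, interval=CHECK_INTERVAL):
    """Minutes elapsed at failed check n, counted as n * interval like the post does."""
    return n * interval


for n, group in EXPECTED.items():
    print(f"check {n:2d} at {minutes_at_check(n):2d} min -> {group}")
```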

But what I get (on the Extended display) is this:

    Current Status:                 CRITICAL
    Status Information:             CRITICAL - File age 10h 48m
    Current Attempt:                3/3
    State Type:                     HARD
    Last Check Type:                ACTIVE
    Last Check Time:                12-06-2004 14:51:52
    Status Data Age:                0d 0h 1m 12s
    Next Scheduled Active Check:    12-06-2004 14:53:52
    Latency:                        < 1 second
    Check Duration:                 < 1 second
    Last State Change:              12-06-2004 13:14:23
    Current State Duration:         0d 1h 38m 41s
    Last Service Notification:      12-06-2004 13:14:23
    Current Notification Number:    1
    Is This Service Flapping?       N/A
    Percent State Change:           N/A
    In Scheduled Downtime?          NO
    Last Update:                    12-06-2004 14:53:03

It's now an hour and a half past the last change in state, there have
been no further notices, and the current notification number is still
listed as "1". I'm assuming the current notification number is the
number of notifications that have been sent, which agrees with what I
observe, since there have been no escalations.
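The timestamps in the status display can be checked directly. The sketch below parses them, assuming the MM-DD-YYYY order the page appears to use (the post is dated Dec 6, 2004), and computes two gaps: the spacing between the last and next scheduled checks, and the time spent in the current state as of the last page update.

```python
from datetime import datetime

FMT = "%m-%d-%Y %H:%M:%S"  # assumed timestamp format of the status page


def parse(ts):
    return datetime.strptime(ts, FMT)


# Values copied from the status display in the post.
last_check = parse("12-06-2004 14:51:52")
next_check = parse("12-06-2004 14:53:52")
last_state_change = parse("12-06-2004 13:14:23")
last_update = parse("12-06-2004 14:53:03")

# Gap between consecutive scheduled checks.
print("check spacing:", next_check - last_check)
# Time in the current (CRITICAL) state as of the last page update.
print("time in state:", last_update - last_state_change)
```

This gives two minutes between checks rather than five, and about 1h 38m in the CRITICAL state, in line with the Current State Duration shown.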

Am I using the zero notification interval inappropriately? Is something
else wrong with my configuration? I know that notifications do work,
because we have some services whose alerts are sent, and I see no
obvious difference between their definitions and this one.

Also, I'm still attempting to reproduce this in a test case, but we had
our notification intervals set to 60, intending only one message per
hour to be sent while the service check interval was only 5 minutes.
Again, instead of an alert after 30 minutes, we received the first alert
after more than 300 minutes.
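The intended schedule here (first alert after 30 minutes, then at most one message per hour) can be written out directly. Under that intent, five notifications would have gone out within the first 300 minutes, before the first one actually arrived. The helper below is hypothetical; its zero-interval branch follows the usual reading of a notification interval of 0 as "notify once", stated here as an assumption rather than confirmed behavior.

```python
def notification_times(first_alert_min, notification_interval, window_min):
    """Minutes at which problem notifications would go out within a window.

    Assumed behavior (the schedule the post intended): a first notification
    at first_alert_min, then one re-notification every notification_interval
    minutes while the problem persists. An interval of 0 is treated as
    'notify once' (an assumption).
    """
    if first_alert_min > window_min:
        return []
    if notification_interval <= 0:
        return [first_alert_min]
    return list(range(first_alert_min, window_min + 1, notification_interval))


intended = notification_times(30, 60, 300)
print(intended)
```

Here `intended` lists the minutes 30, 90, 150, 210 and 270.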

-- 
Gary Lawrence Murphy <garym at teledyn.com> ==============================
www.teledyn.com - blog.teledyn.com - irish.teledyn.com - sbp.teledyn.com
====================== The present moment is a powerful goddess (Goethe)


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null