Query regarding Nagios notification

Satish Kumar P satishkumarp2k1 at gmail.com
Wed Oct 14 07:43:08 CEST 2009


Hi,

We have a Nagios server that monitors around 300 production servers
and 2000+ services across them. Recently, one of the services on a
particular host went into a HARD state, but Nagios did not send a
notification. I am trying to understand why. Here is the relevant
configuration:

define service {
        service_description     MAILQ_1K_2K
        host_name               server-name
        use                     generic-service
        check_command           check_mailq_snmp!1000!2000
        contact_groups          cg_server-name
}

define contactgroup {
        contactgroup_name       cg_server-name
        alias                   server-name Contact Group
        members                 team_emailpage-24x7
}

define contact {
        contact_name                  team_emailpage-24x7
        alias                         team_emailpage-24x7
        service_notification_period   24x7
        host_notification_period      24x7
        service_notification_options  c,r
        host_notification_options     d,r
        service_notification_commands notify-by-page,notify-by-email
        host_notification_commands    host-notify-by-page,host-notify-by-email
        email                         email-address
        pager                         team
}
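
To make the effect of "service_notification_options c,r" explicit, here is a
minimal Python sketch of that one filter. It is not Nagios code:
contact_accepts and STATE_FLAGS are hypothetical names, and the sketch
deliberately ignores every other notification filter (time periods, state
type, downtime, dependencies and so on). It only shows that, with these
options, a WARNING would be filtered for this contact while a CRITICAL or a
recovery would pass.

```python
# Hypothetical helper, not part of Nagios: models only the per-contact
# service_notification_options filter and nothing else.
STATE_FLAGS = {
    "WARNING": "w",
    "UNKNOWN": "u",
    "CRITICAL": "c",
    "OK": "r",  # simplification: a return to OK is treated as a recovery
}


def contact_accepts(options: str, state: str) -> bool:
    """Return True if the option string lets this service state through."""
    flags = {flag.strip() for flag in options.split(",")}
    if "n" in flags:  # "n" means the contact wants no service notifications
        return False
    return STATE_FLAGS[state] in flags


if __name__ == "__main__":
    for state in ("WARNING", "CRITICAL", "OK"):
        print(state, contact_accepts("c,r", state))
```

So the contact-level options alone would not block a notification for a
CRITICAL state.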


The following are the relevant options defined under "generic-service":

    check_period                    24x7
    normal_check_interval           5
    retry_check_interval            2
    max_check_attempts              5
    notification_period             24x7

The following are the corresponding log entries from when the service went down:

Oct 11 02:19:50 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;WARNING;SOFT;1;mailq is 1358
Oct 11 02:22:49 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;WARNING;SOFT;2;mailq is 1537
Oct 11 02:26:05 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;WARNING;SOFT;3;mailq is 1799
Oct 11 02:28:59 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;WARNING;SOFT;4;mailq is 1799
Oct 11 02:36:53 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;CRITICAL;HARD;5;mailq is 2133
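
As a rough sanity check on the timing, the sketch below compares the gaps
between the alerts above with what retry_check_interval and
max_check_attempts would nominally give. The assumptions are mine:
interval_length is taken to be the default of 60 seconds, the year 2009 comes
from the mail date (syslog does not log it), and the function names are made
up for illustration. Real gaps also include check latency and execution
time, so this does not explain the missing notification by itself.

```python
from datetime import datetime

# Timestamps and states copied from the SERVICE ALERT lines above.
ALERTS = [
    ("Oct 11 02:19:50", "WARNING", "SOFT", 1),
    ("Oct 11 02:22:49", "WARNING", "SOFT", 2),
    ("Oct 11 02:26:05", "WARNING", "SOFT", 3),
    ("Oct 11 02:28:59", "WARNING", "SOFT", 4),
    ("Oct 11 02:36:53", "CRITICAL", "HARD", 5),
]

RETRY_CHECK_INTERVAL = 2   # from generic-service
MAX_CHECK_ATTEMPTS = 5     # from generic-service
INTERVAL_LENGTH_SEC = 60   # assumption: Nagios default, not shown in the post


def parse(timestamp: str) -> datetime:
    # The year is an assumption taken from the mail date.
    return datetime.strptime("2009 " + timestamp, "%Y %b %d %H:%M:%S")


def gaps_seconds(alerts):
    """Seconds between each pair of consecutive alerts."""
    times = [parse(ts) for ts, *_ in alerts]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def nominal_seconds_to_hard() -> int:
    """Nominal time from the first SOFT alert to the HARD state."""
    return (MAX_CHECK_ATTEMPTS - 1) * RETRY_CHECK_INTERVAL * INTERVAL_LENGTH_SEC


if __name__ == "__main__":
    gaps = gaps_seconds(ALERTS)
    print("gaps (s):", gaps)
    print("observed total (s):", sum(gaps))
    print("nominal total (s):", nominal_seconds_to_hard())
```

The first three gaps are close to three minutes each, while the last one,
between attempts 4 and 5, is almost eight minutes.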

I have changed the server names. The WARNING threshold is 1000 and the
CRITICAL threshold is 2000. Roughly 45 minutes later the service
recovered, but Nagios did not send any notification for this service
during the whole period (that is, until it came back to the OK state).
Nagios logs from when the service came back:

Oct 11 03:20:20 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;CRITICAL;SOFT;1;mailq is 2968
Oct 11 03:22:17 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;CRITICAL;SOFT;2;mailq is 2968
Oct 11 03:24:17 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;CRITICAL;SOFT;3;mailq is 2968
Oct 11 03:26:18 nagios-server nagios: SERVICE ALERT: server-name;MAILQ_1K_2K;OK;SOFT;4;mailq is 411

More info: from the Nagios documentation, I understand that Nagios
performs on-demand host checks when a service changes state. So my
guess is that Nagios performed a host check when the service turned
HARD (and simultaneously went from WARNING to CRITICAL). I see many
log entries for other services after this service turned HARD, but I
would still have expected a notification for this particular service.
Thoughts?

Nagios version: 3.1.2
O.S: Debian 4.0 (Etch)

Thanks in advance,
Satish
