"notification_interval" from the "serviceescalation" is ignored?

Ilya Ruprecht zucker4 at web.de
Thu Aug 30 14:12:42 CEST 2007


Hi admins!


The situation is as follows: Debian 4.0 + Nagios 3.0b1.


I defined a service template for SSH:

########################################################
define service{
        name                            check-ssh-service             ; The 'name' of this service template
        check_command                   check_ssh
        service_description             SSH
        active_checks_enabled           1                       ; Active service checks are enabled
        passive_checks_enabled          1                       ; Passive service checks are enabled/accepted
        parallelize_check               1                       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1                       ; We should obsess over this service (if necessary)
        check_freshness                 0                       ; Default is to NOT check service 'freshness'
        notifications_enabled           1                       ; Service notifications are enabled
        event_handler_enabled           1                       ; Service event handler is enabled
        flap_detection_enabled          1                       ; Flap detection is enabled
        failure_prediction_enabled      1                       ; Failure prediction is enabled
        process_perf_data               1                       ; Process performance data
        retain_status_information       1                       ; Retain status information across program restarts
        retain_nonstatus_information    1                       ; Retain non-status information across program restarts
        is_volatile                     0                       ; The service is not volatile
        check_period                    24x7                    ; The service can be checked at any time of the day
        max_check_attempts              3                       ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           10                      ; Check the service every 10 minutes under normal conditions
        retry_check_interval            1                       ; Re-check the service every minute until a hard state can be determined
        contact_groups                  linux-admins            ; Notifications get sent out to everyone in the 'linux-admins' group
        notification_options            w,u,c,r                 ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60                      ; Re-notify about service problems every hour
        notification_period             24x7                    ; Notifications can be sent out at any time
        register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
########################################################


Then I defined a hostgroup to which the SSH service is applied:

########################################################
define hostgroup{
        hostgroup_name  vpn-server
        alias       VPN-Gateways
        members     vpn-gw1-remote,vpn-gw1-local
        }
########################################################


Next I defined a service that uses the SSH template and
is applied to the group "vpn-server":

########################################################
define service{
        use                     check-ssh-service
        notes                   SSH auf Linux-Servern
        hostgroup_name          vpn-server
        service_description     SSH
        }
########################################################


Finally, I defined two service escalations for SSH

(I set the intervals this short for testing purposes only):
########################################################
define serviceescalation{
        hostgroup_name          vpn-server
        service_description     SSH
        first_notification      1
        last_notification       5
        notification_interval   3
        contact_groups          linux-admins
        }

define serviceescalation{
        hostgroup_name          vpn-server
        service_description     SSH
        first_notification      5
        last_notification       0
        notification_interval   10
        contact_groups          linux-admins
        }
########################################################

"interval_length" is set to 60 seconds in nagios.cfg.

So far, so good.



1. PROBLEM:
===========

Now I get the following notifications (these are the syslog entries):

########################################################
Aug 30 13:05:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds 
Aug 30 13:06:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds 
Aug 30 13:07:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10 seconds 
Aug 30 13:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:17:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:27:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds 
Aug 30 13:37:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds 
Aug 30 13:47:39 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:57:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 14:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
########################################################
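
For reference, the spacing between the notifications in the excerpt above can be checked with a short script. This is only a minimal sketch: the timestamps are copied by hand from the SERVICE NOTIFICATION lines, and the helper name `gaps_in_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamps of the SERVICE NOTIFICATION lines from the syslog excerpt above.
NOTIFICATION_TIMES = [
    "13:07:38", "13:17:38", "13:27:38", "13:37:38",
    "13:47:39", "13:57:38", "14:07:38",
]


def gaps_in_seconds(stamps):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Every gap is within a second of 600 s, i.e. 10 minutes.
    print(gaps_in_seconds(NOTIFICATION_TIMES))
```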


The notifications are sent once the HARD state is reached, right?
So the first notification is the one at 13:07:38. OK.

But according to my config, notifications 1 to 5 should be resent every 3 minutes.
So why did the second and all later notifications arrive 10 minutes apart?
The only setting I can see with a value of 10 minutes is "normal_check_interval".

(15 minutes later: OK, OK, I see that 10 minutes is also set for notifications 5 and up,
but that should not affect the notification interval for the first 4 messages. Theoretically... :-\ )
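
To make the expectation concrete, here is a rough sketch of the schedule the two escalations would seem to imply. It models only the reading described above, not what Nagios actually does. The function names are hypothetical, and the handling of the overlap at notification 5 (the smallest matching interval wins) is an assumption.

```python
# Escalation ranges copied from the two serviceescalation definitions:
# (first_notification, last_notification, notification_interval in minutes).
# A last_notification of 0 means "no upper bound".
ESCALATIONS = [(1, 5, 3), (5, 0, 10)]
SERVICE_INTERVAL = 60  # notification_interval from the service template


def expected_interval(n, escalations=ESCALATIONS, default=SERVICE_INTERVAL):
    """Interval (minutes) expected to follow notification number n.

    Assumption: where ranges overlap (notification 5 here), the smallest
    matching interval is used; with no matching escalation, the service
    template's interval applies.
    """
    matching = [interval for first, last, interval in escalations
                if n >= first and (last == 0 or n <= last)]
    return min(matching) if matching else default


def expected_offsets(count):
    """Minutes after the first notification at which notifications 1..count are expected."""
    offsets, t = [], 0
    for n in range(1, count + 1):
        offsets.append(t)
        t += expected_interval(n)
    return offsets


if __name__ == "__main__":
    print(expected_offsets(7))
```

Under these assumptions the first notifications would be 3 minutes apart, which is not what the log shows.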


So, as I understand the problem:

"notification_interval" from the "serviceescalation" is ignored!

What could help?


P.S.: I checked the service escalation config in the web GUI, and the "Notification Interval" for both of my escalations is shown as "0"!
How can that be!?



2. PROBLEM:
===========

Furthermore, the "notification_interval" in the service definition is described as "Re-notify about service problems every XXX".
Note: "about service problems".
Now, if I set the notification_interval to a value lower than the "normal_check_interval", e.g. "9", I get the following warning
from the Nagios pre-flight check:

########################################################
Warning: Service 'SSH' on host 'vpn-gw1-local'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'SSH' on host 'vpn-gw1-remote'  has a notification interval less than its check interval!  Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
########################################################

But, hell, what does the "normal_check_interval" have to do with the "notification_interval"?!
These two are completely different things! Or have I misunderstood something?
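
Taken literally, the warning says that a re-notification can only go out when a check runs. The sketch below is a simplified model of that one sentence, not Nagios's actual scheduling code. It assumes checks run at fixed multiples of the check interval, and the function name `effective_offsets` is made up for illustration.

```python
def effective_offsets(check_interval, notification_interval, count):
    """Minutes (after the first notification) of the first `count` notifications.

    Simplified model: checks run every `check_interval` minutes starting at 0,
    and a re-notification is sent at the first check at or after the moment
    it becomes due.
    """
    offsets, sent = [0], 0
    while len(offsets) < count:
        due = sent + notification_interval
        # Round the due time up to the next scheduled check (integer ceiling).
        sent = (due + check_interval - 1) // check_interval * check_interval
        offsets.append(sent)
    return offsets


if __name__ == "__main__":
    print(effective_offsets(10, 3, 4))   # interval from the first escalation
    print(effective_offsets(10, 9, 4))   # the value that triggers the warning
    print(effective_offsets(10, 60, 3))  # interval from the service template
```

In this model, any notification interval below the 10-minute check interval gives notifications 10 minutes apart, which would match the syslog entries under PROBLEM 1.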



People - HELP!!

Thanks.

Ilya.





_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users