"notification_interval" from the "serviceescalation" is ignored?
Ilya Ruprecht
zucker4 at web.de
Thu Aug 30 14:12:42 CEST 2007
Hi admins!
The situation is as follows: Debian 4.0 + Nagios 3.0b1.
I defined a service template for SSH:
########################################################
define service{
name check-ssh-service ; The 'name' of this service template
check_command check_ssh
service_description SSH
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
failure_prediction_enabled 1 ; Failure prediction is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
is_volatile 0 ; The service is not volatile
check_period 24x7 ; The service can be checked at any time of the day
max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
normal_check_interval 10 ; Check the service every 10 minutes under normal conditions
retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
contact_groups linux-admins ; Notifications get sent out to everyone in the 'linux-admins' group
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7 ; Notifications can be sent out at any time
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
########################################################
Then I defined a hostgroup to which the SSH service is applied:
########################################################
define hostgroup{
hostgroup_name vpn-server
alias VPN-Gateways
members vpn-gw1-remote,vpn-gw1-local
}
########################################################
Next I defined a service that uses the SSH template and
is applied to the hostgroup "vpn-server":
########################################################
define service{
use check-ssh-service
notes SSH auf Linux-Servern
hostgroup_name vpn-server
service_description SSH
}
########################################################
Finally, I defined two service escalations for SSH
(I set the intervals this short only for testing purposes):
########################################################
define serviceescalation{
hostgroup_name vpn-server
service_description SSH
first_notification 1
last_notification 5
notification_interval 3
contact_groups linux-admins
}
define serviceescalation{
hostgroup_name vpn-server
service_description SSH
first_notification 5
last_notification 0
notification_interval 10
contact_groups linux-admins
}
########################################################
"interval_length" is set to 60 seconds in nagios.cfg.
So far, so good.
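To make the intended behaviour of the two escalations concrete, here is a minimal sketch in Python (not Nagios code). The helper names are hypothetical and only the numbers come from the definitions above. It assumes that "last_notification 0" means "no upper bound" and that, where escalation ranges overlap, the smallest notification interval wins, which is how the escalation documentation appears to describe it.

```python
# Escalation values copied from the two serviceescalation definitions above.
# The dictionary layout and function names are hypothetical, for illustration only.
ESCALATIONS = [
    {"first": 1, "last": 5, "interval": 3},
    {"first": 5, "last": 0, "interval": 10},  # last 0: assumed to mean "no upper bound"
]


def matching_escalations(n, escalations=ESCALATIONS):
    """Return the escalations whose range covers notification number n."""
    return [
        e for e in escalations
        if e["first"] <= n and (e["last"] == 0 or n <= e["last"])
    ]


def escalated_interval(n, escalations=ESCALATIONS):
    """Smallest interval among matching escalations, or None if none match.

    Taking the minimum on overlap is an assumption about Nagios behaviour.
    """
    matches = matching_escalations(n, escalations)
    return min(e["interval"] for e in matches) if matches else None


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, escalated_interval(n))
```

Under these assumptions, notifications 1 to 5 would use the 3-minute interval (number 5 falls in both ranges) and notification 6 onwards would use 10 minutes.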
1. PROBLEM:
===========
Now I get the following notifications (these are the syslog entries):
########################################################
Aug 30 13:05:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:06:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:07:38 unicorn nagios: SERVICE ALERT: vpn-gw1-local;SSH;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:17:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:27:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:37:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:47:39 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 13:57:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
Aug 30 14:07:38 unicorn nagios: SERVICE NOTIFICATION: linuxadmin;vpn-gw1-local;SSH;CRITICAL;notify-service-by-email;CRITICAL - Socket timeout after 10 seconds
########################################################
The notifications here are sent once the HARD state is reached, right?
So the first notification is the one at 13:07:38. OK.
But according to my config, notifications 1 to 5 should be re-sent every 3 minutes.
Then why did the second and all later notifications arrive 10 minutes apart?
The only place I see a value of 10 minutes is "normal_check_interval".
(15 minutes later: OK, I see that 10 minutes is also set for notifications 5 and up,
but that should not affect the interval for the first 4 messages. Theoretically... :-\ )
So, as I understand the problem:
"notification_interval" from the "serviceescalation" is ignored!
What could help?
P.S.: I checked the service-escalation config via the web GUI, and the "Notification Interval" for both of my escalations is shown as "0"!
How can that be?!
2. PROBLEM:
===========
Furthermore, the "notification_interval" in the service definition is described as "Re-notify about service problems every XXX".
Note: "about service problems".
Now, if I set notification_interval to a value lower than "normal_check_interval", e.g. "9", I get the following warning
from the Nagios pre-flight check:
########################################################
Warning: Service 'SSH' on host 'vpn-gw1-local' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
Warning: Service 'SSH' on host 'vpn-gw1-remote' has a notification interval less than its check interval! Notifications are only re-sent after checks are made, so the effective notification interval will be that of the check interval.
########################################################
But, hell, what does "normal_check_interval" have to do with "notification_interval"?!
These two are completely different things! Or have I misunderstood something?
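For reference, the warning text itself suggests the link: a re-notification can only go out right after a check has run. The following Python sketch is a simplified model of that statement, not the Nagios scheduler. The function name is hypothetical, and it assumes that checks in the HARD state run exactly every normal_check_interval and that interval_length is 60, so all values are minutes.

```python
from datetime import datetime, timedelta


def notification_times(first, check_interval_min, notification_interval_min, count):
    """Simulate re-notifications that can only fire right after a check.

    Simplified model (an assumption, not Nagios source code): a check runs
    every check_interval_min minutes, and a notification is sent after a
    check only if at least notification_interval_min minutes have passed
    since the previous notification.
    """
    if check_interval_min <= 0:
        raise ValueError("check interval must be positive")
    times = [first]
    check = first
    while len(times) < count:
        check += timedelta(minutes=check_interval_min)
        if check - times[-1] >= timedelta(minutes=notification_interval_min):
            times.append(check)
    return times


if __name__ == "__main__":
    # First HARD-state notification from the syslog excerpt in this post.
    first = datetime(2007, 8, 30, 13, 7, 38)
    for t in notification_times(first, check_interval_min=10,
                                notification_interval_min=3, count=4):
        print(t.strftime("%H:%M:%S"))
```

With a 10-minute check interval and a 3-minute notification interval, this model produces notifications 10 minutes apart, which is the spacing seen in the syslog entries above. It does not explain why the web GUI shows an interval of "0" for the escalations.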
People - HELP!!
Thanks.
Ilya.