Escalation Problems
Joseph B. McQueen
jmcqueen at wpsc.com
Thu Jul 31 22:21:13 CEST 2003
I've been trying to solve a problem with escalations for several days
now. I've defined a "host" escalation for a device.
When the host goes down, Nagios sends the standard notification. However,
when the escalation should occur, none of the contacts (contact groups)
I have defined receive a notice. Even when I remove the
escalation, I still get only one standard notification, with no further
notifications after the "notification_interval".
I recompiled the nagios executable with the "--enable-DEBUG4" option to
see what may be happening. The console debug messages are shown below:
Nagios 1.1
Copyright (c) 1999-2003 Ethan Galstad (nagios at nagios.org)
Last Modified: 06-02-2003
License: GPL
Nagios 1.1 starting... (PID=23709)
Warning: Contact 'nagios' is not a member of any contact groups!
Warning: Contact group 'test-escalate' is not used in any
hostgroup/service definitions or host/hostgroup/service escalations!
HOST NOTIFICATION ATTEMPT: Host 'testdevice'
Current time: Thu Jul 31 15:55:09 2003
HOST STATE CHANGE!
Current notification number: 1
Current Time: Thu Jul 31 15:55:09 2003
Next acceptable notification time: Thu Jul 31 15:56:09 2003
Notify user jmcqueen
Raw Command: /usr/bin/printf "%b" "Host '$HOSTALIAS$' is
$HOSTSTATE$\nInfo: $OUTPUT$\nTime: $DATETIME$" | /bin/mail -s
"$NOTIFICATIONTYPE$ alert - Host $HOSTNAME$ is $HOSTSTATE$" $CONTACTPAGER$
Processed Command: /usr/bin/printf "%b" "Host 'Joe McQueens Test
Device' is DOWN\nInfo: /bin/ping -n -U -c 1 192.168.1.1\nTime: Thu Jul
31 15:55:09 EDT 2003" | /bin/mail -s "PROBLEM alert - Host testdevice is
DOWN" phonenum at messaging.nextel.com
Raw Command: /usr/bin/printf "%b" "***** Nagios
*****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState:
$HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $OUTPUT$\n\nDate/Time:
$DATETIME$\n" | /bin/mail -s "Host $HOSTSTATE$ alert for $HOSTNAME$!"
$CONTACTEMAIL$
Processed Command: /usr/bin/printf "%b" "***** Nagios
*****\n\nNotification Type: PROBLEM\nHost: testdevice\nState:
DOWN\nAddress: 192.168.1.1\nInfo: /bin/ping -n -U -c 1
162.88.43.108\n\nDate/Time: Thu Jul 31 15:55:09 EDT 2003\n" | /bin/mail
-s "Host DOWN alert for testdevice!" jmcqueen at wpsc.com
APPROPRIATE CONTACTS HAVE BEEN NOTIFIED
After the "Next acceptable notification time" passes, I receive no
further notifications for the device. My applicable configs are shown
below. My intervals are set small for testing, but I was seeing the
same behavior with the intervals set higher. I even tried using a
"hostgroupescalation" and a "serviceescalation" to get it
working, with no success.
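
For reference, the "Next acceptable notification time" in the debug output is consistent with notification_interval being counted in units of interval_length seconds. The sketch below assumes interval_length is left at 60 (that value is not shown in this message), and the function name is made up for illustration; it is not Nagios code.

```python
from datetime import datetime, timedelta


def next_notification_time(last_sent, notification_interval, interval_length=60):
    """Return the earliest time a re-notification would be allowed.

    notification_interval is expressed in "time units"; interval_length is the
    number of seconds per unit (assumed to be 60 here).
    """
    return last_sent + timedelta(seconds=notification_interval * interval_length)


# Values taken from the debug output: first notification at 15:55:09,
# host notification_interval of 1.
sent = datetime(2003, 7, 31, 15, 55, 9)
print(next_notification_time(sent, 1))  # 2003-07-31 15:56:09
```

With these numbers a second notification attempt would be expected from 15:56:09 onward, which is the point at which nothing further is logged.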
***hosts.cfg***
define host {
name generic-host ; The name of this host template - referenced in other host definitions, used for template recursion/$
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host {
host_name testdevice
alias Joe McQueens Test Device
address 192.168.1.1
check_command check-host-alive
max_check_attempts 2
notification_interval 1
notification_period 24x7
notification_options d,u,r
}
***hostgroups.cfg***
# Test Group
define hostgroup {
hostgroup_name test-group
alias Test HostGroup
contact_groups test-admins
members testdevice
}
***services.cfg***
define service {
name generic-service ; The 'name' of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
#Test Group Devices
define service {
use generic-service ; Name of service template to use
host_name testdevice
service_description PING
is_volatile 0
check_period 24x7
max_check_attempts 2
normal_check_interval 1
retry_check_interval 1
contact_groups test-admins
notification_interval 5
notification_period 24x7
notification_options u,c,r
check_command check_ping!2000.0,100%!2000.0,100%
}
***contacts.cfg***
define contact {
contact_name jmcqueen
alias Joe McQueen
host_notification_period 24x7
host_notification_options d,u,r
host_notification_commands host-notify-by-email,host-notify-by-epager
service_notification_period 24x7
service_notification_options u,c,r
service_notification_commands notify-by-email,notify-by-epager
email jmcqueen at wpsc.com
pager anumber at messaging.nextel.com
}
#Joe McQueen Office Phone (For Testing)
define contact {
contact_name jmcqueen-office
alias Joseph,McQueen
register 1
host_notification_period 24x7
host_notification_options d,u,r
host_notification_commands host-notify-by-voice1,host-notify-by-voice2
service_notification_period 24x7
service_notification_options w,u,c,r
service_notification_commands notify-by-voice1,notify-by-voice2
email 2709
pager 2709
}
***contactgroups.cfg***
#Test Contact Group
define contactgroup {
contactgroup_name test-admins
alias Test Admins
members jmcqueen
}
#Test Escalations Group
define contactgroup {
contactgroup_name test-escalate
alias Test Escalations
members jmcqueen-office
}
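
The startup warning near the top of this message says that contact group 'test-escalate' is not used by any hostgroup, service, or escalation, which suggests the escalation definition is not among the objects Nagios actually loaded. A rough way to cross-check a set of config files for that condition is sketched below. The parser and both function names are hypothetical and only handle the simple "define type { key value }" layout used in this message; they are not part of Nagios and do not reproduce its own parsing rules.

```python
import re


def parse_objects(text):
    """Parse simple 'define <type> { ... }' blocks into dicts (illustrative only)."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        match = re.match(r"define\s+(\w+)\s*\{?$", line)
        if match:
            current = {"_type": match.group(1)}
            continue
        if line == "}":
            if current is not None:
                objects.append(current)
            current = None
            continue
        if current is not None:
            parts = line.split(None, 1)
            current[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    # A block that is never closed with '}' is dropped by this sketch.
    return objects


def unused_contact_groups(objects):
    """Return defined contact groups that no other object references."""
    defined = {o["contactgroup_name"] for o in objects
               if o["_type"] == "contactgroup" and "contactgroup_name" in o}
    used = set()
    for o in objects:
        if o["_type"] != "contactgroup" and "contact_groups" in o:
            used.update(g.strip() for g in o["contact_groups"].split(","))
    return sorted(defined - used)


sample = """
define contactgroup {
contactgroup_name test-admins
members jmcqueen
}
define contactgroup {
contactgroup_name test-escalate
members jmcqueen-office
}
define hostgroup {
hostgroup_name test-group
contact_groups test-admins
members testdevice
}
"""
print(unused_contact_groups(parse_objects(sample)))  # ['test-escalate']
```

Running the same check over the concatenated contents of every file listed in the main config would show whether the escalation file is being picked up at all.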
***escalations***
define hostescalation{
host_name testdevice
contact_groups test-admins,test-escalate
first_notification 2
last_notification 5
notification_interval 1
}
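
For clarity about what this escalation is expected to do, here is a small model of the matching rule as I understand it from the documentation: an escalation applies when the current notification number falls between first_notification and last_notification (with a last_notification of 0 meaning no upper limit), and while it applies, its contact groups are notified in place of the default ones. The class and function names are invented for this sketch, and "test-admins" as the default group is taken from the hostgroup definition above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostEscalation:
    host_name: str
    contact_groups: List[str]
    first_notification: int
    last_notification: int
    notification_interval: int


def escalation_applies(esc, host_name, notification_number):
    """True if the escalation covers this host and notification number."""
    if esc.host_name != host_name:
        return False
    if notification_number < esc.first_notification:
        return False
    # Assumption: last_notification == 0 means "never stop escalating".
    if esc.last_notification != 0 and notification_number > esc.last_notification:
        return False
    return True


def groups_to_notify(default_groups, escalations, host_name, notification_number):
    """Contact groups for a notification: escalated groups if any apply, else defaults."""
    matching = [e for e in escalations
                if escalation_applies(e, host_name, notification_number)]
    if not matching:
        return list(default_groups)
    groups = []
    for esc in matching:
        for group in esc.contact_groups:
            if group not in groups:
                groups.append(group)
    return groups


esc = HostEscalation("testdevice", ["test-admins", "test-escalate"], 2, 5, 1)
for number in (1, 2, 5, 6):
    print(number, groups_to_notify(["test-admins"], [esc], "testdevice", number))
```

Under this model, notifications 2 through 5 should reach both test-admins and test-escalate, which is the behavior I am not seeing.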
I appreciate any help anyone might provide.