Overlapping escalations not working as documented?
Gwyn Connor
gwyn.connor at googlemail.com
Thu Aug 6 16:35:10 CEST 2009
Hi,
I am trying to have Nagios 3.1.2 alert me by SMS every morning at 8 am about
all current service failures.
Services are currently checked 24/7, but notifications are only sent during
work hours (08:00-20:00) and only every 3 hours. Now if a service goes down
shortly before the notification_period starts, it takes 3 hours until the next
notification is sent, which is too long.
I have tried using escalations to get notified at 8 am, but it is not working:
# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           "Normal" Working Hours
        monday          08:00-20:00
        tuesday         08:00-20:00
        wednesday       08:00-20:00
        thursday        08:00-20:00
        friday          08:00-20:00
        }
# 'morningchecktime' timeperiod definition
define timeperiod{
        timeperiod_name morningchecktime
        alias           Morning Check Time
        monday          07:49-08:00
        tuesday         07:49-08:00
        wednesday       07:49-08:00
        thursday        07:49-08:00
        friday          07:49-08:00
        }
define contact{
        contact_name                    c-sms-morning
        alias                           Morning alert via SMS
        service_notification_period     morningchecktime
        host_notification_period        morningchecktime
        service_notification_options    c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-sms
        host_notification_commands      notify-host-by-sms
        email                           <email-address>
        }
define contactgroup{
        contactgroup_name       sms-morning
        alias                   morning SMS
        members                 c-sms-morning
        }
define service{
        name                    test-service
        use                     service
        check_period            24x7
        max_check_attempts      6
        normal_check_interval   5
        retry_check_interval    2
        contact_groups          admins
        notification_options    w,u,c,r
        notification_interval   180
        notification_period     24x7
        register                0
        }
# Test
define service{
        use                     test-service
        host_name               test
        service_description     Disk /
        check_command           check_snmp_disk!/!10!20
        }
define serviceescalation{
        host_name               test
        service_description     Disk /
        contact_groups          admins
        first_notification      1
        last_notification       0
        notification_interval   180
        escalation_period       24x7
        escalation_options      c,r
        }
define serviceescalation{
        host_name               test
        service_description     Disk /
        contact_groups          sms-morning
        first_notification      1
        last_notification       0
        notification_interval   5
        escalation_period       morningchecktime
        escalation_options      c,r
        }
In the documentation it says about overlapping service escalations:
"In any case where there are multiple valid escalation definitions for a
particular notification, Nagios will choose the smallest notification interval."
However, in my case it seems to use the largest interval. Example:
1. The service goes CRITICAL into HARD state at 6:00 am.
2. The admins are not notified, because it is not yet workhours.
3. Time passes until 07:49.
4. Since the service is checked every 5 minutes, it will also be checked
at least once within the morningchecktime escalation period (07:49-08:00).
The sms-morning contact group should be notified now (its
notification_period is also morningchecktime). But it isn't notified.
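
To make the expectation in the steps above concrete, here is a minimal Python
sketch of the rule as quoted from the documentation. It is only a model of that
one sentence, not Nagios's actual implementation. The names Escalation,
valid_escalations and smallest_interval are hypothetical, and weekdays are
ignored for brevity (an assumption; the real timeperiod covers Monday to Friday).

```python
from dataclasses import dataclass


def hhmm(text):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


@dataclass
class Escalation:
    contact_group: str
    interval_min: int  # notification_interval, in minutes
    start_min: int     # start of escalation_period (inclusive)
    end_min: int       # end of escalation_period (exclusive)


# The two serviceescalation definitions from the config above.
# Assumption: 24x7 is modelled as 00:00-24:00 and weekdays are ignored.
ESCALATIONS = [
    Escalation("admins", 180, hhmm("00:00"), hhmm("24:00")),
    Escalation("sms-morning", 5, hhmm("07:49"), hhmm("08:00")),
]


def valid_escalations(now_min):
    """Escalations whose escalation_period contains the given time."""
    return [e for e in ESCALATIONS if e.start_min <= now_min < e.end_min]


def smallest_interval(now_min):
    """Documented rule: pick the smallest interval among valid escalations."""
    valid = valid_escalations(now_min)
    return min(e.interval_min for e in valid) if valid else None


print(smallest_interval(hhmm("06:00")))  # only the 24x7 escalation is valid
print(smallest_interval(hhmm("07:50")))  # both escalations overlap here
```

Under this reading, any evaluation of the rule between 07:49 and 08:00 should
yield the 5-minute interval, which is the behaviour expected in step 4.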
When I changed the morningchecktime period to cover more time (07:49-11:00),
notifications were sent at 9:00 am - exactly 180 minutes after the failure -
to both admins AND sms-morning. It looks like Nagios is using the larger
notification_interval of the two overlapping escalations.
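
The 180-minute observation can be checked with simple arithmetic. The sketch
below rests on an unconfirmed assumption: that the time of the next notification
is fixed when the 06:00 problem is first processed, at which point only the 24x7
escalation (interval 180) is valid. Nothing in this post or the quoted
documentation confirms that this is how Nagios schedules notifications; the
helper names are hypothetical.

```python
def hhmm(text):
    """Convert 'HH:MM' to minutes after midnight."""
    hours, minutes = text.split(":")
    return int(hours) * 60 + int(minutes)


def fmt(minutes):
    """Convert minutes after midnight back to 'HH:MM'."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


def in_window(t, start, end):
    """True if time t (minutes) lies in [start, end)."""
    return hhmm(start) <= t < hhmm(end)


failure = hhmm("06:00")
# Assumption: only the 24x7 escalation is valid at 06:00, so 180 is used.
interval_at_failure = 180
next_notification = failure + interval_at_failure

print(fmt(next_notification))
# Original morningchecktime window: the next notification falls outside it.
print(in_window(next_notification, "07:49", "08:00"))
# Widened window: the next notification falls inside it.
print(in_window(next_notification, "07:49", "11:00"))
```

If that assumption holds, the next notification lands at 09:00, after the
original 07:49-08:00 window has closed but inside the widened 07:49-11:00
window, which would match both observations.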
Any ideas on how I can make this work? Or is there still an error in my
config file that I have overlooked?
Gwyn
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users