Service Escalation Timing Issue
Jeff Tillotson
jtillotson at techtarget.com
Tue Jun 22 13:30:08 CEST 2010
On Tue, Jun 22, 2010 at 05:53:45AM -0400, Assaf Flatto wrote:
>Tillotson, Jeff wrote:
>> I've set up a service with the following requirements: e-mail a certain group after the service has been down for 5 minutes, page when it has been down for 10 minutes, then page again after 30 minutes. I'm fairly certain my problem is with notification_interval in the serviceescalation definition, and that I'm misunderstanding this passage from the documentation:
>> "When defining notification escalations, it is important to keep in mind that any contact groups that were members of "lower" escalations (i.e. those with lower notification number ranges) should also be included in "higher" escalation definitions. This should be done to ensure that anyone who gets notified of a problem continues to get notified as the problem is escalated."
>>
>>
>> Here are the relevant configuration options (I've snipped some):
>>
>> Nagios.cfg:
>> interval_length=1 (One second)
>>
>> Template:
>>
>> define service{
>> name distrib-nevent-graph
>> check_period 24x7
>> max_check_attempts 2
>> contact_groups no-one
>> notification_options w,u,c,r
>> notification_interval 60
>> notification_period 24x7
>> register 0
>> }
>>
>> Service:
>> define service{
>> use distrib-nevent-graph
>> hostgroup_name location-v7apache
>> service_description v7apache-check
>> }
>>
>> Service Escalation:
>> define serviceescalation {
>> hostgroup_name location-v7apache
>> service_description v7apache-check
>> first_notification 5
>> last_notification 0
>> notification_interval 1800
>> contact_groups nopage, core
>> }
>> define serviceescalation {
>> hostgroup_name location-v7apache
>> service_description v7apache-check
>> first_notification 10
>> last_notification 0
>> notification_interval 1800
>> contact_groups page, nopage, core
>> }
>>
>>
>>
>If I am reading this right, you have your first notification sent after
>2.5 hours.
>
>1800 sec = 30 minutes x 5 (first notification) = 2.5 hours.
>
>You might want to change the interval to 300.
>
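To check the arithmetic in this exchange, here is a minimal Python sketch that replays the notification schedule for the configuration quoted above. It is a model, not Nagios code, and it rests on assumptions: notification #1 goes out at t=0; interval_length=1 makes every interval a count of seconds; and the delay after notification k is the smallest notification_interval among the escalations whose first_notification/last_notification range covers k, falling back to the service's own notification_interval (60) when none do. The `Escalation` class and `send_times` function are hypothetical names.

```python
# Hypothetical model of the notification schedule (not Nagios source code).
# Assumptions: notification #1 is at t=0; interval_length=1 (seconds);
# the delay after notification k is the smallest notification_interval of
# the escalations covering k, else the service's own notification_interval.
from dataclasses import dataclass


@dataclass
class Escalation:
    first: int      # first_notification
    last: int       # last_notification (0 = no upper bound)
    interval: int   # notification_interval, in seconds here

    def covers(self, k: int) -> bool:
        return k >= self.first and (self.last == 0 or k <= self.last)


def send_times(base_interval: int, escalations: list[Escalation], count: int) -> list[int]:
    """Seconds from notification #1 to each of notifications #1..#count."""
    times = [0]
    for k in range(1, count):
        # Interval applied after notification k has been sent.
        covering = [e.interval for e in escalations if e.covers(k)]
        delay = min(covering) if covering else base_interval
        times.append(times[-1] + delay)
    return times


original = [Escalation(5, 0, 1800), Escalation(10, 0, 1800)]
suggested = [Escalation(5, 0, 300), Escalation(10, 0, 300)]

for label, escs in (("interval 1800", original), ("interval 300", suggested)):
    t10 = send_times(60, escs, 10)[-1]
    print(f"{label}: notification #10 (first page) at {t10 / 60:.0f} min")
```

Under these assumptions, notification #10, the first one that reaches the page group, lands 240 + 5 × 1800 = 9240 seconds (154 minutes) after the first notification, which is where the roughly 2.5 hours comes from; with an interval of 300 it lands at 240 + 5 × 300 = 1740 seconds (29 minutes).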
Thanks for your response.
If I change the interval to 300, then core and nopage get a
notification every 5 minutes after the 5th notification, and page
won't get its first alert until about 30 minutes after the service goes
down (5 notifications at a 1-minute interval + 5 at a 5-minute interval).
What I really want is for nopage and core to be notified once the service
has been down for 5 minutes and again 30 minutes after that, and for page
to be notified once the service has been down for 10 minutes and again
30 minutes after that.
I think the following might provide what I want, but the documentation
passage I quoted in my original post makes me think this is a bad idea:
define serviceescalation {
hostgroup_name location-v7apache
service_description v7apache-check
first_notification 5
last_notification 0
notification_interval 1800
contact_groups nopage, core
}
define serviceescalation {
hostgroup_name location-v7apache
service_description v7apache-check
first_notification 10
last_notification 0
notification_interval 1800
contact_groups page
}
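A similar sketch can list who would be notified, and when, under the two escalations proposed above. Besides the timing assumptions (notification #1 at t=0, a base interval of 60 seconds, the smallest covering interval wins), it assumes that when several escalations cover a notification their contact groups are merged, and that the service's own contact_groups (no-one) are used only when no escalation applies. That is an assumed reading of Nagios 3 behaviour, not something confirmed in this thread; `Escalation` and `timeline` are hypothetical names.

```python
# Hypothetical model of who is notified when (not Nagios source code).
# Assumptions: notification #1 at t=0; base notification_interval 60 s;
# groups from every escalation covering notification k are merged, and the
# service's own contact_groups are used only when no escalation covers k;
# the delay after notification k is the smallest covering interval.
from dataclasses import dataclass


@dataclass
class Escalation:
    first: int               # first_notification
    last: int                # last_notification (0 = no upper bound)
    interval: int            # notification_interval, in seconds
    groups: tuple[str, ...]  # contact_groups

    def covers(self, k: int) -> bool:
        return k >= self.first and (self.last == 0 or k <= self.last)


def timeline(base_interval, base_groups, escalations, count):
    """Yield (notification number, seconds since #1, sorted groups notified)."""
    t = 0
    for k in range(1, count + 1):
        active = [e for e in escalations if e.covers(k)]
        if active:
            groups = set().union(*(e.groups for e in active))
        else:
            groups = set(base_groups)
        yield k, t, sorted(groups)
        t += min(e.interval for e in active) if active else base_interval


proposed = [
    Escalation(5, 0, 1800, ("nopage", "core")),
    Escalation(10, 0, 1800, ("page",)),
]

for k, t, groups in timeline(60, ("no-one",), proposed, 11):
    print(f"#{k:2d} at {t // 60:4d} min -> {', '.join(groups)}")
```

If those assumptions hold, page is first reached at notification #10, 154 minutes after the first notification, because the 30-minute escalation interval already applies from notification #5 onward. Since both escalations use last_notification 0, their ranges overlap, so nopage and core keep receiving notifications alongside page.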
--Jeff
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null