Service Escalation Timing Issue
Assaf Flatto
nagios at flatto.net
Tue Jun 22 11:53:45 CEST 2010
Tillotson, Jeff wrote:
> I've got a service that I've set up with the following requirements: e-mail a certain group after the service has been down for 5 minutes, page when the service has been down for 10 minutes, then page again after 30 minutes. I'm fairly certain my problem is with notification_interval in the service escalation and that I'm misunderstanding this from the documentation:
> "When defining notification escalations, it is important to keep in mind that any contact groups that were members of "lower" escalations (i.e. those with lower notification number ranges) should also be included in "higher" escalation definitions. This should be done to ensure that anyone who gets notified of a problem continues to get notified as the problem is escalated."
>
>
> Following are the configuration options (I've snipped some options down):
>
> Nagios.cfg:
> interval_length=1 (One second)
>
> Template:
>
> define service{
> name distrib-nevent-graph
> check_period 24x7
> max_check_attempts 2
> contact_groups no-one
> notification_options w,u,c,r
> notification_interval 60
> notification_period 24x7
> register 0
> }
>
> Service:
> define service{
> use distrib-nevent-graph
> hostgroup_name location-v7apache
> service_description v7apache-check
> }
>
> Service Escalation:
> define serviceescalation {
> hostgroup_name location-v7apache
> service_description v7apache-check
> first_notification 5
> last_notification 0
> notification_interval 1800
> contact_groups nopage, core
> }
> define serviceescalation {
> hostgroup_name location-v7apache
> service_description v7apache-check
> first_notification 10
> last_notification 0
> notification_interval 1800
> contact_groups page, nopage, core
> }
>
>
>
If I am reading this right, your first notification is sent after
2.5 hours:
1800 sec (30 minutes) x 5 (first_notification) = 2.5 hours.
You might want to change the interval to 300.
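
The arithmetic behind that estimate can be sketched in a few lines of Python. This follows the reading above, which assumes every notification up to first_notification is spaced by the escalation's notification_interval; it does not model the service's own notification_interval of 60. The function name is made up for illustration.

```python
def seconds_until_notification(interval_seconds, notification_number):
    """Elapsed seconds before a given notification number, ASSUMING every
    notification is spaced by the same interval (a simplification)."""
    return interval_seconds * notification_number


# Compare the configured interval (1800) with the suggested one (300),
# both for first_notification 5 and interval_length=1 (seconds).
for interval in (1800, 300):
    seconds = seconds_until_notification(interval, 5)
    print(f"interval={interval}s: {seconds / 60:.0f} minutes "
          f"({seconds / 3600:.2f} hours)")
```

Under that assumption, 1800 gives 150 minutes (2.5 hours) and 300 gives 25 minutes.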
*first_notification*: This directive is a number that identifies the
/first/ notification for which this escalation is effective. For
instance, if you set this value to 3, this escalation will only be used
if the service is in a non-OK state long enough for a third notification
to go out.

*notification_interval*: This directive is used to determine the
interval at which notifications should be made while this escalation is
valid. If you specify a value of 0 for the interval, Nagios will send
the first notification when this escalation definition is valid, but
will then prevent any more problem notifications from being sent out for
the host. Notifications are sent out again until the host recovers. This
is useful if you want to stop having notifications sent out after a
certain amount of time. Note: If multiple escalation entries for a host
overlap for one or more notification ranges, the smallest notification
interval from all escalation entries is used.
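
To see how the two directives interact for the escalations quoted above, here is a rough Python sketch of the selection rule as the documentation describes it: an escalation applies from first_notification onward (last_notification 0 is treated as "no upper bound"), and when several apply, the smallest interval wins and the contact groups are combined. The class and function names are hypothetical, and this is a simplified model, not Nagios's actual code.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    first_notification: int
    last_notification: int      # 0 is treated as "no upper bound"
    notification_interval: int  # in interval_length units (seconds here)
    contact_groups: tuple


def applies(esc, n):
    """True if the escalation covers notification number n."""
    if n < esc.first_notification:
        return False
    return esc.last_notification == 0 or n <= esc.last_notification


def effective(escalations, n):
    """Return (interval, contact groups) for notification n, or None when
    no escalation applies and the service's own settings are used.
    The special meaning of interval 0 is NOT modeled here."""
    active = [e for e in escalations if applies(e, n)]
    if not active:
        return None
    interval = min(e.notification_interval for e in active)
    groups = sorted({g for e in active for g in e.contact_groups})
    return interval, groups


# The two serviceescalation definitions from the quoted configuration.
escalations = [
    Escalation(5, 0, 1800, ("nopage", "core")),
    Escalation(10, 0, 1800, ("page", "nopage", "core")),
]

for n in (4, 5, 10):
    print(n, effective(escalations, n))
```

In this model, notification 4 matches no escalation, notification 5 goes to core and nopage, and from notification 10 onward the page group is added, all at the 1800 interval.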