Dumb Escalation Question
Chris Stankaitis
chris.stankaitis at datawire.net
Thu Jan 16 16:48:33 CET 2003
Hey All;
I have Nagios installed and about 99% working, with lots of active and
passive checks running, and it's all happy except for one thing: I am
supposed to have an escalation matrix kick in when a box or service goes
down, and it's not escalating. I am sure it's something dumb on my part,
but with the complexity of the escalation/notification settings I am
having a hard time getting my head around it.
What I want is:
1) When a BOX or SERVICE goes down or changes state, it should spend 5-6
mins in a SOFT state and be rechecked multiple times during that window.
If it recovers before the soft state runs out, no one gets paged.
2) If it goes into a HARD state, page Level 1 and give him/her 15 mins
to acknowledge the problem, re-paging him/her every 5 mins.
3) After the first 15 mins, escalate to Level 2 and give Level 1 and
Level 2 another 15 mins to acknowledge the problem, again paging every
5 mins.
4) If no one has acknowledged the issue 30 mins after the start of the
hard state, page Level 1, Level 2 and the Manager a couple of times.
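
To make the schedule above concrete, here is a minimal sketch that maps it onto notification numbers. It assumes a notification goes out every 5 minutes starting at minute 0 of the HARD state and that notifications are numbered from 1. The function names and the `level1`/`level2`/`manager` labels are hypothetical, used only for illustration.

```python
# Sketch: map the desired paging schedule onto notification numbers.
# Assumption: one notification every RENOTIFY_MINUTES, starting at
# minute 0 of the HARD state, numbered from 1.
RENOTIFY_MINUTES = 5


def notification_number(minutes_since_hard):
    """Number (1-based) of the notification sent at this minute."""
    return minutes_since_hard // RENOTIFY_MINUTES + 1


def tier_for(minutes_since_hard):
    """Who should be paged at this point, per the schedule above."""
    if minutes_since_hard < 15:
        return ["level1"]
    if minutes_since_hard < 30:
        return ["level1", "level2"]
    return ["level1", "level2", "manager"]


def schedule(total_minutes=40):
    """List of (minute, notification number, contacts) tuples."""
    return [
        (m, notification_number(m), tier_for(m))
        for m in range(0, total_minutes + 1, RENOTIFY_MINUTES)
    ]


if __name__ == "__main__":
    for minute, number, contacts in schedule():
        print(f"t+{minute:2d} min  notification #{number}  ->  {', '.join(contacts)}")
```

Under these assumptions, notifications 1-3 go to Level 1 only, notifications 4-6 go to Level 1 and Level 2, and notification 7 onward adds the Manager. Those are the ranges the escalation entries would need to cover if the 5-minute interval holds throughout.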
Below are examples of my current configs, with just some names edited
out. Please help me if you can :)
define host{
use generic-host
host_name host1
alias A Server
address 0.0.0.0
parents gatewayrouter
check_command check-host-alive
max_check_attempts 3
notification_interval 120
notification_period 24x7
notification_options d,r
}
define service{
use generic-service
host_name host1,host2,host3,host4,host5
service_description SSH
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 2
retry_check_interval 1
contact_groups poor-oncall-guy
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_command check_ssh
}
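
As a rough check of the service definition above against the 5-6 minute SOFT window, here is a small sketch of the arithmetic. It assumes `interval_length` is 60 seconds, that the first failed check counts as attempt 1, and that each retry is scheduled `retry_check_interval` units after the previous one. Check latency and execution time are ignored, and both helper names are hypothetical.

```python
# Sketch: rough time a service spends in a SOFT state before going HARD.
# Assumptions (not from the post): interval_length is 60 seconds, the
# first failed check counts as attempt 1, and each retry follows the
# previous check by retry_check_interval units. Latency is ignored.


def minutes_to_hard_state(max_check_attempts, retry_check_interval,
                          interval_length_seconds=60):
    """Minutes between the first failed check and the HARD state."""
    retries = max_check_attempts - 1
    return retries * retry_check_interval * interval_length_seconds / 60


def attempts_for_soft_window(target_minutes, retry_check_interval,
                             interval_length_seconds=60):
    """Smallest max_check_attempts giving at least target_minutes of SOFT state."""
    step = retry_check_interval * interval_length_seconds / 60
    retries = -(-target_minutes // step)  # ceiling division
    return int(retries) + 1


if __name__ == "__main__":
    # Values from the service definition above.
    print(minutes_to_hard_state(max_check_attempts=3, retry_check_interval=1))
    # Attempts needed for a 5-minute SOFT window at 1-minute retries.
    print(attempts_for_soft_window(target_minutes=5, retry_check_interval=1))
```

If those assumptions hold, `max_check_attempts 3` with `retry_check_interval 1` reaches the HARD state about 2 minutes after the first failure, and roughly 6 attempts would be needed for a 5-minute window.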
define hostgroupescalation{
hostgroup_name hostgroupnamehere
first_notification 2
last_notification 5
contact_groups poor-oncall-guy
notification_interval 5
}
define hostgroupescalation{
hostgroup_name hostgroupnamehere
first_notification 5
last_notification 6
contact_groups poor-oncall-guy,rest-of-unix-admins
notification_interval 15
}
define hostgroupescalation{
hostgroup_name hostgroupnamehere
first_notification 7
last_notification 10
contact_groups poor-oncall-guy,rest-of-unix-admins,manager
notification_interval 15
}