Hi guys!<br><br>I am now officially baffled by how Nagios handles service escalations and notifications. I'm using Nagios 3.2.3 on SLES 10 SP3, and my current setup is this:<br><br>service_escalation.cfg:<br><br>define serviceescalation {<br>
service_description http_80<br> host_name apache02<br> first_notification 1<br> last_notification 5<br> notification_interval 60<br> escalation_period Office_Hours<br>
contact_groups unix-sms, dba-email, dev-email<br>}<br><br>define serviceescalation {<br> service_description http_80<br> host_name apache02<br> first_notification 6<br>
last_notification 8<br> notification_interval 90<br> escalation_period Office_Hours<br> contact_groups unix-sms, dba-email, dev-email, unix-supervisor, dev-supervisor<br>}<br>
<br>define serviceescalation {<br> service_description http_80<br> host_name apache02<br> first_notification 1<br> last_notification 0<br> notification_interval 60<br>
escalation_period 24x7<br> contact_groups unix-admins-email<br>}<br><br>The contacts referenced in service_escalation.cfg are defined in contacts.cfg like this:<br><br>define contact{<br> contact_name unix-sms<br>
alias Team UNIX<br> host_notification_period Early_Morning<br> service_notification_period Early_Morning<br> host_notification_options u,d,r<br>
service_notification_options w,c,u,r<br> host_notification_commands host-notify-by-epager<br> service_notification_commands notify-by-epager<br> email <a href="mailto:unix@email.org">unix@email.org</a><br>
}<br><br>define contact{<br>
contact_name unix-supervisor<br>
alias Team UNIX Supervisor<br>
host_notification_period Early_Morning<br>
service_notification_period Early_Morning<br>
host_notification_options u,d,r<br>
service_notification_options w,c,u,r<br>
host_notification_commands host-notify-by-epager<br>
service_notification_commands notify-by-epager<br>
email <a href="mailto:unixsupervisor@email.org">unixsupervisor@email.org</a><br>
}<br><br>timeperiod.cfg looks like this:<br><br>define timeperiod{<br> timeperiod_name Office_Hours<br> alias Office_Hours<br> sunday 09:00-20:00<br> monday 09:00-20:00<br>
tuesday 09:00-20:00<br> wednesday 09:00-20:00<br> thursday 09:00-20:00<br> friday 09:00-20:00<br> saturday 09:00-20:00<br>
}<br><br>define timeperiod{<br> timeperiod_name Early_Morning<br> alias Early_Morning<br> sunday 07:00-22:10<br> monday 07:00-22:10<br>
tuesday 07:00-22:10<br> wednesday 07:00-22:10<br> thursday 07:00-22:10<br> friday 07:00-22:10<br> saturday 07:00-22:10<br>
}<br><br>With this configuration in place, the http_80 service goes down at 10pm every night (scheduled downtime). I expected that notifications from 10pm onwards would go *only* to unix-admins-email, because at that hour the two Office_Hours (09:00-20:00) escalations are outside their escalation_period and only the 24x7 escalation applies. And it happily did, at least for the critical notifications.<br>
<br>Now comes the fun part. At 7:03am, when the service returned to OK, the recovery notification was sent to the unix-sms, dba-email, dev-email, unix-supervisor and dev-supervisor groups. That is weird, because the critical notifications from 10pm to 6am (the next day) were sent only to the unix-admins-email group.<br>
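A rough Python model of the escalation matching expected above makes the puzzle concrete. It is a simplification, not Nagios's own code, and the function names are hypothetical. It assumes an escalation applies when the notification number is between first_notification and last_notification (0 meaning no upper limit) and its escalation_period covers the current time, and that the contact groups of all matching escalations are combined. It ignores days of the week, downtime, and the contacts' own notification periods.<br>

```python
from datetime import time

# Hypothetical model of the three serviceescalation blocks for apache02/http_80.
# Assumption: a time period range "HH:MM-HH:MM" covers start <= t < end.

def in_window(start: time, end: time, t: time) -> bool:
    """True if t falls inside [start, end) on the same day."""
    return start <= t < end

def office_hours(t: time) -> bool:
    # Office_Hours: 09:00-20:00, same window every day of the week
    return in_window(time(9, 0), time(20, 0), t)

def always(t: time) -> bool:
    # 24x7: every time of day is covered
    return True

ESCALATIONS = [
    # (first_notification, last_notification, escalation_period, contact_groups)
    (1, 5, office_hours, ["unix-sms", "dba-email", "dev-email"]),
    (6, 8, office_hours, ["unix-sms", "dba-email", "dev-email",
                          "unix-supervisor", "dev-supervisor"]),
    (1, 0, always, ["unix-admins-email"]),  # last_notification 0 = no upper bound
]

def escalated_groups(notification_number: int, t: time) -> list[str]:
    """Sorted union of contact groups from every escalation that matches."""
    groups = set()
    for first, last, period, contact_groups in ESCALATIONS:
        in_range = first <= notification_number and (
            last == 0 or notification_number <= last)
        if in_range and period(t):
            groups.update(contact_groups)
    return sorted(groups)

if __name__ == "__main__":
    # First critical notification at 10pm
    print(escalated_groups(1, time(22, 0)))
    # A later notification at 7:03am (the exact number is not known here)
    print(escalated_groups(9, time(7, 3)))
```

Under this model, any notification at 7:03am resolves to unix-admins-email alone, because 7:03 is outside Office_Hours; the supervisor groups only appear for notifications 6 to 8 sent between 09:00 and 20:00.<br>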
<br>Also, I read in the Nagios docs that it will not send recovery notifications to contacts who did not receive the critical/warning/unknown notifications in the first place.<br><br>So my questions are:<br>Why did Nagios send the recovery alert to the supervisors, who did not know the service was down because they never received the critical alert?<br>
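Did Nagios take their own notification time periods (Early_Morning, 07:00-22:10) into consideration when it sent the recovery alert, even though 7:03am is outside the Office_Hours escalation_period?<br><br>TIA!<br>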