I'm pretty sure I've got everything set up correctly: yesterday notifications were being sent out, but today none are going out. <br><br>I've added some services that I knew would go critical and started watching
nagios.log. Here is a snippet from yesterday's log:<br><br>[1179271433] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_SVC_CHECKS;devstack01;1179271433<br>[1179271440] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_SVC_CHECKS;devstack02;1179271440
<br>[1179271445] HOST ALERT: devstack01;DOWN;SOFT;1;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271448] HOST ALERT: devstack01;DOWN;SOFT;2;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271451] HOST ALERT: devstack01;DOWN;SOFT;3;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271451] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_SVC_CHECKS;ilom-cp1;1179271449
<br>[1179271454] HOST ALERT: devstack01;DOWN;SOFT;4;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271457] HOST ALERT: devstack01;DOWN;SOFT;5;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271460] HOST ALERT: devstack01;DOWN;SOFT;6;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271460] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_SVC_CHECKS;ilom-cp2;1179271459
<br>[1179271463] HOST ALERT: devstack01;DOWN;SOFT;7;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271466] HOST ALERT: devstack01;DOWN;SOFT;8;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271469] HOST ALERT: devstack01;DOWN;SOFT;9;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271469] EXTERNAL COMMAND: SCHEDULE_FORCED_HOST_SVC_CHECKS;ilom-cp3;1179271467
<br>[1179271472] HOST ALERT: devstack01;DOWN;HARD;10;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271472] HOST NOTIFICATION: lbeavers-pager;devstack01;DOWN;host-notify-by-epager;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271472] HOST NOTIFICATION: lbeavers;devstack01;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271472] HOST NOTIFICATION: gpoly-pager;devstack01;DOWN;host-notify-by-epager;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271472] HOST NOTIFICATION: gpoly;devstack01;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.0.0.160)
<br>[1179271472] SERVICE ALERT: devstack01;ping;CRITICAL;HARD;1;CRITICAL - Host Unreachable (10.0.0.160)
<br><br clear="all">-------------------------------------------<br>As you can see, host notifications are being sent out.<br><br>Today's log:<br><br>---------------------------------------------------
<br>[1179337965] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;contactpoint3;var_disk;1179337960<br>[1179337974] SERVICE ALERT: contactpoint3;var_disk;UNKNOWN;SOFT;1;SNMP problem - No data received from host<br>[1179338034] SERVICE ALERT: contactpoint3;var_disk;UNKNOWN;SOFT;2;SNMP problem - No data received from host
<br>[1179338094] SERVICE ALERT: contactpoint3;var_disk;UNKNOWN;HARD;3;SNMP problem - No data received from host<br>[1179338408] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;contactpoint3;sendmail_check;1179338407<br>[1179338414] SERVICE ALERT: contactpoint3;sendmail_check;CRITICAL;SOFT;1;sendmail Processes CRITICAL - *0*
<br>[1179338474] SERVICE ALERT: contactpoint3;sendmail_check;CRITICAL;SOFT;2;sendmail Processes CRITICAL - *0*<br>[1179338484] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;contactpoint3;sendmail_check;1179338481<br>[1179338494] SERVICE ALERT: contactpoint3;sendmail_check;CRITICAL;HARD;3;sendmail Processes CRITICAL - *0*
<br>[1179338604] Warning: The results of service 'ping' on host 'contactpoint4' are stale by 45 seconds (threshold=615 seconds). I'm forcing an immediate check of the service.<br>[1179338604] Warning: The results of service 'sendmail_check' on host 'contactpoint4' are stale by 45 seconds (threshold=615 seconds). I'm forcing an immediate check of the service.
<br>[1179338604] Warning: The results of service 'ping' on host 'contactpoint5' are stale by 45 seconds (threshold=61<br><br>--------------------------<br><br>As you can see, sendmail_check went through its three CRITICAL results and reached CRITICAL HARD, but no notifications were sent; Nagios just moved on to checking other services.
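<br><br>To make the "reached HARD but no notification" check repeatable over a larger log, here is a minimal Python sketch that scans nagios.log lines and lists HARD, non-OK host or service alerts that are not followed by a matching notification line. The HOST ALERT, SERVICE ALERT and HOST NOTIFICATION field layouts are taken from the excerpts above; the SERVICE NOTIFICATION layout (contact;host;service;state;command;output) is an assumption, and the helper name unnotified_hard_alerts is hypothetical. It only compares log lines and does not model Nagios's own notification logic (notification periods, contact settings and so on), so treat its output as a pointer for where to look, not a diagnosis.

```python
import re

# Matches the alert/notification lines seen in the nagios.log excerpts, e.g.
#   [1179338494] SERVICE ALERT: contactpoint3;sendmail_check;CRITICAL;HARD;3;...
LINE_RE = re.compile(
    r"^\[(\d+)\] (HOST ALERT|SERVICE ALERT|HOST NOTIFICATION|SERVICE NOTIFICATION): (.*)$"
)


def unnotified_hard_alerts(lines):
    """Return (timestamp, host, service) tuples for HARD non-OK alerts that
    have no later notification line in the log. service is None for hosts.

    Hypothetical helper: it compares log lines only and does not model
    Nagios's own notification logic.
    """
    pending = {}  # maps (host, service) to the timestamp of the latest HARD problem
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # EXTERNAL COMMAND, Warning: ..., etc.
        ts = int(match.group(1))
        kind = match.group(2)
        fields = match.group(3).split(";")
        if len(fields) < 5:
            continue  # malformed or truncated line
        if kind == "HOST NOTIFICATION":
            # contact;host;state;command;output (layout seen in the log above)
            pending.pop((fields[1], None), None)
            continue
        if kind == "SERVICE NOTIFICATION":
            # ASSUMPTION: contact;host;service;state;command;output
            pending.pop((fields[1], fields[2]), None)
            continue
        if kind == "HOST ALERT":
            # host;state;state_type;attempt;output
            key, state, state_type = (fields[0], None), fields[1], fields[2]
            recovered = state == "UP"
        else:
            # SERVICE ALERT: host;service;state;state_type;attempt;output
            key, state, state_type = (fields[0], fields[1]), fields[2], fields[3]
            recovered = state == "OK"
        if recovered:
            pending.pop(key, None)  # a recovery clears any outstanding problem
        elif state_type == "HARD":
            pending[key] = ts  # problem reached HARD state; expect a notification
    # Sort on the timestamp only (service may be None, which is not orderable).
    return sorted(
        ((ts, host, service) for (host, service), ts in pending.items()),
        key=lambda item: item[0],
    )
```

Run over the lines of today's excerpt, it would flag var_disk and sendmail_check on contactpoint3. For the real log you can pass an open file object, since the function strips each line; the log's location depends on your installation.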
<br><br>I've got enable_notifications=1 set in nagios.cfg.<br>In services.cfg, I've got:<br>notification_period 24x7<br>
notifications_enabled 1 ; Service notifications are enabled<br>
notification_interval 15 ; Default interval - change only if needed in the service config<br><br>The web frontend also reports that ALL notifications are enabled:<br><br><table class="tac" border="0" cellpadding="0" cellspacing="4">
<tbody><tr><td colspan="5" class="featureTitle" height="20"> Monitoring Features</td></tr>
<tr>
<td class="featureHeader" width="135">Flap Detection</td>
<td class="featureHeader" width="135">Notifications</td>
<td class="featureHeader" width="135">Event Handlers</td>
<td class="featureHeader" width="135">Active Checks</td>
<td class="featureHeader" width="135">Passive Checks</td>
</tr>
<tr>
<td valign="top">
<table border="0" cellpadding="0" cellspacing="0" width="135">
<tbody><tr>
<td valign="top"><a href="http://nagios.quepasa.com/nagios/cgi-bin/cmd.cgi?cmd_typ=62"><img src="http://nagios.quepasa.com/nagios/images/tacenabled.png" alt="Flap Detection Enabled" title="Flap Detection Enabled" border="0">
</a></td>
<td width="10"> </td>
<td class="featureEnabledFlapDetection" valign="top" width="100%">
<table border="0" width="100%">
<tbody><tr><td class="featureItemEnabledServiceFlapDetection" width="100%">All Services Enabled</td></tr>
<tr><td class="featureItemServicesNotFlapping" width="100%">No Services Flapping</td></tr>
<tr><td class="featureItemEnabledHostFlapDetection" width="100%">All Hosts Enabled</td></tr>
<tr><td class="featureItemHostsNotFlapping" width="100%">No Hosts Flapping</td></tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
<td valign="top">
<table border="0" cellpadding="0" cellspacing="0" width="135">
<tbody><tr>
<td valign="top"><a href="http://nagios.quepasa.com/nagios/cgi-bin/cmd.cgi?cmd_typ=11"><img src="http://nagios.quepasa.com/nagios/images/tacenabled.png" alt="Notifications Enabled" title="Notifications Enabled" border="0">
</a></td>
<td width="10"> </td>
<td class="featureEnabledNotifications" valign="top" width="100%">
<table border="0" width="100%">
<tbody><tr><td class="featureItemEnabledServiceNotifications" width="100%">All Services Enabled</td></tr>
<tr><td class="featureItemEnabledHostNotifications" width="100%">All Hosts Enabled</td></tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
<td valign="top">
<table border="0" cellpadding="0" cellspacing="0" width="135">
<tbody><tr>
<td valign="top"><a href="http://nagios.quepasa.com/nagios/cgi-bin/cmd.cgi?cmd_typ=42"><img src="http://nagios.quepasa.com/nagios/images/tacenabled.png" alt="Event Handlers Enabled" title="Event Handlers Enabled" border="0">
</a></td>
<td width="10"> </td>
<td class="featureEnabledHandlers" valign="top" width="100%">
<table border="0" width="100%">
<tbody><tr><td class="featureItemEnabledServiceHandlers" width="100%">All Services Enabled</td></tr>
<tr><td class="featureItemEnabledHostHandlers" width="100%">All Hosts Enabled</td></tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
<td valign="top">
<table border="0" cellpadding="0" cellspacing="0" width="135">
<tbody><tr>
<td valign="top"><a href="http://nagios.quepasa.com/nagios/cgi-bin/extinfo.cgi?type=0"><img src="http://nagios.quepasa.com/nagios/images/tacenabled.png" alt="Active Checks Enabled" title="Active Checks Enabled" border="0">
</a></td>
<td width="10"> </td>
<td class="featureEnabledActiveChecks" valign="top" width="100%">
<table border="0" width="100%">
<tbody><tr><td class="featureItemEnabledActiveServiceChecks" width="100%">All Services Enabled</td></tr>
<tr><td class="featureItemEnabledActiveHostChecks" width="100%">All Hosts Enabled</td></tr>
</tbody></table>
</td>
</tr>
</tbody></table>
</td>
<td valign="top">
<table border="0" cellpadding="0" cellspacing="0" width="135">
<tbody><tr>
<td valign="top"><a href="http://nagios.quepasa.com/nagios/cgi-bin/extinfo.cgi?type=0"><img src="http://nagios.quepasa.com/nagios/images/tacenabled.png" alt="Passive Checks Enabled" title="Passive Checks Enabled" border="0">
</a></td>
<td width="10"> </td>
<td class="featureEnabledPassiveChecks" valign="top" width="100%">
<table border="0" width="100%">
<tbody><tr><td class="featureItemEnabledPassiveServiceChecks" width="100%">All Services Enabled</td></tr>
<tr><td class="featureItemEnabledPassiveHostChecks" width="100%">All Hosts Enabled</td></tr></tbody></table></td></tr></tbody></table></td></tr></tbody></table><br>Any idea where else I should check? I've already deleted my retention file and restarted Nagios as well.
<br>
Pulling my hair out here<br><br>G.~<br><br><br>-- <br>Gary Every<br>"Pay it Forward!"<br>