Guys,<br><br>I'm testing Nagios 3.0a and I think there is a notification bug.<br><br>I have the following config:<br><br>define timeperiod{<br> timeperiod_name 24x7<br> alias 24 Hours A Day, 7 Days A Week
<br> sunday 00:00-24:00<br> monday 00:00-24:00<br> tuesday 00:00-24:00<br> wednesday 00:00-24:00<br> thursday 00:00-24:00<br> friday 00:00-24:00
<br> saturday 00:00-24:00<br> }<br><br>define contact{<br> name generic-contact ; The name of this contact template<br> service_notification_period 24x7 ; service notifications can be sent anytime
<br> host_notification_period 24x7 ; host notifications can be sent anytime<br> service_notification_options w,u,c,r,f,s ; send notifications for all service states, flapping events, and scheduled downtime events
<br> host_notification_options d,u,r,f,s ; send notifications for all host states, flapping events, and scheduled downtime events<br> service_notification_commands notify-service-by-email ; send service notifications via email
<br> host_notification_commands notify-host-by-email ; send host notifications via email<br> register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
<br> }<br><br>define contact{<br> contact_name astuck<br> use generic-contact<br> alias SysAdmin1<br> email {my email}
<br> }<br clear="all"><br>define contactgroup{<br> contactgroup_name admins<br> alias SysAdmins<br> members astuck<br> }<br><br>define host{<br> name generic-host ; The name of this host template
<br> notifications_enabled 1 ; Host notifications are enabled<br> event_handler_enabled 1 ; Host event handler is enabled<br> flap_detection_enabled 1 ; Flap detection is enabled
<br> failure_prediction_enabled 1 ; Failure prediction is enabled<br> process_perf_data 1 ; Process performance data<br> retain_status_information 1 ; Retain status information across program restarts
<br> retain_nonstatus_information 1 ; Retain non-status information across program restarts<br> notification_period 24x7 ; Send host notifications at any time<br> register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
<br> }<br><br>define host{<br> name generic-linux<br> use generic-host<br> check_period 24x7<br> check_interval 5
<br> retry_interval 1<br> max_check_attempts 10<br> check_command check-host-alive<br> notification_interval 120<br> notification_options d,u,r
<br> register 0<br> }<br><br>define host{<br> name nonprod<br> use generic-linux<br> contact_groups admins
<br> register 0<br> }<br><br>define host{<br> use nonprod<br> host_name lithium<br> alias Oracle Dev 2<br> address
lithium
<br> }<br><br>As far as I can see, I should get all host and service notifications 24/7. However, when I reboot 'lithium' I get a host down notification, but when it comes back up<br>I don't get anything.<br>I turned on notification debugging:
<br><br>[1181695731.149796:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Wed Dec 31 16:00:00 1969<br>[1181695731.149852:032.0] Notification viability test passed.
<br>[1181695731.149861:032.1] Current notification number: 1<br>[1181695731.149867:032.2] Creating list of contacts to be notified.<br>[1181695731.149873:032.1] Host notification will NOT be escalated.
<br>[1181695731.149879:032.2] Adding contact 'astuck' to notification list.<br>[1181695731.149985:032.2] ** Attempting to notifying contact 'astuck'...<br>[1181695731.149994:032.2] ** Checking host notification viability for contact 'astuck'...
<br>[1181695731.150005:032.2] ** Host notification viability for contact 'astuck' PASSED.<br>[1181695731.150014:032.2] ** Notifying contact 'astuck'<br>[1181695731.150071:032.2] Raw Command: /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
<br>[1181695731.150078:032.2] Processed Command: /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: PROBLEM\nHost: lithium\nState: DOWN\nAddress: lithium\nInfo: (No output returned from host check)\n\nDate/Time: Tue Jun 12 17:48:51 PDT 2007\n" | /bin/mail -s "** PROBLEM Host Alert: lithium is DOWN **" {my email}
<br>[1181695731.194505:032.0] No contacts were notified. Next possible notification time: Tue Jun 12 19:48:51 2007<br>[1181695731.194527:032.0] 1 contacts were notified.<br>[1181695741.047809:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
<br>[1181695741.047834:032.1] Its not yet time to re-notify the contacts about this host problem...<br>[1181695741.047843:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695741.047850:032.0] Notification viability test failed. No notification will be sent out.
<br>[1181695751.160027:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007<br>[1181695751.160058:032.1] Its not yet time to re-notify the contacts about this host problem...
<br>[1181695751.160068:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695751.160074:032.0] Notification viability test failed. No notification will be sent out.<br>[1181695811.210449:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
<br>[1181695811.210479:032.1] Its not yet time to re-notify the contacts about this host problem...<br>[1181695811.210489:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695811.210495:032.0] Notification viability test failed. No notification will be sent out.
<br>[1181695821.068538:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007<br>[1181695821.068569:032.1] Its not yet time to re-notify the contacts about this host problem...
<br>[1181695821.068580:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695821.068586:032.0] Notification viability test failed. No notification will be sent out.<br>[1181695821.068895:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
<br>[1181695821.068915:032.1] Its not yet time to re-notify the contacts about this host problem...<br>[1181695821.068924:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695821.068931:032.0] Notification viability test failed. No notification will be sent out.
<br>[1181695831.174383:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007<br>[1181695831.174418:032.1] Its not yet time to re-notify the contacts about this host problem...
<br>[1181695831.174427:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695831.174434:032.0] Notification viability test failed. No notification will be sent out.<br>[1181695831.174731:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007
<br>[1181695831.174745:032.1] Its not yet time to re-notify the contacts about this host problem...<br>[1181695831.174754:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695831.174760:032.0] Notification viability test failed. No notification will be sent out.
<br>[1181695851.144314:032.0] ** Host Notification Attempt ** Host: 'lithium', Type: 0, Current State: 1, Last Notification: Tue Jun 12 17:48:51 2007<br>[1181695851.144338:032.1] Its not yet time to re-notify the contacts about this host problem...
<br>[1181695851.144347:032.1] Next acceptable notification time: Tue Jun 12 19:48:51 2007<br>[1181695851.144354:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696025.034559:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /tmp', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696025.034582:032.1] We shouldn't notify about this recovery.<br>[1181696025.034589:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696031.130428:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'LOAD', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696031.130452:032.1] We shouldn't notify about this recovery.<br>[1181696031.130460:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696031.131081:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /usr/local', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696031.131095:032.1] We shouldn't notify about this recovery.<br>[1181696031.131102:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696111.052735:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'CFENVD', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696111.052759:032.1] We shouldn't notify about this recovery.<br>[1181696111.052766:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696111.052971:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'PERC CONTROLLER', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696111.052984:032.1] We shouldn't notify about this recovery.<br>[1181696111.052992:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696111.053334:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'CFEXECD', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696111.053348:032.1] We shouldn't notify about this recovery.<br>[1181696111.053355:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696121.163710:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'MEM', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696121.163738:032.1] We shouldn't notify about this recovery.<br>[1181696121.163746:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696121.163984:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /var', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696121.163998:032.1] We shouldn't notify about this recovery.<br>[1181696121.164005:032.0] Notification viability test failed. No notification will be sent out.<br>[1181696141.130999:032.0] ** Service Notification Attempt ** Host: 'lithium', Service: 'DISK USAGE /', Type: 0, Current State: 0, Last Notification: Wed Dec 31 16:00:00 1969
<br>[1181696141.131023:032.1] We shouldn't notify about this recovery.<br>[1181696141.131031:032.0] Notification viability test failed. No notification will be sent out.<br><br>Clearly, Nagios decided that I shouldn't get a host up notification. I just don't understand why. From the log files, I'd say the following logic takes place:
<br><br>1. Host goes down - service check fails<br>2. Nagios checks to see if host is down - YES<br>3. Because of step 2, no service notifications are sent<br>4. Host down notification is sent instead<br>5. Host comes back
<br>6. Service checks start recovering - no service recovery notification is sent, since no service problem notifications were sent in the first place<br>7. Host is assumed to be up since the services are up<br>8. Hence - no host up notification
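<br><br>To make steps 3 and 6 concrete, here is a toy Python model of the service side of the sequence I'm guessing at. This is NOT Nagios code; the `Service` class and the `service_notification` function are names I made up purely for illustration, and the real notification filters do a lot more than this.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a monitored service (not a Nagios type)."""
    name: str
    problem_notified: bool = False  # has a PROBLEM notification been sent?


def service_notification(svc, new_state, host_is_up):
    """Return the notification type that would be sent, or None.

    Models only the two rules guessed at in steps 3 and 6:
    - problems are suppressed while the host is down
    - recoveries are suppressed if no problem was ever notified
    """
    if new_state != "OK":
        if not host_is_up:
            return None  # step 3: host is down, so no service notification
        svc.problem_notified = True
        return "PROBLEM"
    if not svc.problem_notified:
        return None  # step 6: nothing was reported, so nothing to recover from
    svc.problem_notified = False
    return "RECOVERY"


load = Service("LOAD")
print(service_notification(load, "CRITICAL", host_is_up=False))
print(service_notification(load, "OK", host_is_up=True))
```

With the host down during the failure, both calls return None, which would match the "We shouldn't notify about this recovery" lines in the debug log.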
<br><br>At first I thought my host up notification might not be making it through one of the notification filters, but according to the log there is NO host check after the reboot, and therefore<br>there is no host notification attempt.
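<br><br>For anyone who wants to check their own debug log for the same thing, here is a rough Python sketch that lists the notification attempts for one host and reports whether any host attempt was made while the host state was 0 (UP). The regex is based only on the log lines pasted above; the function names are my own, and other debug levels or Nagios versions may format the lines differently.

```python
import re

# Matches the "** Host/Service Notification Attempt **" lines shown above.
ATTEMPT = re.compile(
    r"\[(?P<ts>\d+\.\d+):\d+\.\d+\] \*\* (?P<kind>Host|Service) "
    r"Notification Attempt \*\* Host: '(?P<host>[^']+)'"
    r".*?Current State: (?P<state>\d+)"
)


def notification_attempts(log_text, host):
    """Return (timestamp, kind, state) for every attempt logged for host."""
    return [
        (float(m["ts"]), m["kind"], int(m["state"]))
        for m in ATTEMPT.finditer(log_text)
        if m["host"] == host
    ]


def host_recovery_attempted(log_text, host):
    """True if any host notification attempt was made with state 0 (UP)."""
    return any(
        kind == "Host" and state == 0
        for _, kind, state in notification_attempts(log_text, host)
    )
```

Reading the debug file into a string and calling `host_recovery_attempted(text, "lithium")` should return False for the log excerpt above, since every host attempt in it has Current State: 1.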
<br>It looks like a design bug to me, but I want to make sure I'm not getting this wrong. It just doesn't make sense to me that I wouldn't be notified<br>about a host coming back. I understand the part about the services.
<br><br>INTERESTING: I have rebooted a few times, and sometimes I do get host up notifications, but most of the time I don't, so it seems to depend on<br>when exactly the reboot occurs.<br>Also, I turned off flap detection globally, but it made no difference.
<br><br>Has anyone seen this behaviour?<br>-- <br>stucky