host checks not happening after first notification
Terry
td3201 at gmail.com
Sun Apr 10 03:32:11 CEST 2011
On Sat, Apr 9, 2011 at 7:34 PM, Terry <td3201 at gmail.com> wrote:
> On Sat, Apr 9, 2011 at 6:06 PM, Terry <td3201 at gmail.com> wrote:
>> Hello,
>>
>> I am seeing a weird condition where host checks stop after the first
>> notification. Here's the config:
>>
>> execute_host_checks=1
>>
>> define host{
>> name generic-host
>> check_command check-host-alive
>> check_period 24x7
>> notification_interval 30
>> notification_options d,r
>> notifications_enabled 1
>> event_handler_enabled 1
>> flap_detection_enabled 1
>> failure_prediction_enabled 1
>> process_perf_data 1
>> retain_status_information 1
>> retain_nonstatus_information 1
>> register 0
>> }
>> define host{
>> name generic-host-10
>> use generic-host
>> notification_period 24x7
>> check_interval 5
>> retry_interval 1
>> max_check_attempts 3
>> register 0
>> }
>> define host{
>> name foo-10
>> use generic-host-10
>> contact_groups +foo_primary
>> register 0
>> }
>> define host{
>> use foo-10
>> host_name testpage
>> hostgroups windows,vmguest_windows
>> notification_interval 5
>> parents firewall
>> address 10.235.235.235
>> }
>>
>> define hostescalation{
>> hostgroup_name z-allhosts
>> contacts support at foo.com,support-email-critical
>> first_notification 1
>> last_notification 1
>> notification_interval 0
>> escalation_options d
>> }
>> define hostescalation{
>> hostgroup_name z-allhosts
>> contact_groups +foo_secondary
>> first_notification 3
>> last_notification 4
>> notification_interval 30
>> escalation_options d,r
>> }
>> define hostescalation{
>> hostgroup_name z-allhosts
>> contact_groups +foo_tertiary,foo_secondary
>> first_notification 5
>> last_notification 0
>> notification_interval 30
>> escalation_options d,r
>> }
>>
>>
>> Here's a log of the activity. You see the first notification, then nothing:
>>
>> [1302388222] HOST ALERT: testpage;DOWN;SOFT;1;CRITICAL - Plugin timed out after 10 seconds
>> [1302388296] HOST ALERT: testpage;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
>> [1302388346] SERVICE ALERT: testpage;cpu - nrpe;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302388376] HOST ALERT: testpage;DOWN;HARD;3;PING CRITICAL - Packet loss = 100%
>> [1302388376] HOST NOTIFICATION: joe-epager;testpage;DOWN;host-notify-by-epager;PING CRITICAL - Packet loss = 100%
>> [1302388376] HOST NOTIFICATION: joe at DOM.COM;testpage;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
>> [1302388377] HOST NOTIFICATION: support-email-critical;testpage;DOWN;host-notify-by-email;PING CRITICAL - Packet loss = 100%
>> [1302388446] SERVICE ALERT: testpage;disk drives - nrpe;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302388547] SERVICE ALERT: testpage;memory - page - nrpe;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302388657] SERVICE ALERT: testpage;memory - physical - nrpe;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302388757] SERVICE ALERT: testpage;nrpeclient;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302389057] SERVICE ALERT: testpage;nrpeclient;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302389357] SERVICE ALERT: testpage;nrpeclient;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>> [1302389658] SERVICE ALERT: testpage;nrpeclient;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>>
>>
>> I appreciate the help.
>>
>
> More info:
>
> [04-09-2011 19:29:17] SERVICE ALERT: testpage;nrpeclient;CRITICAL;HARD;1;CHECK_NRPE: Socket timeout after 50 seconds.
>
> I get this event every 5 minutes. It's just a service on this box. I
> thought if the host was down, service checks were suppressed. Is that
> not the case?
>
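As a rough cross-check of the "every 5 minutes" observation, here is a small Python sketch that converts the epoch timestamps of the nrpeclient alerts in the log above into readable UTC times and prints the gaps between them. The helper names to_utc and gaps are made up for illustration; they are not part of Nagios.

```python
from datetime import datetime, timezone

# Epoch timestamps of the nrpeclient SERVICE ALERT lines in the quoted log.
nrpeclient_alerts = [1302388757, 1302389057, 1302389357, 1302389658]


def to_utc(epoch):
    """Render a Nagios log timestamp (seconds since the epoch) as UTC text."""
    moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def gaps(stamps):
    """Return the seconds elapsed between consecutive log entries."""
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


for stamp in nrpeclient_alerts:
    print(to_utc(stamp), stamp)

# Seconds between successive nrpeclient alerts.
print(gaps(nrpeclient_alerts))
```

The gaps come out at about 300 seconds each, which fits a service that is still being checked every 5 minutes while the host is DOWN.
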
Sorry for repeatedly replying to my own thread. It looks like checks are
happening after all. What is not happening is notifications after the
first one, and therefore escalations are not happening either. The
sequence I see:
1. Host gets checked 3 times
2. Alert gets sent
3. Check happens again after 5 minutes
4. Current attempt goes back to 1/3 (HARD state)
5. No notifications thereafter
Confused.
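
To make the expected escalation behaviour concrete, here is a simplified Python model of the three hostescalation definitions quoted above. It assumes an escalation applies when the notification number falls between first_notification and last_notification (with last_notification 0 meaning no upper bound) and the host state is listed in escalation_options. This is only a sketch of that rule, not Nagios code; the ESCALATIONS table and matching_escalations function are hypothetical names, and the model ignores time periods and any special handling of recovery notifications.

```python
# Simplified copy of the three hostescalation definitions from the config.
# "targets" is just the contacts / contact_groups value as written there.
ESCALATIONS = [
    {"first": 1, "last": 1, "options": {"d"},
     "targets": "support at foo.com,support-email-critical"},
    {"first": 3, "last": 4, "options": {"d", "r"},
     "targets": "+foo_secondary"},
    {"first": 5, "last": 0, "options": {"d", "r"},
     "targets": "+foo_tertiary,foo_secondary"},
]


def matching_escalations(number, state):
    """Return escalations covering this notification number and host state.

    Assumption: last == 0 means the escalation never expires.
    """
    matches = []
    for esc in ESCALATIONS:
        in_range = esc["first"] <= number and (
            esc["last"] == 0 or number <= esc["last"]
        )
        if in_range and state in esc["options"]:
            matches.append(esc)
    return matches


# Which escalation targets would be used for DOWN notifications 1 through 6?
for n in range(1, 7):
    targets = [esc["targets"] for esc in matching_escalations(n, "d")]
    print(n, targets)
```

Under these assumptions, notification 2 matches no escalation, 3 and 4 match the second definition, and 5 onward match the third. None of that can be observed here, though, because only notification 1 is ever sent.
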