Nagios did not send notification to some contacts --again

Frater, Greg J GJFRATER at bechtel.com
Wed Sep 1 18:26:36 CEST 2010


Hi All,

We had an event several weeks ago, and it has happened again.  I posted
after the first time and am reposting to the same thread because the two
events are related.  Sorry if this makes it confusing.  Marc, thanks for
the reply to the first event; see below for my responses.

In the second event a host went down due to a power outage but only a
portion of the contacts were sent notifications (which is the same
problem as the first event).  This is the second time, that I know of,
that Nagios has failed to send to some of the contacts.  The problem has
now occurred on two different hosts.  I can't explain why it's happening
which does not instill confidence in our customers.  Any help or
suggestions in fixing this are greatly appreciated.

This next part is from/for the first event

-------------------- first event --------------------------------------
>> There was a routing issue on our WAN that caused this event, the SMTP
server we use is across the WAN.  Could the routing issue have prevented
some of the SMTP notifications from being sent, wouldn't they just queue
up and go once the problem was resolved?

>They would be queued by the SMTP server running on your nagios machine.
Redelivery attempts would occur based on the configuration there.

Okay, makes sense.

>>  I have seen messages that did not arrive at the recipients phone but
I've never seen Nagios not generate notifications for contacts that are
configured for that host or service.  Has anyone else seen this, any
suggestions on a cause or how to troubleshoot?

>- Check nagios.log for a HOST NOTIFICATION event for that group. Make
sure there were no errors logged. 

nagios.log only shows notifications sent to some of the contacts.  Those
notifications were received.

>- Check your local SMTP server logs to see if the messages were
received there and no errors were reported.

Not necessary; Nagios did not send the notifications.

>- Make sure that nagios has been restarted since adding this group and
contacts.

Done.  The contact groups in question have been in place for many
months.

>- Make sure you don't have multiple nagios daemons running at the same
time.

Done. Only a single instance is running.
----------------------- end of first event ---------------------------------


------------------------ Second event with logs and configs -----------------
Below are the configs for the host from the second event.  If you look
at the log at the bottom you'll see that 11 of 16 contacts were sent
notifications: some, but not all, from each of the contact groups
configured.  I'm trying to figure out why.  Does anyone see a problem
with my configs?


Host in question:

CONFIGS:
define host {
        host_name                       Host_A
        alias                           Host_A
        parents                         Host_B
        use                             upshost
        contact_groups                  +network-email,onguard
        register                        1
        }

define contactgroup {
        contactgroup_name                       network-email
        alias                                   Users who monitor the network - email only
        members                                 netuser1,netuser2,netuser3
        }

define contactgroup {
        contactgroup_name                       onguard
        alias                                   On Guard Admins
        members                                 og_user1-phone,og_user2-phone,og_user3,og_user3-home,og_user3-phone,og_user4,og_user4-phone,og_user5-phone,og_user6,og_user6-phone,og_user7,og_user7-phone,og_user8
        }

define host {
       name                                     upshost
       alias                                    NetInfra UPS' template
       check_command                            check-host-alive
       use                                      generic-pnp,generic-host
       max_check_attempts                       5
       check_interval                           60
       retry_interval                           3
       active_checks_enabled                    1
       passive_checks_enabled                   1
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       contact_groups                           network
       notification_interval                    60
       notification_period                      24x7
       notification_options                     d,u,r
       notifications_enabled                    1
       register                                 0

}
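
For readers checking which contact groups should apply to Host_A: in
Nagios 3 object inheritance, a leading "+" on a value appends to what
the template supplies, while a value without "+" replaces it.  The
sketch below illustrates that rule only; the function names are made up
for illustration, and it is not how Nagios itself resolves objects.
Because upshost sets contact_groups without a "+", whatever
generic-pnp and generic-host define for that directive should not
carry through.

```python
def split_groups(value):
    """Split a comma-separated contact_groups value into clean names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def effective_contact_groups(host_value, template_value):
    """Apply the additive-inheritance rule to a contact_groups directive.

    A leading '+' on the host's value appends to the template's value;
    without it, the host's value replaces the template's value.
    """
    host_value = host_value.strip()
    if host_value.startswith("+"):
        combined = split_groups(template_value) + split_groups(host_value[1:])
    else:
        combined = split_groups(host_value)

    # Drop duplicates while keeping first-seen order.
    result = []
    for name in combined:
        if name not in result:
            result.append(name)
    return result


# Values taken from the Host_A and upshost definitions in this post.
print(effective_contact_groups("+network-email,onguard", "network"))
# → ['network', 'network-email', 'onguard']
```

Under that rule Host_A should notify three groups, including "network",
whose definition is not shown in this post.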



Excerpt from nagios.log
[1283265540] HOST NOTIFICATION: netuser2-cell;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2-pager;Host_A;UNREACHABLE;alert-host-by-modem;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user8;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7-phone;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6-phone;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user4;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3-home;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283266180] HOST ALERT: Host_A;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2-cell;Host_A;UP;alert-host-by-sms;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2-pager;Host_A;UP;alert-host-by-modem;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: og_user8;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user7-phone;Host_A;UP;alert-host-by-sms;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user7;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user6-phone;Host_A;UP;alert-host-by-email-short;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user6;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user4;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266182] HOST NOTIFICATION: og_user3-home;Host_A;UP;alert-host-by-email-short;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266182] HOST NOTIFICATION: og_user3;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms

--------------------- end of second event -------------------------------------------
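
To make the gap between configured and notified contacts explicit, the
comparison can be scripted.  This is a rough sketch, not a Nagios tool:
the function names are hypothetical, the group membership is copied
from the configs in this post, and the sample lines are the UNREACHABLE
entries from the excerpt rejoined so that each entry sits on one line,
as it would in the real nagios.log.

```python
import re

# Contact group membership copied from the configs in this post.
GROUPS = {
    "network-email": ["netuser1", "netuser2", "netuser3"],
    "onguard": [
        "og_user1-phone", "og_user2-phone", "og_user3", "og_user3-home",
        "og_user3-phone", "og_user4", "og_user4-phone", "og_user5-phone",
        "og_user6", "og_user6-phone", "og_user7", "og_user7-phone",
        "og_user8",
    ],
}

# Entry layout: [epoch] HOST NOTIFICATION: contact;host;state;command;output
NOTIFY_RE = re.compile(
    r"^\[(\d+)\] HOST NOTIFICATION: ([^;]+);([^;]+);([^;]+);"
)


def notified_contacts(log_lines, host, state):
    """Return the set of contacts logged as notified for host/state."""
    found = set()
    for line in log_lines:
        match = NOTIFY_RE.match(line)
        if match and match.group(3) == host and match.group(4) == state:
            found.add(match.group(2))
    return found


def missing_by_group(groups, notified):
    """For each group, list the members with no logged notification."""
    return {
        name: [c for c in members if c not in notified]
        for name, members in groups.items()
    }


# (contact, notification command) pairs from the UNREACHABLE entries.
ENTRIES = [
    ("netuser2-cell", "alert-host-by-sms"),
    ("netuser2-pager", "alert-host-by-modem"),
    ("netuser2", "alert-host-by-email-long"),
    ("og_user8", "alert-host-by-email-long"),
    ("og_user7-phone", "alert-host-by-sms"),
    ("og_user7", "alert-host-by-email-long"),
    ("og_user6-phone", "alert-host-by-email-short"),
    ("og_user6", "alert-host-by-email-long"),
    ("og_user4", "alert-host-by-email-long"),
    ("og_user3-home", "alert-host-by-email-short"),
    ("og_user3", "alert-host-by-email-long"),
]
# Rebuild single-line log entries (epoch simplified to the first value).
SAMPLE_LOG = [
    "[1283265540] HOST NOTIFICATION: %s;Host_A;UNREACHABLE;%s;"
    "PING CRITICAL - Packet loss = 100%%" % (contact, command)
    for contact, command in ENTRIES
]

sent = notified_contacts(SAMPLE_LOG, "Host_A", "UNREACHABLE")
for group, missing in missing_by_group(GROUPS, sent).items():
    print(group, "missing:", ", ".join(missing) or "none")

# Contacts that were notified but are not in either listed group.
listed = {c for members in GROUPS.values() for c in members}
print("not in listed groups:", ", ".join(sorted(sent - listed)) or "none")
```

Run against the excerpt as posted, this reports netuser1 and netuser3
missing from network-email, and the five contacts og_user1-phone,
og_user2-phone, og_user3-phone, og_user4-phone and og_user5-phone
missing from onguard.  netuser2-cell and netuser2-pager are in neither
listed group, so they presumably come from the template's "network"
group.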


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




