Nagios did not send notification to some contacts --again
Frater, Greg J
GJFRATER at bechtel.com
Wed Sep 1 18:26:36 CEST 2010
Hi All,
We had an event several weeks ago, and it has now happened again. I posted
after the first event and am replying to the same thread because the two
are related. Sorry if this makes the thread confusing. Marc, thanks for the
reply about the first event; see below for my responses.
In the second event a host went down due to a power outage but only a
portion of the contacts were sent notifications (which is the same
problem as the first event). This is the second time, that I know of,
that Nagios has failed to send to some of the contacts. The problem has
now occurred on two different hosts. I can't explain why it's happening,
which does not instill confidence in our customers. Any help or
suggestions in fixing this are greatly appreciated.
This next part is from/for the first event
-------------------- first event --------------------------------------
>> There was a routing issue on our WAN that caused this event, the SMTP
>> server we use is across the WAN. Could the routing issue have prevented
>> some of the SMTP notifications from being sent, wouldn't they just queue
>> up and go once the problem was resolved?
> They would be queued by the SMTP server running on your nagios machine.
> Redelivery attempts would occur based on the configuration there.
Okay, makes sense.
>> I have seen messages that did not arrive at the recipient's phone but
>> I've never seen Nagios not generate notifications for contacts that are
>> configured for that host or service. Has anyone else seen this, any
>> suggestions on a cause or how to troubleshoot?
>- Check nagios.log for a HOST NOTIFICATION event for that group. Make
>  sure there were no errors logged.
nagios.log only shows notifications sent to some of the contacts. Those
notifications were received.
>- Check your local SMTP server logs to see if the messages were
>  received there and no errors were reported.
Not necessary; Nagios did not send the notifications.
>- Make sure that nagios has been restarted since adding this group and
>  contacts.
Done. The contact groups in question have been in place for many
months.
>- Make sure you don't have multiple nagios daemons running at the same
>  time.
Done. Only a single instance is running.
-------------------- end of first event -------------------------------
-------------------- Second event with logs and configs ---------------
Below are the configs for the host from the second event. If you look at
the log at the bottom, you'll see that 11 of 16 contacts were sent
notifications: some, but not all, from each of the contact groups
configured. I'm trying to figure out why. Does anyone see a problem
with my configs?
Host in question:
CONFIGS:
define host {
    host_name                       Host_A
    alias                           Host_A
    parents                         Host_B
    use                             upshost
    contact_groups                  +network-email,onguard
    register                        1
}
define contactgroup {
    contactgroup_name               network-email
    alias                           Users who monitor the network - email only
    members                         netuser1,netuser2,netuser3
}
define contactgroup {
    contactgroup_name               onguard
    alias                           On Guard Admins
    members                         og_user1-phone,og_user2-phone,og_user3,og_user3-home,og_user3-phone,og_user4,og_user4-phone,og_user5-phone,og_user6,og_user6-phone,og_user7,og_user7-phone,og_user8
}
define host {
    name                            upshost
    alias                           NetInfra UPS' template
    check_command                   check-host-alive
    use                             generic-pnp,generic-host
    max_check_attempts              5
    check_interval                  60
    retry_interval                  3
    active_checks_enabled           1
    passive_checks_enabled          1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    contact_groups                  network
    notification_interval           60
    notification_period             24x7
    notification_options            d,u,r
    notifications_enabled           1
    register                        0
}
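A note on the `+` in `contact_groups +network-email,onguard`: with Nagios additive inheritance, a leading `+` appends the local value to whatever the template supplies instead of replacing it. The sketch below is only a simplified illustration of that rule in Python, not how Nagios itself is implemented. The function names and the `OBJECTS` dictionary are made up for the example, and `generic-pnp` and `generic-host` are left out because their definitions are not shown here.

```python
# Simplified model of Nagios additive inheritance for one string directive.
# Only the two host objects quoted above are modelled (assumption: the
# templates generic-pnp and generic-host are not shown, so they are skipped).
OBJECTS = {
    "Host_A": {"use": "upshost", "contact_groups": "+network-email,onguard"},
    "upshost": {"use": "generic-pnp,generic-host", "contact_groups": "network"},
}


def split_list(value):
    """Split a comma-separated directive value into a clean list."""
    return [item.strip() for item in value.split(",") if item.strip()]


def effective_value(name, directive, objects):
    """Resolve a directive, honouring a leading '+' as 'append to inherited'."""
    obj = objects[name]
    local = obj.get(directive)
    # A local value without '+' overrides anything inherited.
    if local is not None and not local.startswith("+"):
        return split_list(local)
    # Otherwise take the value from the first known template that supplies one.
    inherited = []
    for template in split_list(obj.get("use", "")):
        if template in objects:
            inherited = effective_value(template, directive, objects)
            if inherited:
                break
    own = split_list(local[1:]) if local else []
    return inherited + own


if __name__ == "__main__":
    print(effective_value("Host_A", "contact_groups", OBJECTS))
```

Under these assumptions Host_A ends up with three contact groups: `network` from the `upshost` template plus `network-email` and `onguard` from its own definition. The members of `network` are not listed in the configs above.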
Excerpt from nagios.log
[1283265540] HOST NOTIFICATION: netuser2-cell;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2-pager;Host_A;UNREACHABLE;alert-host-by-modem;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user8;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7-phone;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6-phone;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user4;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3-home;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283266180] HOST ALERT: Host_A;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2-cell;Host_A;UP;alert-host-by-sms;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2-pager;Host_A;UP;alert-host-by-modem;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: netuser2;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266180] HOST NOTIFICATION: og_user8;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user7-phone;Host_A;UP;alert-host-by-sms;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user7;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user6-phone;Host_A;UP;alert-host-by-email-short;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user6;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266181] HOST NOTIFICATION: og_user4;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266182] HOST NOTIFICATION: og_user3-home;Host_A;UP;alert-host-by-email-short;PING OK - Packet loss = 0%, RTA = 0.50 ms
[1283266182] HOST NOTIFICATION: og_user3;Host_A;UP;alert-host-by-email-long;PING OK - Packet loss = 0%, RTA = 0.50 ms
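One way to pin down exactly which contacts were skipped is to compare the member lists from the configs with the contacts named in the HOST NOTIFICATION lines. The following is a minimal Python sketch, not part of the running setup. The function names are made up for illustration, the member lists are copied from the two contact groups shown above, and the log text is the set of UNREACHABLE notifications from the excerpt with each entry joined onto one line.

```python
import re

# Members copied from the contactgroup definitions in this post.
EXPECTED = {
    "network-email": ["netuser1", "netuser2", "netuser3"],
    "onguard": [
        "og_user1-phone", "og_user2-phone", "og_user3", "og_user3-home",
        "og_user3-phone", "og_user4", "og_user4-phone", "og_user5-phone",
        "og_user6", "og_user6-phone", "og_user7", "og_user7-phone", "og_user8",
    ],
}

# UNREACHABLE notifications from the nagios.log excerpt, one entry per line.
LOG = """\
[1283265540] HOST NOTIFICATION: netuser2-cell;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2-pager;Host_A;UNREACHABLE;alert-host-by-modem;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: netuser2;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user8;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7-phone;Host_A;UNREACHABLE;alert-host-by-sms;PING CRITICAL - Packet loss = 100%
[1283265540] HOST NOTIFICATION: og_user7;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6-phone;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user6;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user4;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3-home;Host_A;UNREACHABLE;alert-host-by-email-short;PING CRITICAL - Packet loss = 100%
[1283265541] HOST NOTIFICATION: og_user3;Host_A;UNREACHABLE;alert-host-by-email-long;PING CRITICAL - Packet loss = 100%
"""

# Fields after the colon are contact;host;state;command;plugin output.
NOTIFY_RE = re.compile(r"^\[(\d+)\] HOST NOTIFICATION: ([^;]+);([^;]+);([^;]+);")


def notified_contacts(log_text, host, state):
    """Return the set of contacts notified for the given host and state."""
    found = set()
    for line in log_text.splitlines():
        match = NOTIFY_RE.match(line.strip())
        if match and match.group(3) == host and match.group(4) == state:
            found.add(match.group(2))
    return found


def missing_by_group(expected, notified):
    """For each group, list configured members that were never notified."""
    return {
        group: [c for c in members if c not in notified]
        for group, members in expected.items()
    }


def unexpected_contacts(expected, notified):
    """Contacts that were notified but are in none of the listed groups."""
    known = {c for members in expected.values() for c in members}
    return sorted(notified - known)


if __name__ == "__main__":
    notified = notified_contacts(LOG, "Host_A", "UNREACHABLE")
    for group, missing in missing_by_group(EXPECTED, notified).items():
        print(group, "not notified:", ", ".join(missing))
    print("notified but not listed:", ", ".join(unexpected_contacts(EXPECTED, notified)))
```

Run against the excerpt, this reports netuser1 and netuser3 as missing from network-email, and og_user1-phone, og_user2-phone, og_user3-phone, og_user4-phone and og_user5-phone as missing from onguard. It also shows that netuser2-cell and netuser2-pager were notified although they are in neither listed group, so they presumably come from the `network` group on the template, whose members are not shown here.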
-------------------- end of second event ------------------------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null