Passive service on Windows 2003 box notifies intermittently
C. Bensend
benny at bennyvision.com
Thu Nov 19 15:50:04 CET 2009
Hey folks,
I'm working on ironing out Windows event log alerting for our
eleventy billion Windows hosts, and they're slowly but surely
driving me insane.
I am using Steve Shipway's Nagios EventLog Agent, as I need the
end users to be able to add/edit/remove their own alerts as they
see fit. However, *I* am having a helluva time getting this all
working together. Sorry for the length of this email, I've
included a metric buttload of data.
I have the following service definition on the Nagios host (from
objects.cache):
define service {
    host_name                     winhost
    service_description           System EventLog
    check_period                  24x7 passive checks
    check_command                 check_passive_service!0!No critical system events
    contact_groups                testing-admins
    notification_period           24x7 passive checks
    initial_state                 o
    check_interval                5.000000
    retry_interval                2.000000
    max_check_attempts            1
    is_volatile                   0
    parallelize_check             1
    active_checks_enabled         0
    passive_checks_enabled        1
    obsess_over_service           1
    event_handler_enabled         1
    low_flap_threshold            0.000000
    high_flap_threshold           0.000000
    flap_detection_enabled        1
    flap_detection_options        o,w,u,c
    freshness_threshold           14400
    check_freshness               1
    notification_options          u,w,c,r
    notifications_enabled         1
    notification_interval         360.000000
    first_notification_delay      0.000000
    stalking_options              n
    process_perf_data             1
    failure_prediction_enabled    1
    retain_status_information     1
    retain_nonstatus_information  1
}
The check_passive_service command is defined as such:
define command {
    command_name  check_passive_service
    command_line  $USER1$/check_dummy $ARG1$ "$ARG2$"
}
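To make the macro substitution concrete, here is a minimal Python sketch (an illustration only, not anything Nagios ships) of how the check_command value from the service definition expands against this command_line. The $USER1$ path is an assumption; the real value comes from the resource file. As far as I can tell from the Nagios docs, with active_checks_enabled set to 0 this command should only run when a freshness check forces it.

```python
def expand_check_command(check_command, command_line, user1):
    """Split 'name!arg1!arg2' and substitute $USER1$ and $ARGn$ macros."""
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    for index, arg in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{index}$", arg)
    return name, expanded


name, line = expand_check_command(
    "check_passive_service!0!No critical system events",
    '$USER1$/check_dummy $ARG1$ "$ARG2$"',
    "/usr/local/nagios/libexec",  # assumed value of $USER1$, not from the post
)
print(line)
# /usr/local/nagios/libexec/check_dummy 0 "No critical system events"
```

So the fallback check would report state 0 (OK) with the text "No critical system events".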
The "24x7 passive checks" timeperiod is defined as such:
define timeperiod {
    timeperiod_name  24x7 passive checks
    alias            24x7 passive checks - single alert notifies
    sunday           00:00-24:00
    monday           00:00-24:00
    tuesday          00:00-24:00
    wednesday        00:00-24:00
    thursday         00:00-24:00
    friday           00:00-24:00
    saturday         00:00-24:00
}
The testing-admins contact group is defined as such:
define contactgroup {
    contactgroup_name  testing-admins
    alias              Bensend testing group
    members            cbensend
}
On the Windows side, I have an EventLog Agent alert set up like so:
Name: User Initiated System Reboot
Event Log to Check: System
Which Events to Alert: Information, Warning, Error
Match String: has initiated the restart of computer HOSTNAME
Service Name: System EventLog
Service Status: (2) Critical
The Agent and NSCA are communicating fine: I get a notification
each time I restart the agent. However, when the System EventLog alert
matches the match string above, it does not reliably notify. After
resetting all passive services so they are in an OK state, here are the
log entries from the Nagios side when I reboot the Windows machine, with
my explanations and comments (please pardon the crappy line wrapping):
Nov 19 08:20:05 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;EventLog Agent;1;HEARTBEAT [WARN
#1]: Service starting
-- OK, that's the Nagios EventLog Agent starting.
Nov 19 08:20:06 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process Explorer.EXE has initiated the restart of
computer WINHOST on behalf of user DOMAIN\me for the following reason:
Application: Maintenance (Planned) Reason Code: 0x84040001 Shutdown
Type: restart Commen
-- That is the passive event coming in from NSCA, so the Agent is
working and communicating with NSCA just fine
Nov 19 08:20:06 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process svchost.exe has initiated the restart of
computer WINHOST on behalf of user NT AUTHORITY\SYSTEM for the following
reason: No title for this reason could be found Reason Code: 0x80070020
Shutdown Type: restart
-- Ditto here
Nov 19 08:20:12 hostname nagios: PASSIVE SERVICE CHECK: winhost;EventLog
Agent;1;HEARTBEAT [WARN #1]: Service starting
-- I believe that's Nagios picking up the passive service check data
from the named pipe
Nov 19 08:20:12 hostname nagios: SERVICE ALERT: winhost;EventLog
Agent;WARNING;HARD;1;HEARTBEAT [WARN #1]: Service starting
-- Nagios generates a service alert for the agent
Nov 19 08:20:12 hostname nagios: SERVICE NOTIFICATION: me;winhost;EventLog
Agent;WARNING;notify-service-by-email;HEARTBEAT [WARN #1]: Service
starting
-- Yay, Nagios notifies me via email because the Nagios EventLog
Agent has started up
Nov 19 08:20:12 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process svchost.exe has
initiated the restart of computer WINHOST on behalf of user NT
AUTHORITY\SYSTEM for the following reason: No title for this reason could
be found Reason Code: 0x80070020 Shutdown Type: restart
-- Nagios picking up the passive service data from the named pipe?
Nov 19 08:20:12 hostname nagios: SERVICE ALERT: winhost;System
EventLog;CRITICAL;HARD;1;System [info] [USER32 #1074]: The process
svchost.exe has initiated the restart of computer WINHOST on behalf of
user NT AUTHORITY\SYSTEM for the following reason: No title for this
reason could be found Reason Code: 0x80070020 Shutdown Type: restart
-- OK, Nagios generates a service alert here. Yay. But ...
Nov 19 08:20:12 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process Explorer.EXE has
initiated the restart of computer WINHOST on behalf of user DOMAIN\me for
the following reason: Application: Maintenance (Planned) Reason Code:
0x84040001 Shutdown Type: restart Commen
That's it. No notification. No nothing else, and I didn't skip
any log entries other than one of the NSClient++ services getting a
connection refused while the host was rebooting.
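For anyone following along, the EXTERNAL COMMAND entries above use the PROCESS_SERVICE_CHECK_RESULT layout: host, service description, return code, and plugin output separated by semicolons, with a bracketed Unix timestamp in front when the line is written to the command file. A minimal Python sketch of building and splitting such a line follows; the function names are made up for illustration, and the timestamp of 0 is arbitrary.

```python
import time


def format_passive_result(host, service, return_code, output, when=None):
    """Build one line in the external command file format."""
    timestamp = int(time.time() if when is None else when)
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def parse_external_command(logged):
    """Split the text logged after 'EXTERNAL COMMAND: ' into its fields."""
    # maxsplit=4 keeps any semicolons inside the plugin output intact
    command, host, service, return_code, output = logged.split(";", 4)
    return {
        "command": command,
        "host": host,
        "service": service,
        "return_code": int(return_code),
        "output": output,
    }


logged = (
    "PROCESS_SERVICE_CHECK_RESULT;winhost;EventLog Agent;1;"
    "HEARTBEAT [WARN #1]: Service starting"
)
fields = parse_external_command(logged)
print(fields["service"], fields["return_code"])

# Round-trip with an arbitrary timestamp (0) just to show the layout
line = format_passive_result(
    fields["host"], fields["service"], fields["return_code"],
    fields["output"], when=0,
)
```

The service description in the submitted result has to match the service_description in the service definition exactly, which is the case here ("System EventLog").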
And what makes this worse is that it's not consistent - I get the
entries from NSCA every time, but I only get the notifications SOME
of the time. Here is one that *did* work:
Nov 19 08:39:24 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;EventLog Agent;1;HEARTBEAT [WARN
#1]: Service starting
-- OK, again, the passive EventLog Agent service starts
Nov 19 08:39:24 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process Explorer.EXE has initiated the restart of
computer WINHOST on behalf of user DOMAIN\Me for the following reason:
Application: Maintenance (Planned) Reason Code: 0x84040001 Shutdown
Type: restart Commen
-- The agent kicks in, and sends the desired alert to NSCA
Nov 19 08:39:25 hostname nagios: EXTERNAL COMMAND:
PROCESS_SERVICE_CHECK_RESULT;winhost;System EventLog;2;System [info]
[USER32 #1074]: The process svchost.exe has initiated the restart of
computer WINHOST on behalf of user NT AUTHORITY\SYSTEM for the following
reason: No title for this reason could be found Reason Code: 0x80070020
Shutdown Type: restart
-- Nagios notices
Nov 19 08:39:32 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process Explorer.EXE has
initiated the restart of computer WINHOST on behalf of user DOMAIN\Me for
the following reason: Application: Maintenance (Planned) Reason Code:
0x84040001 Shutdown Type: restart Commen
-- Ditto
Nov 19 08:39:32 hostname nagios: SERVICE ALERT: winhost;System
EventLog;CRITICAL;HARD;1;System [info] [USER32 #1074]: The process
Explorer.EXE has initiated the restart of computer WINHOST on behalf of
user DOMAIN\Me for the following reason: Application: Maintenance
(Planned) Reason Code: 0x84040001 Shutdown Type: restart Commen
-- Nagios generates a service alert
Nov 19 08:39:32 hostname nagios: SERVICE NOTIFICATION:
cbensend;winhost;System EventLog;CRITICAL;notify-service-by-email;System
[info] [USER32 #1074]: The process Explorer.EXE has initiated the restart
of computer WINHOST on behalf of user DOMAIN\Me for the following reason:
Application: Maintenance (Planned) Reason Code: 0x84040001 Shutdown
Type: restart Commen
-- And this time, it generates a service *NOTIFICATION*. Why this time
and not the last?
Nov 19 08:39:32 hostname nagios: PASSIVE SERVICE CHECK: winhost;EventLog
Agent;1;HEARTBEAT [WARN #1]: Service starting
Nov 19 08:39:32 hostname nagios: SERVICE ALERT: winhost;EventLog
Agent;WARNING;HARD;1;HEARTBEAT [WARN #1]: Service starting
Nov 19 08:39:32 hostname nagios: SERVICE NOTIFICATION:
cbensend;winhost;EventLog Agent;WARNING;notify-service-by-email;HEARTBEAT
[WARN #1]: Service starting
Nov 19 08:39:32 hostname nagios: PASSIVE SERVICE CHECK: winhost;System
EventLog;2;System [info] [USER32 #1074]: The process svchost.exe has
initiated the restart of computer WINHOST on behalf of user NT
AUTHORITY\SYSTEM for the following reason: No title for this reason could
be found Reason Code: 0x80070020 Shutdown Type: restart
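To compare a working run against a failing one, it can help to count how many PASSIVE SERVICE CHECK, SERVICE ALERT, and SERVICE NOTIFICATION entries each service produced. Below is a rough Python sketch of that tally. It assumes the syslog lines have been unwrapped to one entry per line, and the sample reuses the 08:39:32 entries above with the System EventLog plugin output trimmed to "..." to keep the sketch short.

```python
import re
from collections import Counter, defaultdict

LINE_RE = re.compile(
    r"^\w{3} +\d+ [\d:]{8} \S+ nagios: "
    r"(PASSIVE SERVICE CHECK|SERVICE ALERT|SERVICE NOTIFICATION): (.*)$"
)


def tally(lines):
    """Count log entry types per (host, service)."""
    counts = defaultdict(Counter)
    for entry in lines:
        match = LINE_RE.match(entry)
        if not match:
            continue  # EXTERNAL COMMAND and unrelated lines are skipped
        kind, rest = match.groups()
        parts = rest.split(";")
        # SERVICE NOTIFICATION entries lead with the contact name,
        # so host and service sit one field further right
        offset = 1 if kind == "SERVICE NOTIFICATION" else 0
        counts[(parts[offset], parts[offset + 1])][kind] += 1
    return counts


PREFIX = "Nov 19 08:39:32 hostname nagios: "
SAMPLE = [
    PREFIX + "PASSIVE SERVICE CHECK: winhost;System EventLog;2;...",
    PREFIX + "SERVICE ALERT: winhost;System EventLog;CRITICAL;HARD;1;...",
    PREFIX + "SERVICE NOTIFICATION: cbensend;winhost;System EventLog;"
             "CRITICAL;notify-service-by-email;...",
    PREFIX + "PASSIVE SERVICE CHECK: winhost;EventLog Agent;1;"
             "HEARTBEAT [WARN #1]: Service starting",
    PREFIX + "SERVICE ALERT: winhost;EventLog Agent;WARNING;HARD;1;"
             "HEARTBEAT [WARN #1]: Service starting",
    PREFIX + "SERVICE NOTIFICATION: cbensend;winhost;EventLog Agent;"
             "WARNING;notify-service-by-email;"
             "HEARTBEAT [WARN #1]: Service starting",
    PREFIX + "PASSIVE SERVICE CHECK: winhost;System EventLog;2;...",
]

result = tally(SAMPLE)
for (host, service), kinds in sorted(result.items()):
    print(host, service, dict(kinds))
```

In this run, System EventLog shows two passive results but only one alert and one notification, so the second CRITICAL result arrived while the service was already in a hard CRITICAL state.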
This is the first time I've done anything with passive service
checks; am I just not understanding something silly? Or .. ? This
is Nagios 3.2.0 running on RHEL 5.4 (built from source), BTW.
Thanks folks,
Benny
--
"It's not all about getting up and putting four slices of kickass
in a two slice toaster." -- ark86, on Fazed.net
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users