event handler runs for timeouts, not state changes, why?
Lewis Getschel
lgetschel at denver.westerngeco.slb.com
Thu Feb 24 17:34:14 CET 2005
All-
I didn't try to do this, and I don't even know if Nagios is supposed
to be this specific.
Short description:
My event handler seems to be called ONLY when an 'error' occurs like
"connection refused" or "CHECK_NRPE: Socket timeout after 25 seconds.".
I know this because the start of the script echoes its parameters into a
file.
But when a state change occurs, the handler doesn't echo anything,
which seems to be proof that it isn't being called, even though the
event log says it was called.
Longer description:
I have an event handler defined for a service. Its first lines are a set
of echo commands that write the parameters into a text file.
Here is an event that was a timeout and the event handler
(event_diskmail) is called:
[02-24-2005 04:22:16] SERVICE EVENT HANDLER: dvfs004;linux-fsdisk1;CRITICAL;SOFT;2;event_diskmail
[02-24-2005 04:22:16] SERVICE ALERT: dvfs004;linux-fsdisk1;CRITICAL;SOFT;2;CHECK_NRPE: Socket timeout after 25 seconds.
And here is the text from /tmp/nagios_event_debug.txt (echo of $1 - $9):
Thu Feb 24 04:22:16 MST 2005
1 -CRITICAL -SERVICESTATE
2 -SOFT -STATETYPE
3 -2 -SERVICEATTEMPT
4 -dvfs004 -HOSTNAME
5 -linux-fsdisk1 -SERVICEDESC
6-9 -CHECK_NRPE: Socket timeout after -OUTPUT
nagios
----------------
OK, that part's fine, the script checks for the timeout errors and exits
properly.
Here the nagios event log shows that the handler is called for a state
change (and a 'regular' notification was sent):
[02-24-2005 05:42:35] SERVICE EVENT HANDLER: dvfs001;linux-fsdisk1;WARNING;HARD;4;event_diskmail
[02-24-2005 05:42:34] SERVICE NOTIFICATION: deop00;dvfs001;linux-fsdisk1;WARNING;notify-by-email;DISK WARNING [182453808 kB (10%) free on /dev/sdb1]
[02-24-2005 05:42:34] SERVICE ALERT: dvfs001;linux-fsdisk1;WARNING;HARD;4;DISK WARNING [182453808 kB (10%) free on /dev/sdb1]
BUT the debug text file does NOT show it being called at all (no
text, date, or parameters are present).
(Don't confuse the notification email with my diskmail routine. My diskmail
routine sends mail to the users who are running low on disk space.)
Specifics:
Nagios 1.2
1160 hosts, 1396 services
in services.cfg: (I define the service that will use the event handler)
define service{
        use                     linux-service
        name                    linux-fsdisk1
        service_description     linux-fsdisk1
        check_command           check_nrpe!check_fsdisk1
        event_handler_enabled   1
        event_handler           event_diskmail
        register                0
        }
(and I define the event handler itself)
# Service definition for sending email on dvfs00x systems
define command{
        command_name    event_diskmail
        command_line    /usr/lib/nagios/plugins/event_handler_diskmail $SERVICESTATE$ $STATETYPE$ $SERVICEATTEMPT$ $HOSTNAME$ $SERVICEDESC$ $OUTPUT$
        }

# Service definition
define service{
        use                     generic-service
        name                    linux-service
        is_volatile             0
        check_period            24x7
        max_check_attempts      4
        normal_check_interval   20
        retry_check_interval    3
        contact_groups          ops-escalation-group , dets-escalation-group
        notification_interval   60
        notification_period     24x7
        notification_options    w,c,r
        register                0
        }
in hosts.cfg (I assign the service to a host system)
# service definition
define service{
        use             linux-fsdisk1
        host_name       dvfs001,dvfs002, (and so on for the rest of the servers)
        }
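One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the command_line above passes $OUTPUT$ unquoted, and the disk plugin output contains parentheses while the timeout message does not. If Nagios hands the expanded command line to /bin/sh (assumed here, not verified for Nagios 1.2), the shell would parse those parentheses itself. The sketch below imitates that with `sh -c`; the temporary `handler` script is a hypothetical stand-in for event_handler_diskmail and only reports how many arguments it received.

```shell
#!/bin/sh
# Sketch only: "handler" is a hypothetical stand-in for event_handler_diskmail.
handler=$(mktemp)
printf '#!/bin/sh\necho "handler ran with $# args"\n' > "$handler"

# The two plugin outputs quoted in the event log above.
timeout_output='CHECK_NRPE: Socket timeout after 25 seconds.'
disk_output='DISK WARNING [182453808 kB (10%) free on /dev/sdb1]'

# Unquoted, as in the command_line above: the inner shell parses the text.
unquoted_timeout=$(sh -c "sh $handler WARNING HARD 4 dvfs001 linux-fsdisk1 $timeout_output" 2>&1)
unquoted_timeout_rc=$?
unquoted_disk=$(sh -c "sh $handler WARNING HARD 4 dvfs001 linux-fsdisk1 $disk_output" 2>&1)
unquoted_disk_rc=$?

# Quoted: the whole plugin output arrives as a single argument.
quoted_disk=$(sh -c "sh $handler WARNING HARD 4 dvfs001 linux-fsdisk1 \"$disk_output\"" 2>&1)
quoted_disk_rc=$?

rm -f "$handler"

echo "unquoted timeout: rc=$unquoted_timeout_rc: $unquoted_timeout"
echo "unquoted disk:    rc=$unquoted_disk_rc: $unquoted_disk"
echo "quoted disk:      rc=$quoted_disk_rc: $quoted_disk"
```

In the unquoted disk case the inner shell should reject the command line before the stand-in handler starts, which would match a handler that is logged as called but never writes its debug lines. If that is what is happening, quoting the macro in command_line ("$OUTPUT$") would be one candidate to try; this is untested against Nagios 1.2.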
Finally, the head of the event_handler_diskmail file: (extra file
comments removed here)
#!/bin/sh
# Whenever I test this, I forget to run as NAGIOS, NOT root or myself, check it!
CURRENT_USER=`whoami`
if [ "$CURRENT_USER" != "nagios" ] ;
then
    echo ========================================== >> /tmp/nagios_event_debug.txt
    echo "_WRONG_ you are running this as $CURRENT_USER , you should be _nagios_"
    echo --- WRONG user. it is not _nagios_ user >> /tmp/nagios_event_debug.txt
    echo `whoami` >> /tmp/nagios_event_debug.txt
    echo =========== Bailing out of script! =============================== >> /tmp/nagios_event_debug.txt
    exit 255
fi
# Echo parameters that were passed for debugging purposes.
# echo "-------------------------------------------" >> /tmp/nagios_event_debug.txt
echo `date` >> /tmp/nagios_event_debug.txt
# echo Passed Parameters are:
echo 1 -$1 -SERVICESTATE >> /tmp/nagios_event_debug.txt
echo 2 -$2 -STATETYPE >> /tmp/nagios_event_debug.txt
echo 3 -$3 -SERVICEATTEMPT >> /tmp/nagios_event_debug.txt
echo 4 -$4 -HOSTNAME >> /tmp/nagios_event_debug.txt
echo 5 -$5 -SERVICEDESC >> /tmp/nagios_event_debug.txt
echo 6-9 -$6 $7 $8 $9 -OUTPUT >> /tmp/nagios_event_debug.txt
echo `whoami` >> /tmp/nagios_event_debug.txt
echo --------------------------------------------- >> /tmp/nagios_event_debug.txt
(BTW, the script runs perfectly fine if I sudo su nagios, and run the
script manually with the parameters)
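Since a manual run and a run under Nagios can differ in arguments and environment, a slightly fuller debug header might make the two easier to compare. The following is a minimal sketch, not the script in use: `DEBUG_LOG` and `log_invocation` are hypothetical names, with the log path defaulting to the same /tmp/nagios_event_debug.txt file. It records the argument count, each argument in delimiters (so word splitting is visible), and the PATH the script actually sees.

```shell
#!/bin/sh
# Hypothetical variable; the script above hard-codes this path.
DEBUG_LOG="${DEBUG_LOG:-/tmp/nagios_event_debug.txt}"

# Hypothetical helper: append one record describing this invocation.
log_invocation() {
    {
        echo "---- $(date) ----"
        echo "user=$(id -un) pid=$$ argc=$#"
        i=1
        for arg in "$@"; do
            # Angle brackets make leading/trailing spaces and splits visible.
            echo "arg[$i]=<$arg>"
            i=$((i + 1))
        done
        echo "PATH=$PATH"
    } >> "$DEBUG_LOG" 2>&1
}

# Log exactly what this script was started with.
log_invocation "$@"
```

Placed at the very top of the handler, before the user check, this would show whether the script starts at all for a given event and how the plugin output was split into arguments.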
HELP, I can't figure out how to get the event handler to be called for
state changes.
Any help would be greatly appreciated. Thanks.
--
Lewis Getschel | Today is done...
WesternGeco | Today was fun...
1625 Broadway | Tomorrow is another one.
Denver, CO 80202 |
Direct Phone - 303-389-4407| -- Dr. Seuss --