eventhandlers running when a dependent service dependency is not satisfied
Eli Stair
estair at ilm.com
Fri Dec 9 06:14:36 CET 2005
I'm not entirely sure I am configuring this properly to achieve my goal,
so I'll state it briefly and then give the details below. The
question comes down to this:
Should a failed service check for a dependent service trigger a check of
its parent before continuing? If this is not the case, or not the default,
is there _ANY_ way to implement this?
I want to avoid at all costs having an every-minute check of the parent
processes on many thousands of hosts just to keep the child process
checks and event handlers from going haywire.
I want a dependency chain like this:
SSH -- SNMP --\
               - Ganglia
               - NTP
I believe I have this set up so that a service check for SNMP is
dependent on the SSH service running. In turn, the service checks for
other processes that use SNMP are dependent on SNMP running. My intent
is that service checks for NTP, etc. will not be attempted if their parent
SNMP process is not in an OK state (as I have an event handler that will
restart SNMP if it is dead). If the parent SNMP _IS_ running, then the
child process checks (Ganglia, NTP, etc) will be checked and if dead
their own event handler will activate.
The problem is that in this case, if I kill off SNMP the child process
checks STILL execute and return a CRITICAL. As a result, Nagios fires
off the event handler for all these checks, which results in SSHing out
to the nodes in question and restarting a bunch of services that are
probably still running. It SHOULD NOT schedule the child checks, and
thus not run their event handlers, until AFTER a new parent check has
executed and returned successfully, correct?
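As an illustration of the behaviour expected above, here is a minimal Python sketch of the gating logic: a child check is skipped whenever any service above it in the chain is not OK. This models the intent only and is not a description of how Nagios itself evaluates dependencies; the `Service` class and `should_check` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OK = "OK"
CRITICAL = "CRITICAL"


@dataclass
class Service:
    """Hypothetical stand-in for a monitored service (not a Nagios type)."""
    name: str
    parent: Optional[str] = None  # service this one depends on, if any
    state: str = OK               # last known state


def should_check(name: str, services: Dict[str, Service]) -> bool:
    """Return True only if every ancestor in the dependency chain is OK."""
    parent = services[name].parent
    while parent is not None:
        if services[parent].state != OK:
            return False
        parent = services[parent].parent
    return True


# The chain from this post: SSH -> SNMP -> (Ganglia, NTP)
services = {
    "SSH": Service("SSH"),
    "SNMP": Service("SNMP", parent="SSH"),
    "Ganglia": Service("Ganglia", parent="SNMP"),
    "NTP": Service("NTP", parent="SNMP"),
}

# Simulate killing SNMP: the children should no longer be checked.
services["SNMP"].state = CRITICAL
skipped = [n for n in services if not should_check(n, services)]
print(skipped)  # ['Ganglia', 'NTP']
```

Note that the sketch relies on the last known state of the parent; whether a fresh parent check should be forced first is exactly the open question.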
I've included a dependency example below, and a snippet from the Nagios
log showing it sequentially hammering out checks of all the child
processes even though it already knows the parent is dead.
My apologies for the lengthy post, but I believe I've covered this from
every angle and posted enough info up front to make it easily parseable.
Thanks for any help in this, even if it's just a statement that I'm
wrong, and I have to do this a different way.
Cheers,
/eli
###################################################
### snip of this host/group definition include:
define host{
        use                             linux-node-production
        host_name                       HOSTNAME1
        address                         IP
        }

define servicedependency{
        host_name                       HOSTNAME1
        service_description             SSH
        dependent_host_name             HOSTNAME1
        dependent_service_description   SNMP
        execution_failure_criteria      w,p,u,c
        notification_failure_criteria   w,p,u,c
        inherits_parent                 1
        }

define servicedependency{
        host_name                       HOSTNAME1
        service_description             SNMP
        dependent_host_name             HOSTNAME1
        dependent_service_description   SNMP--*
        execution_failure_criteria      w,p,u,c
        notification_failure_criteria   w,p,u,c
        inherits_parent                 1
        }

define service{
        use                             generic-service
        hostgroup_name                  HOSTGROUP1
        service_description             SNMP
        check_command                   SNMPCHECKCOMMAND
        event_handler                   restart-by-ssh!/etc/init.d/snmpd!restart
        normal_check_interval           30
        }

define service{
        use                             generic-service
        hostgroup_name                  HOSTGROUP1
        service_description             SNMP-- NTP running
        check_command                   SNMPCHECKCOMMAND
        event_handler                   restart-by-ssh!/etc/init.d/xntpd!restart
        normal_check_interval           240
        }
###################################################
[1134102595] SERVICE ALERT: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;1;No process matching cron found : CRITICAL
[1134102595] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;1;restart-by-ssh!/etc/init.d/cron!restart
[1134102655] SERVICE ALERT: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;2;No process matching cron found : CRITICAL
[1134102655] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;2;restart-by-ssh!/etc/init.d/cron!restart
[1134102715] SERVICE ALERT: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;3;No process matching cron found : CRITICAL
[1134102715] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;3;restart-by-ssh!/etc/init.d/cron!restart
[1134102775] SERVICE ALERT: HOSTNAME1001;SNMP-- cron running;OK;SOFT;4;(No output returned from plugin)
[1134102775] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- cron running;OK;SOFT;4;restart-by-ssh!/etc/init.d/cron!restart
[1134104099] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;HOSTNAME1001;SNMP-- Ganglia running;1134104073
[1134104476] SERVICE ALERT: HOSTNAME1001;SNMP-- Ganglia running;UNKNOWN;SOFT;1;ERROR: Process name table : No response from remote host '10.65.29.1'.
[1134104476] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- Ganglia running;UNKNOWN;SOFT;1;restart-by-ssh!/etc/init.d/gmond!restart
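For anyone who wants to quantify how often the handlers fire, here is a rough Python sketch that counts SERVICE EVENT HANDLER entries per host, service, and state. It assumes each log entry sits on a single unwrapped line in the field layout seen in the excerpt above; the `count_handler_runs` function and `HANDLER_RE` pattern are hypothetical helpers written for this example, not part of Nagios.

```python
import re
from collections import Counter

# Matches entries of the form seen in the excerpt:
# [timestamp] SERVICE EVENT HANDLER: host;service;state;state_type;attempt;command
HANDLER_RE = re.compile(
    r"^\[(?P<ts>\d+)\] SERVICE EVENT HANDLER: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<command>.+)$"
)


def count_handler_runs(lines):
    """Count event handler invocations keyed by (host, service, state)."""
    counts = Counter()
    for line in lines:
        m = HANDLER_RE.match(line.strip())
        if m:  # non-handler entries (alerts, external commands) are ignored
            counts[(m["host"], m["service"], m["state"])] += 1
    return counts


# Sample entries copied from the log excerpt in this post.
sample = [
    "[1134102595] SERVICE ALERT: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;1;No process matching cron found : CRITICAL",
    "[1134102595] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;1;restart-by-ssh!/etc/init.d/cron!restart",
    "[1134102655] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- cron running;CRITICAL;SOFT;2;restart-by-ssh!/etc/init.d/cron!restart",
    "[1134104476] SERVICE EVENT HANDLER: HOSTNAME1001;SNMP-- Ganglia running;UNKNOWN;SOFT;1;restart-by-ssh!/etc/init.d/gmond!restart",
]

counts = count_handler_runs(sample)
for key, n in sorted(counts.items()):
    print(key, n)
```

Run against the full log, a tally like this would show which child services had their handlers triggered while SNMP was down.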