freshness check bug?
Bryan Loniewski
brylon at jla.rutgers.edu
Wed May 11 18:31:46 CEST 2005
While trying to set up failover in a distributed environment, I came across the following
problem (bug?) involving freshness checking.
Note: The host that this is set up on is NOT receiving any passive checks while I am
testing the freshness checking, so the results are always stale, forcing a freshness
check every time.
Note2: Relevant config snippets are under my .sig
Configuring (passive) service freshness checking to execute an event handler
works correctly for 1 or 2 iterations, BUT no more than that. Nagios seems to stop checking
freshness after at most 3 iterations, and stops executing the event handler after
at most 2 iterations. I've replicated this behavior (too) many times and the results are
inconsistent from run to run.
Below is the output of my nagios log:
<snip nagios.log>
[1115822708] Finished daemonizing... (New PID=15941)
[1115822828] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale
by 60 seconds (threshold=60 seconds). I'm forcing an immediate check of the service.
[1115822838] SERVICE ALERT: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;CRITICAL
[1115822838] SERVICE EVENT HANDLER: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;slave-failover
[1115822948] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale
by 60 seconds (threshold=60 seconds). I'm forcing an immediate check of the service.
Notice the freshness check ran ONLY 2 times when it should have run 5 (if you look at my
config options below), and the event handler ran ONLY 1 time when it should have run 3
times.
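For anyone who wants to compare against their own logs, here is a rough sketch of how the entries can be tallied. It is only an illustration: the `summarize` helper is a made-up name, the matching relies on the message prefixes seen in the excerpt above, and the wrapped log lines have been rejoined by hand.

```python
import re
from collections import Counter

# Log lines copied from the excerpt above, with wrapped lines rejoined.
LOG = """\
[1115822708] Finished daemonizing... (New PID=15941)
[1115822828] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds). I'm forcing an immediate check of the service.
[1115822838] SERVICE ALERT: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;CRITICAL
[1115822838] SERVICE EVENT HANDLER: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;slave-failover
[1115822948] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds). I'm forcing an immediate check of the service.
"""

# "[<epoch seconds>] <message>"
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def summarize(log_text):
    """Count stale warnings, alerts and event handler runs (hypothetical helper).

    Returns the counts plus the gaps, in seconds, between stale warnings.
    """
    counts = Counter()
    stale_times = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a timestamped entry
        timestamp, message = int(match.group(1)), match.group(2)
        if message.startswith("Warning: The results of service") and "are stale" in message:
            counts["stale"] += 1
            stale_times.append(timestamp)
        elif message.startswith("SERVICE EVENT HANDLER:"):
            counts["event_handler"] += 1
        elif message.startswith("SERVICE ALERT:"):
            counts["alert"] += 1
    gaps = [later - earlier for earlier, later in zip(stale_times, stale_times[1:])]
    return counts, gaps


counts, gaps = summarize(LOG)
print("stale warnings:", counts["stale"])
print("event handler runs:", counts["event_handler"])
print("seconds between stale warnings:", gaps)
```

On the excerpt above this reports 2 stale warnings, 1 event handler run, and a 120 second gap between the two stale warnings.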
Can anyone verify (or disprove) this behavior? Am I missing something?
_________________________
Bryan Loniewski
Rutgers University
NBCS - Systems Programmer
<snip nagios.cfg>
check_service_freshness=1
service_freshness_check_interval=60
<snip>
<snip objects.cfg>
define service{
        name                            generic-service
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        freshness_threshold             60
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        max_check_attempts              5
        normal_check_interval           2
        retry_check_interval            1
        check_period                    24x7
        contact_groups                  super-admins
        notification_interval           3
        notification_period             24x7
        register                        0
        }
define service{
        use                             generic-service
        name                            generic-passive-service
        active_checks_enabled           0
        passive_checks_enabled          1
        register                        0
        }
define service{
        use                             generic-passive-service
        host_name                       csstest2
        service_description             PROCS-NAGIOS
        check_freshness                 1
        freshness_threshold             60
        check_command                   check_dummy!2
        event_handler                   slave-failover
        }
define command{
        command_name                    check_dummy
        command_line                    $USER1$/check_dummy $ARG1$
        }
define command{
        command_name                    slave-failover
        command_line                    $USER2$/failover $SERVICESTATE$ $SERVICESTATETYPE$
        }
<snip>