freshness check bug?
admin at jpk236.com
Wed May 11 19:26:59 CEST 2005
Bryan,
You never mentioned, and I forgot to ask. What method are you using to
send the passive checks from the distributed monitored servers to your
central server? NSCA? If so, are those servers configured correctly to
send the data? Is the central server configured correctly to receive
the data?
- Justin Kulikowski
[ http://www.jpk236.com ]
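Regarding the NSCA question: as far as I know, the NSCA daemon on the central server hands each received result to Nagios as a `PROCESS_SERVICE_CHECK_RESULT` line written to the external command file. One way to rule out the transport is to write such a line by hand and watch whether Nagios picks it up. The sketch below only formats and appends that line. The command-file path you pass in is site-specific (an assumption here), and the helper names are hypothetical.

```python
import time
from typing import Optional

# Plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_service_result(host: str, service: str, state: str, output: str,
                           timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line (hypothetical helper)."""
    if timestamp is None:
        timestamp = int(time.time())
    code = STATES[state]
    # Fields are separated by semicolons, so host/service names must not contain one.
    for field in (host, service):
        if ";" in field:
            raise ValueError(f"field contains ';': {field!r}")
    # A newline would end the command early, so flatten the plugin output.
    output = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(command_file: str, line: str) -> None:
    """Append the command to the Nagios command pipe (path depends on your install)."""
    with open(command_file, "a") as pipe:
        pipe.write(line)
```

For example, `submit("/path/to/nagios.cmd", passive_service_result("csstest2", "PROCS-NAGIOS", "OK", "nagios running"))` should reset the freshness clock for that service if the central server is accepting external commands.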
Bryan Loniewski wrote:
> Regardless of what freshness_threshold I pick (as long as it's not too
> unrealistic), I just want clarification on whether a bug exists. (By the
> way, where do you see that the default freshness threshold is 300 seconds?)
> Anyway, I just increased the threshold to 180 seconds, and the only thing
> in my nagios.log was:
>
> [1115831032] Finished daemonizing... (New PID=16154)
> [1115831272] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=180 seconds). I'm forcing an immediate check of the service.
>
> So it did not even execute my eventhandler once? I'm getting very
> inconsistent results!
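The numbers in that log line can be reproduced with simple arithmetic under a simplified model. This is my assumption, not taken from the Nagios source: freshness is evaluated every service_freshness_check_interval seconds (60 here), age is counted from daemon start because no passive result has arrived yet, and a result counts as stale only when its age is strictly greater than the threshold. With threshold=180, the passes at +60, +120 and +180 seconds find nothing, and the pass at +240 (1115831272) reports "stale by 60". The same model also reproduces the first warning in the original 60-second log. The function name is hypothetical.

```python
from typing import Optional, Tuple


def first_stale_pass(reference_time: int, threshold: int, interval: int,
                     max_passes: int = 1000) -> Optional[Tuple[int, int]]:
    """Return (time, stale_by) for the first freshness pass that finds the result stale.

    Simplified model (an assumption):
    - freshness passes run every `interval` seconds after `reference_time`
    - with no passive result received, age is measured from `reference_time`
    - a result is stale only when its age is strictly greater than `threshold`
    """
    for n in range(1, max_passes + 1):
        now = reference_time + n * interval
        age = now - reference_time
        if age > threshold:
            return now, age - threshold
    return None
```

If this model holds, "stale by" can be as large as the freshness check interval itself, so with a 60-second interval the log may well report 60 seconds no matter which threshold is chosen.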
>
> NRPE and check_by_ssh are not acceptable methods for distributed
> monitoring in our environment.
>
> Thanks for the comments... Justin
>
> _________________________
> Bryan Loniewski
> Rutgers University
> NBCS - Systems Programmer
>
> On Wed, 11 May 2005, admin at jpk236.com wrote:
>
>> Bryan, A freshness_threshold of 60 seconds might be a little
>> unrealistic. The default value for the threshold is 300 seconds (5
>> minutes).
>> If you want almost real-time stats, which appears to be what
>> you're going for, perhaps you want to try NRPE or check_by_ssh as an
>> alternative method of doing distributed monitoring.
>>
>> - Justin Kulikowski
>> [ http://www.jpk236.com ]
>>
>> Bryan Loniewski wrote:
>>
>>> While trying to set up failover in a distributed environment, I came
>>> across the following problem (bug?) involving freshness checking.
>>>
>>> Note: The host that this is set up on is NOT receiving any passive
>>> checks while I am testing the freshness checking, so the results are
>>> always stale, forcing a freshness check every time.
>>>
>>> Note2: Relevant config snippets are under my .sig
>>>
>>> Configuring (passive) service freshness checking to execute an event
>>> handler works correctly for 1 or 2 iterations, BUT no more than that. It
>>> seems to stop checking freshness after at most 3 iterations and stops
>>> executing the event handler after at most 2 iterations. I've replicated
>>> this behavior (too) many times and the results are inconsistent.
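The failover script itself isn't shown in the thread; the `slave-failover` command below only reveals that it is called as `$USER2$/failover $SERVICESTATE$ $SERVICESTATETYPE$`. As a hypothetical sketch of such a handler, the policy here (take over only on a HARD CRITICAL, stand down on OK, otherwise wait) is my assumption and not the author's script. One thing it illustrates: with max_check_attempts 5, a handler that acts only on HARD states would not take over until the fifth consecutive CRITICAL check, which the logs in this thread never reach.

```python
import sys


def failover_action(state: str, state_type: str) -> str:
    """Decide what a failover event handler should do (hypothetical policy)."""
    if state == "OK":
        return "stand-down"        # the monitored master is reporting again
    if state == "CRITICAL" and state_type == "HARD":
        return "take-over"         # problem confirmed after max_check_attempts
    return "wait"                  # SOFT problem, WARNING or UNKNOWN: do nothing yet


def main(argv: list) -> int:
    if len(argv) != 3:
        print("usage: failover SERVICESTATE SERVICESTATETYPE", file=sys.stderr)
        return 3
    action = failover_action(argv[1], argv[2])
    # A real handler would act here (e.g. enable checks on the standby); this only reports.
    print(action)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```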
>>>
>>> Below is the output of my nagios log:
>>>
>>> <snip nagios.log>
>>> [1115822708] Finished daemonizing... (New PID=15941)
>>> [1115822828] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds). I'm forcing an immediate check of the service.
>>> [1115822838] SERVICE ALERT: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;CRITICAL
>>> [1115822838] SERVICE EVENT HANDLER: csstest2;PROCS-NAGIOS;CRITICAL;SOFT;1;slave-failover
>>> [1115822948] Warning: The results of service 'PROCS-NAGIOS' on host 'csstest2' are stale by 60 seconds (threshold=60 seconds). I'm forcing an immediate check of the service.
>>>
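When comparing repeated runs like this one, counting events mechanically may be more reliable than reading the log by eye. The sketch below scans nagios.log lines and collects the timestamps of stale-result warnings and event handler runs, plus the gaps between warnings. It assumes each entry sits on a single line, as in the actual log file rather than in a wrapped email; the function name is hypothetical. Applied to the excerpt above, it finds 2 warnings 120 seconds apart and 1 event handler run.

```python
import re

# Each nagios.log entry looks like "[epoch] message"
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def summarize(log_lines):
    """Collect timestamps of stale-result warnings and event handler runs."""
    stale, handlers = [], []
    for raw in log_lines:
        m = LINE_RE.match(raw.strip())
        if not m:
            continue  # skip wrapped continuation lines or blank lines
        ts, msg = int(m.group(1)), m.group(2)
        if msg.startswith("Warning: The results of service") and "are stale" in msg:
            stale.append(ts)
        elif msg.startswith("SERVICE EVENT HANDLER:"):
            handlers.append(ts)
    # Seconds between consecutive freshness warnings
    gaps = [b - a for a, b in zip(stale, stale[1:])]
    return {"stale": stale, "handlers": handlers, "stale_gaps": gaps}
```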
>>> Notice the freshness check ran ONLY 2 times when it should have run 5
>>> times (if you look at my config options below), and the event handler
>>> ran ONLY 1 time when it should have run 3 times.
>>>
>>> Can anyone verify (disprove) this behavior? Am I missing something?
>>>
>>> _________________________
>>> Bryan Loniewski
>>> Rutgers University
>>> NBCS - Systems Programmer
>>>
>>> <snip nagios.cfg>
>>> check_service_freshness=1
>>> service_freshness_check_interval=60
>>> <snip>
>>>
>>> <snip objects.cfg>
>>> define service{
>>>         name                            generic-service
>>>         parallelize_check               1
>>>         obsess_over_service             1
>>>         check_freshness                 0
>>>         freshness_threshold             60
>>>         notifications_enabled           1
>>>         event_handler_enabled           1
>>>         flap_detection_enabled          1
>>>         failure_prediction_enabled      1
>>>         process_perf_data               1
>>>         retain_status_information       1
>>>         retain_nonstatus_information    1
>>>         is_volatile                     0
>>>         max_check_attempts              5
>>>         normal_check_interval           2
>>>         retry_check_interval            1
>>>         check_period                    24x7
>>>         contact_groups                  super-admins
>>>         notification_interval           3
>>>         notification_period             24x7
>>>         register                        0
>>>         }
>>> define service{
>>>         use                             generic-service
>>>         name                            generic-passive-service
>>>         active_checks_enabled           0
>>>         passive_checks_enabled          1
>>>         register                        0
>>>         }
>>> define service{
>>>         use                             generic-passive-service
>>>         host_name                       csstest2
>>>         service_description             PROCS-NAGIOS
>>>         check_freshness                 1
>>>         freshness_threshold             60
>>>         check_command                   check_dummy!2
>>>         event_handler                   slave-failover
>>>         }
>>> define command{
>>>         command_name    check_dummy
>>>         command_line    $USER1$/check_dummy $ARG1$
>>>         }
>>> define command{
>>>         command_name    slave-failover
>>>         command_line    $USER2$/failover $SERVICESTATE$ $SERVICESTATETYPE$
>>>         }
>>> <snip>
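To double-check which values actually apply to PROCS-NAGIOS after template inheritance, here is a minimal sketch of the `use` chain as I understand Nagios template rules: directives set on the object override those from its template, and `name`/`register` are not passed down. The dictionaries are a hand-copied subset of the config above, and `resolve` is a hypothetical helper, not part of Nagios. It confirms that `check_freshness 0` in generic-service is overridden to 1 for this service, that active checks are disabled, and that `max_check_attempts 5` is inherited.

```python
from typing import Dict

Obj = Dict[str, str]

# Template bookkeeping directives that are not inherited through `use`
NOT_INHERITED = {"use", "name", "register"}

TEMPLATES: Dict[str, Obj] = {
    "generic-service": {"name": "generic-service", "check_freshness": "0",
                        "freshness_threshold": "60", "max_check_attempts": "5",
                        "event_handler_enabled": "1", "register": "0"},
    "generic-passive-service": {"use": "generic-service", "name": "generic-passive-service",
                                "active_checks_enabled": "0",
                                "passive_checks_enabled": "1", "register": "0"},
}

SERVICE: Obj = {"use": "generic-passive-service", "host_name": "csstest2",
                "service_description": "PROCS-NAGIOS", "check_freshness": "1",
                "freshness_threshold": "60", "check_command": "check_dummy!2",
                "event_handler": "slave-failover"}


def resolve(obj: Obj, templates: Dict[str, Obj]) -> Obj:
    """Return effective directives: local values override inherited ones."""
    parent = obj.get("use")
    merged: Obj = resolve(templates[parent], templates) if parent else {}
    merged.update({k: v for k, v in obj.items() if k not in NOT_INHERITED})
    return merged
```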
>>>
>>>
>>> -------------------------------------------------------
>>> This SF.Net email is sponsored by Oracle Space Sweepstakes
>>> Want to be the first software developer in space?
>>> Enter now for the Oracle Space Sweepstakes!
>>> http://ads.osdn.com/?ad_id=7393&alloc_id=16281&op=click
>>> _______________________________________________
>>> Nagios-devel mailing list
>>> Nagios-devel at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>>
>>