Version 2.0b4 how does cgi's nagios_check_command work?
John P. Rouillard
rouilj at cs.umb.edu
Thu Oct 13 16:40:27 CEST 2005
In message <434D5AF2.7010504 at op5.se>,
Andreas Ericsson writes:
>John P. Rouillard wrote:
>> In message <43467920.4070508 at op5.se>,
>> Andreas Ericsson writes:
>>>John P. Rouillard wrote:
>>>
>>>>In message <43465AB9.6020304 at op5.se>,
>>>>Andreas Ericsson writes:
>>>>
>>>>>John P. Rouillard wrote:
>>
>>
>>>>>>The reason I ask is
>>>>>>that nagios was down and the cgi's all happily reported that it was
>>>>>>up. Could this be because the host and service status files were
>>>>>>available since the machine crashed?
>>>>>
>>>>>Yes, that's almost certainly it. There is no really good way of
>>>>>detecting that nagios is actually running unless you're logged in as
>>>>>root.
>>>>
>>>>Hmm, I am not sure I follow why you need to be logged in as root.
>>>
>>>Because otherwise you shouldn't have access to read process
>>>information about another user's process.
>>>
>>>>Why not stat the status.log file and check to see if its (mtime)
>>>>timestamp is less than the setting of:
>>>>
>>>> status_update_interval*2
>>>>
>>>>if aggregate_status_updates is enabled? One could also allow a setting
>>>>"freshness_threshold" in cgi.cfg that is the number of seconds/minutes
>>>>old the status.dat file is allowed to be if aggregate_status_updates
>>>>isn't set.
>>>
>>>Good idea. Write the code for it and submit a patch.
>>
>> Actually not so much a good idea. There is actually a creation
>> datestamp in the status.dat file I was going to use, but I decided to
>> run an experiment first. I have my status_update_interval set to 3
>> seconds.
>>
>> I used check_fileage to warn me if the file's age was over 3 seconds
>> and ran it in a while loop. It failed often. The longest interval was
>> 139 seconds between updates with a number of periods of 20-30 seconds.
>>
>> My guesses are: nagios only writes the status file when it needs to.
>
>This is correct. The status_update_interval is never checked, although
>the status is updated every time a service changes either state or
>output (or a host, for that matter).
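For reference, the file-age test discussed above (mtime older than
status_update_interval*2) amounts to roughly the following. This is a
minimal sketch, not the check_fileage plugin itself; the function names
are made up for illustration and the path is whatever your status file
happens to be. Given the 139 second gap seen in the experiment, a
threshold this tight would raise false alarms.

```python
import os
import time


def status_file_age(path, now=None):
    """Return the age of the file in seconds, based on its mtime."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime


def looks_stale(path, status_update_interval, now=None):
    """True if the file is older than twice the update interval.

    The factor of 2 is the threshold proposed in the thread; it is an
    assumption, not something nagios itself enforces.
    """
    return status_file_age(path, now) > 2 * status_update_interval
```

With status_update_interval set to 3, looks_stale() would report
trouble for any file more than 6 seconds old.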

Ideally nagios would provide a next_check_time in the status.dat, but
I wonder if that could be usefully intuited from:

  min(
      min(next_check time on services) + service_check_timeout,
      min(next_check time on hosts) + host_check_timeout
  )

Possible problems: on-demand host checks (if part of a network is
down) could screw up the timing, since everything else stops.

Just because a service check is scheduled doesn't mean that it is
going to run (the time period may be wrong, etc.), but if it's
determined to be non-runnable, the rescheduled time for it should
cause a rewrite of the status.dat file, correct?
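
The min() estimate above could be sketched in Python as follows. This
is only a rough sketch under assumptions: that status.dat is made of
"type { key=value ... }" blocks, that host and service blocks carry a
next_check field holding an epoch time (0 meaning nothing scheduled),
and that the two timeouts are passed in by the caller since they live
in the main config file. The function names are hypothetical.

```python
import re


def parse_status_blocks(text):
    """Yield (block_type, fields) for each 'type { key=value ... }' block."""
    block_type, fields = None, {}
    for raw in text.splitlines():
        line = raw.strip()
        if block_type is None:
            match = re.match(r"^(\w+)\s*\{$", line)
            if match:
                block_type, fields = match.group(1), {}
        elif line == "}":
            yield block_type, fields
            block_type = None
        elif "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value


def latest_expected_update(text, service_check_timeout, host_check_timeout):
    """Estimate the time by which status.dat should have been rewritten.

    Returns an epoch time, or None if no check is scheduled at all.
    """
    service_times, host_times = [], []
    for block_type, fields in parse_status_blocks(text):
        next_check = int(fields.get("next_check", 0))
        if next_check <= 0:
            continue  # assumed to mean "no check scheduled"
        # Block names are an assumption; both spellings are accepted.
        if block_type in ("service", "servicestatus"):
            service_times.append(next_check)
        elif block_type in ("host", "hoststatus"):
            host_times.append(next_check)

    candidates = []
    if service_times:
        candidates.append(min(service_times) + service_check_timeout)
    if host_times:
        candidates.append(min(host_times) + host_check_timeout)
    return min(candidates) if candidates else None
```

A CGI could then compare the current time against the returned value
and treat nagios as suspect if the file has not been rewritten by
then, subject to the problems listed above.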

There has to be an easier way of determining if nagios is running,
doesn't there?

-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.