Version 2.0b4: how does cgi's nagios_check_command work?
Andreas Ericsson
ae at op5.se
Thu Oct 13 16:54:36 CEST 2005
John P. Rouillard wrote:
> In message <434D5AF2.7010504 at op5.se>,
> Andreas Ericsson writes:
>
>
>>John P. Rouillard wrote:
>>
>>>In message <43467920.4070508 at op5.se>,
>>>Andreas Ericsson writes:
>>>
>>>>John P. Rouillard wrote:
>>>>
>>>>
>>>>>In message <43465AB9.6020304 at op5.se>,
>>>>>Andreas Ericsson writes:
>>>>>
>>>>>
>>>>>>John P. Rouillard wrote:
>>>
>>>
>>>>>>>The reason I ask is
>>>>>>>that nagios was down and the cgi's all happily reported that it was
>>>>>>>up. Could this be because the host and service status files were
>>>>>>>available since the machine crashed?
>>>>>>
>>>>>>Yes, that's almost certainly it. There is no really good way of
>>>>>>detecting that nagios is actually running unless you're logged in as
>>>>>>root.
>>>>>
>>>>>Hmm, I am not sure I follow why you need to be logged in as root.
>>>>
>>>>Because otherwise you shouldn't have access to reading process
>>>>information about another user's process.
>>>>
>>>>
>>>>>Why not stat the status.log file and check to see if its (mtime)
>>>>>timestamp is less than the setting of:
>>>>>
>>>>> status_update_interval*2
>>>>>
>>>>>if aggregate_status_updates is enabled? One could also allow a setting
>>>>>"freshness_threshold" in cgi.cfg that is the number of seconds/minutes
>>>>>old the status.dat file is allowed to be if aggregate_status_updates
>>>>>isn't set.
>>>>
>>>>Good idea. Write the code for it and submit a patch.
>>>
>>>Actually not so much a good idea. There is actually a creation
>>>datestamp in the status.dat file I was going to use, but I decided to
>>>run an experiment first. I have my status_update_interval set to 3
>>>seconds.
>>>
>>>I used check_fileage to warn me if the file's age was over 3 seconds
>>>and ran it in a while loop. It failed often. The longest interval was
>>>139 seconds between updates with a number of periods of 20-30 seconds.
>>>
>>>My guesses are: nagios only writes the status file when it needs to.
>>
>>This is correct. The status_update_interval is never checked, although
>>the status is updated every time a service changes either state or
>>output (or a host, for that matter).
>
>
> Ideally nagios would provide a next_check_time in the status.dat, but
> I wonder if that could be usefully intuited from:
>
> min(
>     min(next_check time on services) + service_check_timeout,
>     min(next_check time on hosts) + host_check_timeout
> )
>
> Possible problems: on demand host checks (if part of a network is
> down) could screw up the timing since everything else stops.
>
> Just because a service check is scheduled doesn't mean that it is
> going to run (time period may be wrong etc), but if it's determined to
> be non-runnable, the rescheduled time for it should cause a re-write of
> the status.dat file, correct?
>
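A minimal Python sketch of the estimate quoted above, assuming the next_check timestamps have already been parsed out of status.dat into two lists of epoch seconds (the parsing is not shown). The function name and its list arguments are hypothetical; only service_check_timeout and host_check_timeout come from the formula itself. It inherits the caveats listed above, so treat the result as a rough bound rather than a guarantee.

```python
def latest_expected_update(service_next_checks, host_next_checks,
                           service_check_timeout, host_check_timeout):
    """Estimate the time by which status.dat should next be rewritten.

    Hypothetical helper: the two lists hold next_check times (epoch
    seconds) assumed to have been read from status.dat elsewhere.
    Returns None if there are no scheduled checks at all.
    """
    candidates = []
    if service_next_checks:
        # Earliest scheduled service check, plus the time it may take.
        candidates.append(min(service_next_checks) + service_check_timeout)
    if host_next_checks:
        # Earliest scheduled host check, plus the time it may take.
        candidates.append(min(host_next_checks) + host_check_timeout)
    return min(candidates) if candidates else None
```

If the current time is well past the returned value and status.dat has not been rewritten, that would be a hint (not proof) that nagios has stopped.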
> There has to be an easier way of determining if nagios is running,
> doesn't there?
>
Easy isn't the problem. The trick is to get it to work from a different
and almost always less privileged user. Perhaps a simple neb-module can
touch some file every 10 seconds and if it's 30 seconds old the GUI
could then reasonably suspect that nagios has crashed.

However, I haven't noticed nagios crashing on a modern system. It used
to, with glibc-2.0.35 and linuxthreads-0.7 (which was really buggy).
Since upgrading to glibc-2.3.30 (or some such) and linuxthreads-0.10
everything is running smoothly, so this isn't really a problem for me or
any of our customers.
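
The heartbeat idea above could be checked from the CGI side roughly like this. This is only a sketch in Python, not Nagios code: the function name and the heartbeat file path are hypothetical, and it assumes some NEB module (not shown) touches the file about every 10 seconds. Because it only stats the file, it works for an unprivileged user, provided that user is allowed to stat the file.

```python
import os
import time

# Threshold from the suggestion above: a heartbeat older than 30 seconds
# is treated as a sign that nagios may have crashed.
HEARTBEAT_MAX_AGE = 30


def heartbeat_is_fresh(path, max_age=HEARTBEAT_MAX_AGE, now=None):
    """Return True if `path` exists and was modified within `max_age` seconds.

    `now` can be passed explicitly (epoch seconds) to make testing easier;
    by default the current time is used.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except OSError:
        # Missing or inaccessible heartbeat file: assume nagios is not running.
        return False
    return (now - mtime) <= max_age
```

A caller would use it as `heartbeat_is_fresh("/path/to/heartbeat")`, where the path is a placeholder for wherever the module writes its file.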
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users