distributed host checks: freshness checking issues
Pascal Vandeputte
nagios at asmodeus.be
Wed Jun 1 17:51:10 CEST 2011
Okay, I got some new information: when I look at the Scheduling Queue, I
see that the master is still scheduling active checks for some reason.
In nagios.cfg I specified "execute_host_checks=0" and each host stanza has
"check_interval 0" which should prevent any scheduled active host checking,
right?
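A minimal sketch of the setup described above, for anyone following along. The host name, address, threshold value, the "generic-host" template and the "host-is-stale" command are placeholders rather than real values, and the passive/freshness directives are an assumption about how the master is meant to run check_command only when passive results stop arriving:

```
# nagios.cfg on the master
# No active host checks (the setting quoted above).
execute_host_checks=0
# Assumption: host results arrive passively from the remote pollers.
accept_passive_host_checks=1
# Assumption: freshness checking is enabled program-wide.
check_host_freshness=1

# Host definition on the master
define host {
    use                     generic-host    ; assumed template supplying the other required directives
    host_name               remote-host-01  ; hypothetical name
    address                 192.0.2.10      ; placeholder address
    check_interval          0               ; no regularly scheduled checks
    passive_checks_enabled  1
    check_freshness         1
    freshness_threshold     600             ; placeholder value, in seconds
    check_command           host-is-stale   ; hypothetical command that only reports "stale"
    max_check_attempts      1
}
```

With a layout like this, the only time check_command is supposed to run on the master is when a host's passive result is older than freshness_threshold.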
It's also weird that the queue is falling behind fast... Half a day after
the last nagios reload, the "next check" of a host is scheduled several
hours earlier than the time shown in the "last check" column :s
In the attached screenshot, it's just a couple of minutes behind but Nagios
was reloaded 10 minutes earlier.
Things I tried in the meantime in nagios.cfg:
use_retained_program_state=0
use_retained_scheduling_info=0
(in case something from the state file was keeping Nagios from using the
new settings)
check_result_reaper_frequency=2
(this last change was suggested when running "nagios -s nagios.cfg")
But nothing seems to fix this.
Now, after reading
http://nagios.sourceforge.net/docs/3_0/hostchecks.html
one more time, I'm beginning to fear that it's impossible to make the
master only run the check_command when doing freshness checks:
"If you set the check_interval option in your host definition to zero (0),
Nagios will not perform checks of the hosts on a regular basis. It will,
however, still perform on-demand checks of the host as needed for other
parts of the monitoring logic."
Those other parts of the monitoring logic include the "host reachability
logic", among other things. If those still trigger on-demand checks, and on
the master such a check can only return a "stale" warning, then things look
quite bad for anyone trying to monitor hosts in remote private networks.
And then I started fiddling with host check caching for on-demand host
checks. http://nagios.sourceforge.net/docs/3_0/cachedchecks.html
After increasing cached_host_check_horizon to 300 seconds (the biggest host
check_interval we use), all of these on-demand checks should get their data
from the last cached check.
And indeed, no more wild mood swings in the host states! Yay! After a quick
test it seems that my problem is now solved. Touch wood.
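For reference, the change boils down to a single line in nagios.cfg. This is only a sketch: 300 is the value mentioned above, and the right number for another setup depends on the longest host check interval in use there.

```
# nagios.cfg on the master
# On-demand host checks reuse the host's last known state if that
# state is no older than this many seconds.
cached_host_check_horizon=300
```

Assuming the default interval_length of 60 seconds, 300 seconds corresponds to a host check_interval of 5 on the pollers, so a fresh passive result should normally arrive before the cached one expires.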
I'd rather just have an option to *really* disable host checks altogether.
After all, that's what you think you're doing with "execute_host_checks=0",
according to the documentation at
http://nagios.sourceforge.net/docs/3_0/configmain.html
On the other hand, letting nagios look at old results in the cache is
probably not that different from doing no checks at all. I'm only worried
that the caching may delay notifications in some cases, but we'll have to
see how that plays out in practice, I guess.
Can anyone confirm that my reasoning is correct? That the master will
*always* keep on doing *some* host checks no matter what you configure?
Best regards,
Pascal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios.png
Type: image/png
Size: 89969 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/users/attachments/20110601/d7f7103b/attachment.png>