distributed host checks: freshness checking issues
Andreas Ericsson
ae at op5.se
Tue Jun 7 10:12:25 CEST 2011
On 06/01/2011 05:51 PM, Pascal Vandeputte wrote:
> Okay, I got some new information: when I look at the Scheduling Queue, I
> see that the master is still scheduling active checks for some reason.
>
> In nagios.cfg I specified "execute_host_checks=0" and each host stanza has
> "check_interval 0" which should prevent any scheduled active host checking,
> right?
>
> It's also weird that the queue is falling behind fast... Half a day after
> the last nagios reload, the "next check" of a host is scheduled several
> hours earlier than the time shown in the "last check" column :s
> In the attached screenshot, it's just a couple of minutes behind but Nagios
> was reloaded 10 minutes earlier.
>
>
> Things I tried in the meantime in nagios.cfg:
>
> use_retained_program_state=0
> use_retained_scheduling_info=0
> (in case something from the state file was keeping Nagios from using the
> new settings)
>
> check_result_reaper_frequency=2
> (this last change was suggested when running "nagios -s nagios.cfg")
>
> But nothing seems to fix this.
>
>
> Now, after reading
> http://nagios.sourceforge.net/docs/3_0/hostchecks.html
> one more time, I'm beginning to fear that it's impossible to make the
> master only run the check_command when doing freshness checks:
>
> "If you set the check_interval option in your host definition to zero (0),
> Nagios will not perform checks of the hosts on a regular basis. It will,
> however, still perform on-demand checks of the host as needed for other
> parts of the monitoring logic."
>
> Those other parts of the monitoring logic are e.g. the "host reachability
> logic" and some more things. If those still cause on-demand checks, which
> only result in a "stale" warning, then it looks quite bad for anyone trying
> to monitor hosts in remote private networks.
>
>
> And then I started fiddling with host check caching for on-demand host
> checks. http://nagios.sourceforge.net/docs/3_0/cachedchecks.html
>
> After increasing cached_host_check_horizon to 300 seconds (the biggest host
> check_interval we use), all of these on-demand checks should get their data
> from the last cached check.
>
> And indeed, no more wild mood swings in the host states! Yay! After a quick
> test it seems that my problem is now solved. Touch wood.
>
> I'd rather just have an option to *really* disable host checks altogether,
> after all that's what you think you're doing with "execute_host_checks=0",
> according to the documentation at
> http://nagios.sourceforge.net/docs/3_0/configmain.html
>
> On the other hand, letting Nagios look at old results in the cache is
> probably not that different from doing no checks at all. I'm only worried
> that the caching may delay notifications in some cases, but we'll have to
> see how that works out in practice, I guess.
>
>
> Can anyone confirm that my reasoning is correct? That the master will
> *always* keep on doing *some* host checks no matter what you configure?
>
More or less, yes. The master will at least schedule them even if it gets
results for them, though eventbroker modules can block even forced host
checks. I'd look into using Merlin, DNX or mod_gearman if I were you. Any
of them will do what you want with far better performance than NSCA will
ever be able to.
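
For reference, the master-side settings discussed in this thread could be
collected roughly as follows. This is only a sketch: the values for
execute_host_checks, check_interval and cached_host_check_horizon come from
the quoted message, while the host name, the check command name and the
freshness threshold are made-up placeholders. The remaining directives are
standard Nagios 3 options that are assumed here to belong alongside them.

```
# nagios.cfg on the master (sketch; all other settings omitted)
# Do not actively execute host checks, but accept passive results.
execute_host_checks=0
accept_passive_host_checks=1
# Enable freshness checking for passive host results.
check_host_freshness=1
# Let on-demand host checks reuse a result up to 300 seconds old
# (value from the quoted message: the largest host check_interval in use).
cached_host_check_horizon=300

# Host object on the master (fragment; required directives such as
# address and max_check_attempts are omitted)
define host{
        host_name               remote-host-01    ; hypothetical name
        check_interval          0                 ; no regularly scheduled active checks
        passive_checks_enabled  1
        check_freshness         1
        freshness_threshold     600               ; hypothetical value, in seconds
        check_command           host-is-stale     ; hypothetical command reporting a stale result
        }
```

With settings like these the master may still schedule host checks, as
noted above; the horizon only makes on-demand checks fall back to the last
cached result instead of running the check_command.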
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users