Active host check scheduling in a distributed environment
Marc Powell
marc at ena.com
Tue Jul 14 17:30:59 CEST 2009
On Jul 14, 2009, at 9:46 AM, Paul Corcoran wrote:
> Hi,
>
> I run a distributed Nagios environment consisting of 1 parent server
> and 2 child servers.
>
> The child servers perform all the service checking while the parent
> server should be performing active service checks.
Both the child server and the central server are performing active
service checks?
> The host definitions are configured to perform host checks every 5
> minutes. The retry interval is 1 minute and the max attempts is set
> to 5.
On both servers? Or are you submitting passive host checks, or are you
expecting the central machine to initiate its own active checks of hosts?
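With the intervals quoted above, here is a rough back-of-the-envelope check of how long a host should take to reach a HARD DOWN state. This is a minimal sketch assuming Nagios 3.x semantics (the first failing regular check counts as attempt 1, and later attempts are spaced retry_interval apart) and interval_length=60, so intervals are in minutes. It ignores check latency, execution time, and any on-demand host checks Nagios may run when a service on the host fails.

```python
from typing import Tuple


def minutes_to_hard_state(check_interval: float, retry_interval: float,
                          max_attempts: int) -> Tuple[float, float]:
    """Return (best_case, worst_case) minutes from the start of an outage
    until the host reaches a HARD DOWN state.

    Assumes attempt 1 is the first failing regular check and attempts
    2..max_attempts are retries spaced retry_interval apart. Latency and
    check execution time are ignored.
    """
    retry_time = (max_attempts - 1) * retry_interval
    # Best case: the outage begins just before a regularly scheduled check.
    best = retry_time
    # Worst case: it begins just after a successful check, so the first
    # failure is only noticed a full check_interval later.
    worst = check_interval + retry_time
    return best, worst


print(minutes_to_hard_state(check_interval=5, retry_interval=1, max_attempts=5))
# (4, 9)
```

Under these assumptions, a HARD DOWN roughly 4 to 9 minutes after the host fails is what you would expect from the child server.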
> We are monitoring 580 hosts and approx 4000 services.
>
> I noticed that when a host was detected as down, the parent server did
> not perform any retries of the host check. As a result, the host was
> stuck in a SOFT state and no alerts were sent out as required. The
> child server, by contrast, performed its host checks without any
> problem, and the host was logged as being in a HARD DOWN state after 5
> failed attempts.
I'm not sure what configuration you could have that would lead to
this. Can you post the host{} definition and any relevant log entries?
Are you only sending a single passive host result and have
'passive_host_checks_are_soft' set in nagios.cfg?
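For reference, a sketch of the nagios.cfg directives involved on the central server, assuming Nagios 3.x, where passive_host_checks_are_soft exists. With that option set to 1, a single passive DOWN result leaves the host in a SOFT state, which would match the symptom described. The values below are illustrative, not a recommendation.

```
# Let the central server run its own active host checks.
execute_host_checks=1
# Accept host results forwarded by the child servers.
accept_passive_host_checks=1
# 0 (the default) treats passive host results as HARD states;
# 1 treats them as SOFT, so one DOWN result never reaches HARD on its own.
passive_host_checks_are_soft=0
```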
> Is there a specific variable in nagios.cfg that explicitly tells the
> server to perform active checks?
There are a few:
- in nagios.cfg: execute_host_checks=<0/1>
- in your host definition: active_checks_enabled [0/1], plus an
appropriate check_period, check_interval, retry_interval and
check_command.
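Putting the host-level items together, a host definition that should be actively checked might look like the following. This is a sketch assuming Nagios 3.x with the stock sample object configuration: the linux-server template, the check-host-alive command and the 24x7 time period come from those samples, while the host name and address are hypothetical placeholders.

```
# Hypothetical host; template, command and time period names are taken
# from the Nagios 3 sample configuration and may differ in your setup.
define host {
    use                     linux-server      ; sample template from templates.cfg
    host_name               example-host      ; hypothetical name
    alias                   Example Host
    address                 192.0.2.10        ; documentation-only address
    check_command           check-host-alive  ; defined in the sample commands.cfg
    active_checks_enabled   1
    check_period            24x7
    check_interval          5                 ; minutes, with interval_length=60
    retry_interval          1
    max_check_attempts      5
}
```

Running `nagios -v` against the main config file before restarting is a quick way to catch missing required directives.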
> Is it best practice to have the 2 child servers perform passive host
> checks?
I have no opinion on this other than to say that if you trust the
remote Nagios servers to correctly report on services, they can usually
be trusted to correctly report on hosts.
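If the child servers do report host results passively, each result reaches the central server as a PROCESS_HOST_CHECK_RESULT external command (host states: 0 = UP, 1 = DOWN, 2 = UNREACHABLE). In distributed setups these are commonly carried from child to central by NSCA; the sketch below only shows the command format on the receiving side. The command file path is an assumption based on the sample nagios.cfg (command_file), and the host name is hypothetical.

```python
import time
from typing import Optional

# Host states accepted by PROCESS_HOST_CHECK_RESULT.
HOST_UP, HOST_DOWN, HOST_UNREACHABLE = 0, 1, 2

# Assumed default from the sample nagios.cfg; check your command_file setting.
DEFAULT_COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def host_check_result_command(host_name: str, status: int, output: str,
                              timestamp: Optional[int] = None) -> str:
    """Format one PROCESS_HOST_CHECK_RESULT external command line."""
    if status not in (HOST_UP, HOST_DOWN, HOST_UNREACHABLE):
        raise ValueError(f"invalid host status: {status}")
    if timestamp is None:
        timestamp = int(time.time())
    # A newline ends an external command, so keep the output on one line.
    clean = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_HOST_CHECK_RESULT;{host_name};{status};{clean}\n"


def submit(command: str, command_file: str = DEFAULT_COMMAND_FILE) -> None:
    """Write a command to the Nagios command file (a named pipe that
    Nagios reads only while it is running)."""
    with open(command_file, "w") as pipe:
        pipe.write(command)


print(host_check_result_command("example-host", HOST_DOWN,
                                "PING CRITICAL - 100% loss",
                                timestamp=1247585459), end="")
# [1247585459] PROCESS_HOST_CHECK_RESULT;example-host;1;PING CRITICAL - 100% loss
```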
> Is it possible that processing all the passive service check info is
> causing the parent server to lag behind in its own process queue?
Not likely, IMHO, assuming you're using somewhat modern hardware. You
can check for sure under Performance Info, though. Look for high
latencies (on the order of minutes); latency measures how long after
its scheduled time a check actually ran.
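The same latency figures are also recorded per host and per service in the status file as check_latency (in seconds). A minimal sketch for listing them, assuming the Nagios 3 status.dat block layout (`hoststatus {` ... `}`); the file path mentioned below and the 60-second threshold are assumptions for illustration.

```python
from typing import Dict, Tuple


def parse_latencies(status_text: str, block_type: str = "hoststatus") -> Dict[str, float]:
    """Map each host (or host/service) in status.dat text to its check_latency."""
    latencies: Dict[str, float] = {}
    in_block = False
    fields: Dict[str, str] = {}
    for raw in status_text.splitlines():
        line = raw.strip()
        if line == block_type + " {":
            in_block, fields = True, {}
        elif in_block and line == "}":
            # End of a block: record it if it carried the fields we need.
            if "host_name" in fields and "check_latency" in fields:
                name = fields["host_name"]
                if block_type == "servicestatus":
                    name += "/" + fields.get("service_description", "")
                latencies[name] = float(fields["check_latency"])
            in_block = False
        elif in_block and "=" in line:
            key, _, value = line.partition("=")
            fields[key] = value
    return latencies


def latency_summary(latencies: Dict[str, float]) -> Tuple[float, float]:
    """Return (mean, max) latency in seconds."""
    values = list(latencies.values())
    return sum(values) / len(values), max(values)


sample = ("hoststatus {\n\thost_name=web01\n\tcheck_latency=0.250\n\t}\n"
          "hoststatus {\n\thost_name=db01\n\tcheck_latency=185.000\n\t}\n")
lat = parse_latencies(sample)
print(latency_summary(lat))                          # (92.625, 185.0)
print(sorted(h for h, s in lat.items() if s >= 60))  # ['db01']
```

Against a live server you would read the file named by status_file in nagios.cfg (for example /usr/local/nagios/var/status.dat in the sample configuration) instead of the inline sample.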
--
Marc
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users