Questions on migrating to Distributed Environment on Nag 1.1

James Harrison james.harrison at amcg.com
Mon Oct 13 18:08:43 CEST 2003


List,

I have recently outgrown my central-server-only configuration and have
begun migrating to a setup where a central poller polls some sites and a
distributed poller takes up the slack.

I have successfully set up NSCA and have approximately 20 sites being
polled (a simple PING using the check_ping service) via the distributed
poller, with results being sent to the central box.  Everything seems to
be working properly; just a few questions came up along the way.
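For context on what the distributed poller actually delivers: NSCA hands each passive result to Nagios as a line in the external command file using the PROCESS_SERVICE_CHECK_RESULT command. The sketch below only formats such a line; it does not write to the command file. The host and service names are hypothetical, and the function name is illustrative rather than part of NSCA or Nagios.

```python
import time


def format_passive_service_result(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external-command line."""
    if return_code not in (0, 1, 2, 3):
        # Plugins and passive results use 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
        raise ValueError(f"return code {return_code} is outside 0-3")
    if timestamp is None:
        timestamp = int(time.time())
    # The external command file is line-oriented, so keep the output on one line.
    one_line = " ".join(output.splitlines())
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{one_line}\n")


if __name__ == "__main__":
    # Hypothetical host/service names for one of the PING'd sites.
    print(format_passive_service_result(
        "site01-router", "PING", 0, "PING OK - Packet loss = 0%, RTA = 12.34 ms",
        timestamp=1066061323), end="")
```

Each field after the command name is separated by a semicolon, and the host name and service description must match the definitions on the central server exactly for the result to be accepted.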

I'm 95% of the way there!

1.  Where do I do check_host_alive? Central or Distributed?

In theory, if I want to do all my polling (host alive and service checks)
from the distributed box, can I do check_host_alive plus service polling
from the distributed box, and if so, how?  Or maybe the better question
is: how (or can) I set up the central box's check_host_alive (or an
equivalent command such as check_dummy) so that it is "ignored" and the
result arrives via a "passive" check?  I am using the "check period set
to none" method to set up my passive services on the central server, but
I cannot find a similar option for host information (or I'm just missing
it).

Or, as I suspect but can't verify through the docs, is check_host_alive
an active check that must always be done from the central server, with
the additional service checks for that host performed via the
distributed box?

2.  Out of bounds (after a stale detect)

On the passive checks I'm getting from my distributed box, I'm
intermittently receiving "Warning: Return code of 127 for check of
service" errors in my event log.  This appears to happen after a stale
detect, when the central server says "I'm forcing an immediate check of
this service".  These errors never create a "HARD" down state, so no
notifications are being sent.  Is this cause for concern?  I do have
freshness checking turned on for the passive checks, as recommended.
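One fact that may help narrow this down: 127 is not a plugin state at all. Plugins are expected to exit with 0 to 3, while 127 is the status a POSIX shell returns when the command it was asked to run cannot be found. A plausible (unverified) reading is that when freshness checking forces an active check, the central server runs the service's check_command and that command points at a plugin path that does not exist on the central box. The sketch below assumes the check command line is handed to /bin/sh (an assumption about how Nagios launches checks) and shows how a missing plugin path produces 127; the plugin path used is hypothetical.

```python
import subprocess


def run_check(command_line):
    """Run a check command line through /bin/sh and return
    (exit_status, first_line_of_output)."""
    proc = subprocess.run(command_line, shell=True, capture_output=True, text=True)
    output = (proc.stdout or proc.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return proc.returncode, first_line


def describe(status):
    """Map an exit status to its meaning: 0-3 are plugin states,
    126/127 come from the shell itself."""
    meanings = {
        0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN",
        126: "command found but not executable",
        127: "command not found (shell exit status)",
    }
    return meanings.get(status, "out of range for a plugin")


if __name__ == "__main__":
    # Hypothetical plugin path that does not exist on this machine.
    status, text = run_check("/usr/local/nagios/libexec/check_missing_plugin")
    print(status, describe(status), "|", text)
```

If this is the cause, checking that the check_command defined for those passive services on the central server resolves to an executable that really exists there would be the first thing to try.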

3.  Why am I getting stale detects as mentioned in question 2?
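A simplified model of freshness checking may help frame question 3: a passive result is treated as stale once its age exceeds the freshness threshold configured for the service. This is a sketch under that assumption, not Nagios 1.1's exact algorithm (for example, it ignores how often the freshness check itself runs), and the class and field names are illustrative. If the distributed poller's check interval plus NSCA delivery time sits close to the threshold, results can intermittently cross it, which would match the intermittent stale detects described above.

```python
from dataclasses import dataclass


@dataclass
class PassiveService:
    name: str
    last_result_time: int     # epoch seconds when the last passive result arrived
    freshness_threshold: int  # seconds a result is allowed to age before it is stale


def is_stale(svc, now):
    """Stale once the last result is strictly older than the threshold."""
    return now - svc.last_result_time > svc.freshness_threshold


if __name__ == "__main__":
    # Hypothetical services with a 6-minute threshold.
    services = [
        PassiveService("site01 PING", last_result_time=1000, freshness_threshold=360),
        PassiveService("site02 PING", last_result_time=600, freshness_threshold=360),
    ]
    now = 1100
    print([s.name for s in services if is_stale(s, now)])
```

Here only "site02 PING" is flagged, since its last result is 500 seconds old against a 360-second threshold; the usual remedy under this model is a threshold comfortably larger than the poller's send interval.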

4.  For members of the list who are managing large Nagios
implementations ("thousands of services on hundreds of hosts"), how are
your configs managed?  I'm currently using Nagmin, which seems OK, but I
was just curious what big sites might be doing.
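As one illustration of the kind of approach the question is asking about (not a claim about what large sites actually do), service definitions can be generated from a host list with a small script instead of being edited by hand. The sketch below emits template-based Nagios object definitions; the template name "generic-service" and the check_command arguments are hypothetical placeholders that would need to match objects already defined on your server.

```python
def render_service(host, description, command, template="generic-service"):
    """Render one template-based service definition for a single host."""
    lines = [
        "define service{",
        f"    use                  {template}",
        f"    host_name            {host}",
        f"    service_description  {description}",
        f"    check_command        {command}",
        "    }",
    ]
    return "\n".join(lines) + "\n"


def render_all(hosts, description, command, template="generic-service"):
    """Render the same service for every host, sorted for stable diffs."""
    return "\n".join(
        render_service(h, description, command, template) for h in sorted(hosts)
    )


if __name__ == "__main__":
    # Hypothetical site list; in practice this might come from a file or database.
    sites = ["site02-router", "site01-router"]
    print(render_all(sites, "PING", "check_ping!100.0,20%!500.0,60%"), end="")
```

Sorting the hosts keeps the generated file stable between runs, so changes show up cleanly under version control.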

Thanks
-- 
James Harrison RHCE, CCNA


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

