Eternally pending, stale checks
Mike Lindsey
mike-nagios at 5dninja.net
Fri Aug 5 00:53:19 CEST 2011
I deployed new monitoring today, and despite a few restarts and many
hours of waiting, 185/220 services are still pending.
It's a 3.2.1 environment (yes, yes, upgrade, yes) with one master and
multiple pollers. All this new monitoring is on one polling host.
Active checks are disabled on the master; passive checks are submitted
via NSCA. Freshness threshold is set to 20 minutes for checks with a
5-minute interval.
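
For anyone who wants to reproduce a single submission by hand: send_nsca reads tab-separated lines (host, service, return code, plugin output) on stdin. Below is a minimal Python sketch that builds one such line. The function name `nsca_line` and the host and service names are hypothetical, not taken from the actual setup.

```python
def nsca_line(host, service, return_code, output):
    """Build one send_nsca service-check line: host, service, code, output."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK), 1, 2 or 3")
    # Tabs or newlines inside the plugin output would split the record,
    # so flatten them to spaces before joining the fields.
    clean = output.replace("\n", " ").replace("\t", " ")
    return "\t".join([host, service, str(return_code), clean]) + "\n"


if __name__ == "__main__":
    import sys

    # Hypothetical host and service names, for illustration only.
    sys.stdout.write(nsca_line("web01", "Service A", 0, "OK - manual test"))
```

Piping the output into send_nsca on the poller (with -H pointing at the master and -c at the local send_nsca config) should show whether one of the stuck services can be updated at all.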
The polling host executes the checks, has the right data in the
status.log, but the master never receives some of the check data.
The data it does receive is not consistently grouped. Service A on one
host will submit consistently, but the same service on a different host
will fail to submit. Every 20 minutes the master logs messages about
the checks being stale and needing a forced immediate check, but that
never seems to make its way through.
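
To pin down exactly which services are affected without turning on debug logging, the status files from the master and the poller can be compared. The sketch below assumes the Nagios 3 status file layout, in which each service is a `servicestatus { ... }` block of `key=value` lines including `has_been_checked`. The function names are hypothetical.

```python
import re


def pending_services(status_text):
    """Return sorted (host, service) pairs with has_been_checked=0."""
    pending = []
    # Assumed layout: "servicestatus {", key=value lines, then "}" on its own line.
    for block in re.findall(r"servicestatus \{(.*?)\n\s*\}", status_text, re.S):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        if fields.get("has_been_checked") == "0":
            pending.append((fields.get("host_name", ""),
                            fields.get("service_description", "")))
    return sorted(pending)


def stuck_on_master(master_text, poller_text):
    """Services pending on the master that the poller has already checked."""
    return sorted(set(pending_services(master_text))
                  - set(pending_services(poller_text)))
```

Feeding it the contents of both status files gives the list of host and service pairs that the poller has results for but the master has never seen, which is a smaller set to chase than the full 10k services.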
My next step, I suppose, will be enabling debug mode on the master, but
if history is any indication, that will cause the problem to stop
happening - in addition to it being a pain to parse through debug logs
for a 10k service environment. If anyone has ideas on what else to
check, I'm all ears.
--
Mike Lindsey
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null