Distributed monitoring: central collector doesn't seem to be able to run active checks
Justin Pryzby
justinp at norchemlab.com
Wed Aug 28 13:54:03 CEST 2013
Do you get many of those error messages in the logs at once, or just
one at a time?
Only one thought: what are the permissions on the file that defines your
$USER*$ macros (resource.cfg in your case)? Nagios on my systems calls
setuid() to drop to a non-root user after startup. If it then gets a
SIGHUP to reload its config but can't read the file defining $USER*$,
it will act strangely.
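One rough way to test that theory (a sketch, not anything Nagios ships):
the Python below reads the main config, pulls out each resource_file=
line, and reports any file the current user cannot read. Run it as the
same non-root user Nagios drops to. The function name and the config path
in the usage note are my assumptions; adjust them for your install.

```python
import os
import re


def unreadable_resource_files(main_cfg_path):
    """Return resource_file paths from the main config that the
    current user cannot read (missing files count as unreadable)."""
    problems = []
    with open(main_cfg_path) as cfg:
        for line in cfg:
            # Only uncommented "resource_file=<path>" lines match;
            # comment lines start with '#' and are skipped.
            match = re.match(r"\s*resource_file\s*=\s*(\S+)", line)
            if match and not os.access(match.group(1), os.R_OK):
                problems.append(match.group(1))
    return problems
```

For example, run as the nagios user,
unreadable_resource_files("/usr/local/nagios/etc/nagios.cfg") should
return an empty list if the permissions are fine. That path is a guess
based on the /usr/local/nagios prefix in your mail. Running it as root
proves nothing, since root can read the file regardless.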
Justin
On Wed, Aug 28, 2013 at 06:48:09AM -0500, C. Bensend wrote:
>
> > I'm continuing to iron out the wrinkles with 3.5.1 and distributed
> > monitoring. I'm using mod_gearman to submit and receive events from
> > two distributed pollers.
> >
> > Every now and again, I'll get something similar in the log on the
> > centralized collecting machine:
> >
> > CRITICAL: Return code of 127 is out of bounds. Make sure the plugin
> > youre trying to run actually exists. (worker: collector.domain.org)
> >
> > To me, that suggests that the collector system didn't get a result
> > for a host or service in a timely manner from one of the polling
> > systems, and so it attempted to run an active check itself. However,
> > it doesn't seem to be able to, and I don't know why.
> >
> > The collector has the same value for $USER1$, and it has the same
> > set of plugins installed on it:
> >
> > On the collector:
> >
> > grep USER1 etc/resource.cfg
> > $USER1$=/usr/local/nagios/libexec
> >
> > On the two pollers:
> >
> > $USER1$=/usr/local/nagios/libexec
> > $USER1$=/usr/local/nagios/libexec
> >
> > The plugins are installed in identical locations on all three systems;
> > that's enforced via Puppet. The 'nagios' user can find and run them on
> > the collector:
> >
> > /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
> > NRPE v2.13
> >
> > Now, because this is a distributed setup, the collector system is
> > not configured to run active checks:
> >
> > grep ^execute etc/nagios.cfg
> > execute_service_checks=0
> > execute_host_checks=0
> >
> > ... but *obviously* it's trying to. Is it failing because it's
> > configured to not run them? If that's the case, the error message is
> > not accurate and should be corrected. If that's *not* the case, why
> > can't my collector server run an active check when it believes it needs
> > to?
> >
> > I use NConf to generate my configurations, if that matters. There are
> > a *lot* of hosts/services and quite a few configuration files, so I'm not
> > going to paste a slew of information here. If I'm missing pertinent
> > information, please let me know exactly what you want to see and I'll
> > get it.
>
> No one has an idea about this? And no, Andreas, I can't move to
> 4.0 yet. ;)
>
> Thanks!
>
> Benny
>
>
> --
> "No matter how tempted I am with the prospect of unlimited power, I
> will not consume any energy field bigger than my head."
> -- #22 on Peter Anspach's Evil
> Overlord list
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>