Passive Service Checks and 127
Chris Ditri
chrisd at better-investing.org
Mon Feb 3 15:43:25 CET 2003
Hello everyone.
This one is weird. I am monitoring a cluster of co-located servers from our
place of work. We have to go through two firewalls to do this, so I decided
to use passive service checks.
Most of the time, Nagios works great, but about once or twice a day we get
this error: (Return code of 127 is out of bounds - plugin may be missing).
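
For context on the number itself: 127 is the exit status a POSIX shell reports when it cannot find the command it was asked to run, which falls outside the 0-3 range that plugins are expected to return. The sketch below only demonstrates that shell convention. The command name is made up, and whether a missing command is what actually happens in this setup is an assumption, not something the logs confirm.

```python
import subprocess


def shell_exit_status(command: str) -> int:
    """Run `command` through sh and return its exit status."""
    completed = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# Hypothetical command name, chosen so that it does not exist on the system.
missing = shell_exit_status("no_such_plugin_for_this_example")

# `true` is a standard utility that always succeeds.
present = shell_exit_status("true")
```

Here `missing` holds the shell's "command not found" status, while `present` holds the status of a command that ran normally.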
I have seven machines I am monitoring with passive service checks, with an
average of about eight processes each. The problem does not appear to be
plugin-related, as this error does not seem to favor any one machine or
service in the cluster. I have each machine send out results on these checks
every 5 minutes or so (each one staggered, to help ensure they don't all come
in at once). I have freshness_check_interval set to 665 seconds -- a little
over 11 minutes.
Now, if checks come in every 5 minutes, and checks do not go stale for 11
minutes, theoretically I should never get this error (unless there is
actually a problem).
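
The arithmetic behind that expectation can be sketched as follows. It assumes results arrive exactly every 300 seconds (the real interval is "5 minutes or so") and that a result counts as stale once it is older than the freshness window. The function name is made up for illustration.

```python
def results_expected_in_window(send_interval_s: int, freshness_window_s: int) -> int:
    """Count the passive results expected to arrive before the last one goes stale.

    Every one of them would have to be lost or delayed for staleness to trigger.
    """
    if send_interval_s <= 0:
        raise ValueError("send interval must be positive")
    return freshness_window_s // send_interval_s


# Current setup: results every 5 minutes, 665-second window.
nagios_misses = results_expected_in_window(300, 665)

# Old Netsaint setup: results every 5 minutes, 7-minute window.
netsaint_misses = results_expected_in_window(300, 420)
```

Under these assumptions, two consecutive results would have to go missing before the 665-second window expires, whereas with the old 7-minute window a single missing result would already have been enough.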
At first, I thought it might be because I was using a beta version of Nagios,
but when 1.0 came out, I updated it and the problem still occurs. Then I
thought it might be the hub the Nagios server was attached to, so we bought a
new switch. This, of course, did not help either. I decided to log the
output from nsca, and it says that each check was sent successfully.
This makes no sense to me. Prior to switching to Nagios, I used Netsaint with
nrpe and nsca to achieve the same results and never had these problems -- and
I had set my freshness checking interval to an even tighter window -- 7
minutes.
Can anyone tell me what is going on here?
Thank you!
Chris