Weirdness with remote (passive) checks. Critical on remote, OK on local?
Brian Smith
bsmith at fusionbroadband.com
Tue Aug 2 21:30:05 CEST 2005
Thanks Marc for the tips, but this has just gotten weirder: neither
submit_check_result nor submit_check_result_via_nsca ever seems to run.
NSCA is being invoked; I see its process pop up when checks happen.
Checks are being delivered to home base, because manual Critical states
get overridden after a few minutes. Also, I can invoke this command and
deliver a single distributed check successfully to home base:
(folder)/submit_check_result_via_nsca remotehost 'Telnet' 2 'Because I said so'
That command successfully sends the service into a soft critical state
on the home server, and running it multiple times sends it to hard
critical.
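For reference, the manual invocation above boils down to a single tab-separated line on send_nsca's standard input: host name, service description, numeric return code, plugin output. A minimal sketch, assuming a helper named format_passive_result (hypothetical, not one of the shipped scripts); the send_nsca arguments are left out because they depend on the installation:

```shell
#!/bin/sh
# Hypothetical helper: build the one-line service check result that
# send_nsca reads on stdin (tab-separated fields, newline-terminated).
#   $1 = host name, $2 = service description,
#   $3 = numeric return code, $4 = plugin output
format_passive_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Assumed usage: pipe the line into send_nsca with whatever server and
# config arguments the local installation uses:
#   format_passive_result remotehost 'Telnet' 2 'Because I said so' | send_nsca ...
```

Running the function on its own, without the pipe, shows exactly what would be sent to home base.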
I've tacked little "debug" lines into the submit_check_result and
submit_..._via_nsca scripts to echo their commands into a log file, and
nothing ever gets appended to the log file. So I put in commands to echo
the word 'test' into the log file, and that word never shows up either.
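A debug line of the kind described here could look like the sketch below. The function name log_invocation, the DEBUG_LOG variable and the default path are made up for illustration. An absolute path is used because the working directory and user the script runs under when Nagios calls it may differ from an interactive shell.

```shell
#!/bin/sh
# Hypothetical debug helper: append a timestamp, the effective user and
# the arguments the script was called with to a log file.
# DEBUG_LOG is an assumed name; the default path is only an example.
DEBUG_LOG="${DEBUG_LOG:-/tmp/submit_check_result.debug}"

log_invocation() {
    # "$*" joins all arguments into one space-separated string.
    printf '%s user=%s args=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$*" >> "$DEBUG_LOG"
}

# Assumed usage as the first line of the script body:
#   log_invocation "$0" "$@"
```

If the file stays empty even with a line like this at the top, the script is most likely not the one being executed.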
From the end of checkcommands.cfg:
# 'submit_check_result' command definition
define command{
command_name submit_check_result
command_line $USER1$/eventhandlers/distributed-
monitoring/submit_check_result_via_nsca
$HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$OUTPUT$'
}
(except without the line breaks I inserted to make it behave in the
email.)
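In the distributed setup described in the Nagios documentation, a command definition like this one only runs when the main configuration file enables obsess_over_services and names the command in ocsp_command. A quick way to check, sketched with a hypothetical function show_ocsp_settings; the nagios.cfg path in the usage comment is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: print the directives in the main config file that
# decide whether, and with which command, service results are forwarded.
show_ocsp_settings() {
    # Match uncommented lines only; tolerate leading whitespace.
    grep -E '^[[:space:]]*(obsess_over_services|ocsp_command)[[:space:]]*=' "$1"
}

# Assumed usage (path is an example):
#   show_ocsp_settings /usr/local/nagios/etc/nagios.cfg
# The documentation's distributed example uses:
#   obsess_over_services=1
#   ocsp_command=submit_check_result
```

The value of ocsp_command has to match a command_name in the command definitions, which is where the script path is ultimately set.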
It appears I will have to trace, from the check queue to NSCA, how this
is being executed. Can anyone tell me where in the config files the
following things could be set:
location of a custom script, if it's not set in the lines from
checkcommands.cfg above?
What return codes are used for OK, Warning, Critical, etc.? So far it
appears Nagios is sending a 0 in all cases. If it is not Nagios, then
either whatever invokes NSCA is sending the 0, or whatever invokes the
script that invokes NSCA is. I can't figure out the chain of commands
here, but I know that home base Nagios is working correctly, NSCA is
sending and receiving correctly, and remote Nagios is writing Critical
in the status logs.
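On the return-code question: the plugin convention is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN, while the $SERVICESTATE$ macro expands to the state name as a string. A wrapper that receives the macro therefore has to translate the name before it reaches send_nsca. A minimal sketch, assuming a function name state_to_code that is not part of the shipped scripts (older example scripts may map UNKNOWN differently):

```shell
#!/bin/sh
# Hypothetical helper: translate a state name, as produced by the
# $SERVICESTATE$ macro, into the numeric plugin return code.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unrecognized
    esac
}

# Assumed usage inside a wrapper script ($3 is the state string):
#   return_code=$(state_to_code "$3")
```

Whether a given script expects the string or the number as its third argument is worth checking against the comments at the top of that script.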
And, by the way, am I correct in assuming people on the mailing list
prefer text-only emails? Otherwise I will send as html.
Thanks again,
-- Brian
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-
> admin at lists.sourceforge.net] On Behalf Of Marc Powell
> Sent: Monday, August 01, 2005 4:46 PM
> To: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Weirdness with remote (passive) checks.
> Critical on remote, OK on local?
>
>
>
> > -----Original Message-----
> > From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-
> > admin at lists.sourceforge.net] On Behalf Of Brian Smith
> > Sent: Monday, August 01, 2005 4:28 PM
> > To: nagios-users at lists.sourceforge.net
> > Subject: [Nagios-users] Weirdness with remote (passive) checks.
> > Critical on remote, OK on local?
> >
> > Hello again guys, and thanks for the previous useful replies to
> > other questions.
> >
> >
> >
> > Weird problem going on here, will provide as much detail as I can.
> >
> >
> >
> > We have some hosts on private IPs being monitored passively through
> > NSCA using remote servers running Nagios. It's basically your
> > textbook passive monitoring system.
> >
> >
> >
> > Currently every switch being monitored this way that is (in real
> > life) down or unreachable, is showing as "Status: OK, Status
> > Information: Connection refused or timed out."
>
> [Aggressive snip]
>
> >
> > Submitting a manual Critical check result puts the host properly
> > into Critical, but it pops back to OK in a few minutes when a
> > passive check comes in. (so passive checks are coming in and are
> > setting the state.)
> >
> >
> >
> > In the Nagios web interface it shows the hosts as a nice green OK,
> > with details "connection refused or timed out."
>
> It looks like your submit_check_result script isn't sending the proper
> return code. If you look at the example script at
> http://nagios.sourceforge.net/docs/1_0/distributed.html and the
> arguments passed to it in the command definition, does yours set it
> properly? That return code is how nagios determines what state a
> service is in, not the human readable text or plugin output. It will
> correspond to the 3rd field passed to send_nsca.
>
> --
> Marc
>
>
> -------------------------------------------------------
> SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
> from IBM. Find simple to follow Roadmaps, straightforward articles,
> informative Webcasts and more! Get everything you need to get up to
> speed, fast. http://ads.osdn.com/?ad_idt77&alloc_id492&op=ick
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null