How to debug nrpe not connecting?
david palm
dvdplm at gmail.com
Wed May 16 18:35:03 CEST 2007
Hi all,
Disclaimer: I'm new here and new to Nagios, so please bear with me. I've
searched the mailing lists, the FAQ and whatnot trying to manage on my own,
but to no avail.
My goal (for now) is to make a dead simple nagios install:
Server A runs nagios and does some basic localhost checks
Server B runs the nrpe daemon
Server A runs a check_load command on Server B through nrpe
I have done two separate installations from source on two different sets of
servers, on different networks: SuSE 10.1/9.2 in one case, Debian
Unstable/Gentoo in the other. The error is identical on both sets of
servers, so I have concluded it is not an OS-related problem.
Yes, you probably guessed it: it is the oh-so-common "Warning: Return code
of 127 for check of service 'CPU Load' on host 'gerald' was out of bounds.
Make sure the plugin you're trying to run actually exists.".
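For reference, the Nagios plugin API defines only four exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so any other code is reported as "out of bounds". A POSIX shell exits with status 127 when the command it was asked to run cannot be found, which is how a mangled command line ends up as this particular warning. The minimal Python sketch below reproduces that by handing a command string to /bin/sh. It assumes a POSIX system, and `classify` and `run_check` are hypothetical helper names, not part of Nagios.

```python
import subprocess

# Exit codes defined by the Nagios plugin API; anything else is "out of bounds".
PLUGIN_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def classify(return_code):
    """Map a plugin exit code to its state name (hypothetical helper)."""
    return PLUGIN_STATES.get(return_code, "out of bounds")


def run_check(command_line):
    """Run command_line via /bin/sh -c and return (exit code, state, stderr).

    This only mimics handing a command string to a shell; it is not a copy
    of how Nagios itself launches checks.
    """
    result = subprocess.run(
        ["/bin/sh", "-c", command_line], capture_output=True, text=True
    )
    return result.returncode, classify(result.returncode), result.stderr.strip()


if __name__ == "__main__":
    # The mangled command from the console output; the shell cannot find it.
    code, state, err = run_check("/opt/nagios/libexecHOSTADDRESS$")
    print(code, state)  # expected: 127 out of bounds
    print(err)          # the shell's own "not found" message
```

Run on a machine where that path does not exist, it reports 127, the same code Nagios complains about.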
Ok, so here are some details ("helena" is Server A, where nagios runs;
"gerald" is Server B, where the nrpe daemon runs). All commands are run as
the "nagios" user.
nagios@helena:/$ /opt/nagios/libexec/check_nrpe -H localhost
returns "NRPE v2.8.1" just as it should.
nagios@helena:/$ /opt/nagios/libexec/check_nrpe -H gerald -c check_load
returns a correct reply, and I can see the two servers speaking by running
tail -f /var/log/syslog on Server B:
May 16 18:15:42 gerald nrpe[20766]: Connection from 192.168.1.10 port 29830
May 16 18:15:42 gerald nrpe[20766]: Host address is in allowed_hosts
May 16 18:15:42 gerald nrpe[20766]: Handling the connection...
May 16 18:15:42 gerald nrpe[20766]: Host is asking for command 'check_load' to be run...
May 16 18:15:42 gerald nrpe[20766]: Running command: /opt/nagios//libexec/check_load -w 15,10,5 -c 30,25,20
May 16 18:15:42 gerald nrpe[20766]: Command completed with return code 0 and output: OK - load average: 0.01, 0.01, 0.00|load1=0.010;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;
May 16 18:15:42 gerald nrpe[20766]: Return Code: 0, Output: OK - load average: 0.01, 0.01, 0.00|load1=0.010;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0;
May 16 18:15:42 gerald nrpe[20766]: Connection from 192.168.1.10 closed.
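The plugin output recorded in this log follows the usual plugin convention: human-readable text, a "|", then space-separated performance data items of the form label=value;warn;crit;min;max. If you want to check programmatically what the daemon handed back, a minimal Python sketch like the one below can split that apart. `parse_plugin_output` and `PerfDatum` are hypothetical names, and the sketch assumes simple unquoted labels (labels containing spaces, which plugins wrap in single quotes, are not handled).

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Leading numeric part of a field such as "0.010" or "42%" (any unit is dropped).
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


class PerfDatum(NamedTuple):
    label: str
    value: Optional[float]
    warn: Optional[str]       # thresholds kept as strings; they may be ranges
    crit: Optional[str]
    minimum: Optional[float]
    maximum: Optional[float]


def _num(field: str) -> Optional[float]:
    """Parse the numeric prefix of field, or return None if there is none."""
    match = NUMBER.match(field)
    return float(match.group()) if match else None


def parse_plugin_output(output: str) -> Tuple[str, List[PerfDatum]]:
    """Split 'TEXT|perfdata' and parse each label=value;warn;crit;min;max item."""
    text, _, perf = output.partition("|")
    data = []
    for item in perf.split():
        label, _, fields = item.partition("=")
        # Pad so that missing trailing fields become empty strings.
        value, warn, crit, low, high = (fields.split(";") + [""] * 5)[:5]
        data.append(PerfDatum(label, _num(value), warn or None, crit or None,
                              _num(low), _num(high)))
    return text.strip(), data
```

Fed the check_load output above, it yields the text "OK - load average: 0.01, 0.01, 0.00" and three items (load1, load5, load15) with their warning and critical thresholds.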
So, am I correct in assuming that nrpe is correctly running and functioning
on both servers? In this case the nrpe daemon is running as a stand-alone
daemon, but results are exactly the same when running under xinetd on the
SuSE servers (first set of servers).
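A check that is independent of Nagios and check_nrpe is whether the daemon's TCP port can be reached from Server A at all. NRPE listens on port 5666 by default (server_port in nrpe.cfg). The minimal Python sketch below only tests that a TCP connection can be opened; it does not speak the NRPE protocol or exercise allowed_hosts, so a True result does not by itself prove that check_nrpe will succeed. `nrpe_port_open` is a hypothetical helper name.

```python
import socket


def nrpe_port_open(host, port=5666, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    5666 is NRPE's default port; this does not exercise the NRPE protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, unknown host, ...
        return False
```

Run on helena, nrpe_port_open("gerald") returning False would point at the network, a firewall or xinetd rather than at the Nagios configuration.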
When I launch nagios as a foreground process I see something interesting on
the console:
root@helena:~/custom_compiles/nrpe-2.8.1# /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
Nagios 2.9
Copyright (c) 1999-2007 Ethan Galstad (http://www.nagios.org)
Last Modified: 04-10-2007
License: GPL
Nagios 2.9 starting... (PID=5406)
!! sh: line 1: /opt/nagios/libexecHOSTADDRESS$: No such file or directory
Warning: Return code of 127 for check of service 'CPU Load' on host 'gerald' was out of bounds. Make sure the plugin you're trying to run actually exists.
!! sh: line 1: /opt/nagios/libexecHOSTADDRESS$: No such file or directory
Warning: Return code of 127 for check of service 'CPU Load' on host 'gerald' was out of bounds. Make sure the plugin you're trying to run actually exists.
See those errors from sh? The command line seems to have been stripped of
the actual command and the $HOSTADDRESS$ macro lacks the leading "$"...
Now, who or what is doing this to the command? I've checked and double-checked
my config files and believe they're correct. The relevant bits follow:
commands.cfg:
define command{
        command_name    check_nrpe
        command_line    $USER1/check_nrpe -H $HOSTADDRESS$ -c $ARGS1$
        }
nrpe.cfg (on Server B, "gerald"):
command[check_load]=/opt/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
resource.cfg:
$USER1$=/opt/nagios/libexec
gerald.cfg:
define host{
        use                     linux-server
        host_name               gerald
        alias                   gerald
        address                 192.168.1.200
        }
define service{
        use                     generic-service ; Name of service template to use
        host_name               gerald
        service_description     CPU Load
        check_period            24x7            ; The service can be checked at any time of the day
        max_check_attempts      600             ; Re-check the service up to 600 times in order to determine its final (hard) state
        normal_check_interval   2               ; Check the service every 2 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until a hard state can be determined
        contact_groups          admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options    w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval   360             ; Re-notify about service problems every 6 hours
        notification_period     24x7            ; Notifications can be sent out at any time
        check_command           check_nrpe!check_load
        }
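One way to hunt for the kind of mangling seen in the sh error is to check each command_line mechanically. In Nagios object definitions, macros are written as $NAME$ (for example $USER1$, $HOSTADDRESS$, $ARG1$) and "$$" stands for a literal dollar sign, so a single missing "$" shifts every later pair and makes literal text and macro names swap places. The Python sketch below is a hypothetical linter built on that simplified model, not Nagios's own macro parser: it flags an odd number of "$" characters, text between a pair of "$" that cannot be a macro name, and, optionally, names missing from a list of macros you expect. `check_command_line` and the name pattern are assumptions.

```python
import re
from typing import Iterable, List, Optional

# Assumed shape of a macro name such as USER1, HOSTADDRESS or ARG1.
MACRO_NAME = re.compile(r"[A-Z][A-Z0-9_]*")


def check_command_line(command_line: str,
                       known: Optional[Iterable[str]] = None) -> List[str]:
    """Return a list of problems found in a command_line (empty if none).

    Simplified model: splitting on "$" puts literal text at even positions
    and macro names at odd positions.
    """
    problems = []
    pieces = command_line.split("$")
    if len(pieces) % 2 == 0:  # even piece count means an odd number of "$"
        problems.append("unbalanced '$': %d found" % command_line.count("$"))
    known_set = set(known) if known is not None else None
    # The last piece is either literal text (balanced) or unterminated; skip it.
    for i in range(1, len(pieces) - 1, 2):
        name = pieces[i]
        if name == "":
            continue  # "$$" is an escaped literal dollar sign
        if not MACRO_NAME.fullmatch(name):
            problems.append("not a macro name between '$' signs: %r" % name)
        elif known_set is not None and name not in known_set:
            problems.append("unknown macro: %r" % name)
    return problems
```

Passing the command_line from commands.cfg reports the odd "$" count plus the two stretches of text that end up in macro positions; supplying a `known` list additionally catches a misspelled macro name.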
As you can see, not much has been changed from the basic installation
instructions.
Ideas anyone? :-(
How can I configure nagios to provide more leads (debug info) than that
miserable sh error above? Are the debug_file (and related) options
3.0-only? (http://nagios.sourceforge.net/docs/3_0/configmain.html#debug_file)
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users