Hi all,<br>disclaimer: I'm new here and new to Nagios, so please bear with me. I've searched the lists, the FAQ and whatnot to sort this out on my own, but to no avail.<br><br>My goal (for now) is to set up a dead simple Nagios install:
<br>Server A runs nagios, does some basic localhost checks<br>Server B runs the nrpe daemon<br>Server A runs a check_load command on Server B through nrpe<br><br>I have done two separate installations from source on two different sets of servers, running on different networks and SuSE
10.1/9.2 in one case, Debian Unstable/Gentoo in the other. The error is identical on both sets of servers, so I have concluded it is not an OS-related problem.<br><br>Yes, you probably guessed it, it is the oh so common "Warning: Return code of 127 for check of service 'CPU Load' on host 'gerald' was out of bounds. Make sure the plugin you're trying to run actually exists.".
<br><br>Ok, so here are some details ("helena" is Server A, where nagios runs; "gerald" is Server B, where the nrpe daemon runs). All commands are run as the "nagios" user.<br><br> <span style="font-family: courier new,monospace;">
nagios@helena:/$ /opt/nagios/libexec/check_nrpe -H localhost </span><br><br>returns "NRPE v2.8.1" just as it should<br><br> <span style="font-family: courier new,monospace;">nagios@helena:/$ /opt/nagios/libexec/check_nrpe -H gerald -c check_load
</span><br><br>returns a correct reply, and I can see the two servers talking to each other by doing a tail -f /var/log/syslog on Server B:<br><font style="font-family: courier new,monospace;" size="2"><br> May 16 18:15:42 gerald nrpe[20766]: Connection from
192.168.1.10 port 29830<br> May 16 18:15:42 gerald nrpe[20766]: Host address is in allowed_hosts<br> May 16 18:15:42 gerald nrpe[20766]: Handling the connection...<br> May 16 18:15:42 gerald nrpe[20766]: Host is asking for command 'check_load' to be run...
<br> May 16 18:15:42 gerald nrpe[20766]: Running command: /opt/nagios//libexec/check_load -w 15,10,5 -c 30,25,20<br> May 16 18:15:42 gerald nrpe[20766]: Command completed with return code 0 and output: OK - load average:
0.01, 0.01, 0.00|load1=0.010;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0; <br> May 16 18:15:42 gerald nrpe[20766]: Return Code: 0, Output: OK - load average: 0.01, 0.01, 0.00|load1=0.010;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.000;5.000;20.000;0; <br> May 16 18:15:42 gerald nrpe[20766]: Connection from 192.168.1.10 closed.</font><br><br>So, am I correct in assuming that nrpe is correctly running and functioning on both servers? In this case the nrpe daemon is running as a stand-alone daemon, but results are exactly the same when running under xinetd on the SuSE servers (first set of servers).
<br><br>When I launch nagios as a foreground process I see something interesting on the console:<br><span style="font-family: courier new,monospace;"> root@helena:~/custom_compiles/nrpe-2.8.1# /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> Nagios 2.9</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> Copyright (c) 1999-2007 Ethan Galstad (<a href="http://www.nagios.org">http://www.nagios.org</a>)</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
Last Modified: 04-10-2007</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> License: GPL</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> Nagios 2.9 starting... (PID=5406)</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">!! sh: line 1: /opt/nagios/libexecHOSTADDRESS$: No such file or directory
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> Warning: Return code of 127 for check of service 'CPU Load' on host 'gerald' was out of bounds. Make sure the plugin you're trying to run actually exists.
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">!! sh: line 1: /opt/nagios/libexecHOSTADDRESS$: No such file or directory</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> Warning: Return code of 127 for check of service 'CPU Load' on host 'gerald' was out of bounds. Make sure the plugin you're trying to run actually exists.
</span><br> <br>See those errors from sh? The command line seems to have been stripped of the actual command and the $HOSTADDRESS$ macro lacks the leading "$"...<br><br>Now, who/what is doing this to the command? I've checked and double checked my config files and believe they're correct. The relevant bits follow:
<br><br>commands.cfg:<br><span style="font-family: courier new,monospace;"> define command{</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> command_name check_nrpe
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> command_line $USER1/check_nrpe -H $HOSTADDRESS$ -c $ARGS1$</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> }</span><br><br>nrpe.cfg (on Server B, "gerald"):<br><span style="font-family: courier new,monospace;"> command[check_load]=/opt/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
</span><br> <br>resource.cfg:<br> <span style="font-family: courier new,monospace;"> $USER1$=/opt/nagios/libexec </span><br> <br>gerald.cfg:<br><span style="font-family: courier new,monospace;"> define host{</span>
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> use linux-server</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
host_name gerald</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> alias gerald</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> address 192.168.1.200</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
}</span><br style="font-family: courier new,monospace;"><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> define service{</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> use generic-service ; Name of service template to use</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
host_name gerald</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> service_description CPU Load</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> check_period 24x7 ; The service can be checked at any time of the day
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> max_check_attempts 600 ; Re-check the service up to 600 times in order to determine its final (hard) state
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> normal_check_interval 2 ; Check the service every 2 minutes under normal conditions
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> retry_check_interval 1 ; Re-check the service every minute until a hard state can be determined
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> notification_interval 360 ; Re-notify about service problems every 6 hours</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> notification_period 24x7 ; Notifications can be sent out at any time </span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> </span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> check_command check_nrpe!check_load</span>
<br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> }</span><br> <br>As you can see, not much has been changed from the basic installation instructions.
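<br><br>For comparison, a minimal sketch of the command line Nagios would be expected to run for this service, with the values from resource.cfg, gerald.cfg and the check_command above filled in by hand. The shell variable names are only illustrative (they are not Nagios settings), and the last line prints the command rather than running it.<br>

```shell
# Values copied by hand from the configuration shown above.
plugin_dir=/opt/nagios/libexec    # value of $USER1$ in resource.cfg
host_address=192.168.1.200        # address of host 'gerald'
nrpe_command=check_load           # argument after the '!' in check_command

# Command line expected after macro expansion (an assumption, built by hand).
expected_cmd="$plugin_dir/check_nrpe -H $host_address -c $nrpe_command"
echo "$expected_cmd"
```

<br>Comparing the printed command with the path in the sh error should show which part of the command line gets lost between the config files and the shell.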
<br><br>Ideas anyone? :-(<br><br>How can I configure nagios to provide more leads (debug info) than that miserable sh error above? Are debug_file and the related options 3.0-only? (<a href="http://nagios.sourceforge.net/docs/3_0/configmain.html#debug_file">
http://nagios.sourceforge.net/docs/3_0/configmain.html#debug_file</a>)<br>
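<br>On the return code itself: 127 is the exit status a POSIX shell reports when it cannot find the command it was asked to run, which fits the "No such file or directory" line from sh. A minimal sketch that reproduces it without Nagios; the path is made up and only mimics the mangled one above.<br>

```shell
# Ask sh to run a path that does not exist (hypothetical path, chosen to
# resemble the mangled command line; single quotes keep the trailing $ literal).
status=0
sh -c '/no/such/dir/libexecHOSTADDRESS$' 2>/dev/null || status=$?

# The shell signals "command not found" with exit status 127.
echo "exit status: $status"
```

<br>This suggests the warning is not about the plugin on gerald at all: the shell on helena apparently never receives a valid path to check_nrpe in the first place.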