(No output!) Errors in Nagios 2.4
Andy Shellam
andy.shellam-lists at mailnetwork.co.uk
Sun Aug 27 16:48:37 CEST 2006
Hugo,
I didn't think it relevant to post full details of hosts/services, as
sometimes the commands work and sometimes they don't.
It's not a problem with command syntax or a specific host or service -
it's a global thing. If I run the commands manually they work fine, as
shown below.
Here's an example of a failing PostgreSQL service: "(Return code of 127
is out of bounds - plugin may be missing)"
Run it manually:
> su -c '/usr/local/nagios/libexec/check_pgsql -H <HOST_IP> -P 5432 -d <DB_NAME> -l <LOGIN_USER> -w 30 -c 60' - nagios
> OK - database <DB_NAME> (0 sec.)|time=0.000000s;30.000000;60.000000;0.000000
If I force the service to re-poll for an active check then that error
will clear and come up OK, but then another service will fail.
Currently I've got 3 failures on services that are actually up and working.
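For reference, 127 is the exit status a POSIX shell returns when it
cannot find the command it was asked to run, which is presumably why
Nagios words it as "plugin may be missing". A minimal sketch of that
behaviour (the path below is hypothetical and deliberately does not
exist):

```shell
#!/bin/sh
# Hypothetical, non-existent path: used only to provoke "command not found".
/nonexistent/check_example >/dev/null 2>&1
status=$?

# A shell reports 127 when the command could not be found.
echo "exit status: $status"
```

Nagios plugins themselves are only expected to return 0 to 3 (OK,
WARNING, CRITICAL, UNKNOWN), so a 127 points at the shell failing to
launch the plugin rather than at the plugin's own result.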
Take another example - the SSH service on the Nagios machine - currently
reading "CRITICAL - Server answer:". The flapping state is "Percent
State Change:72.70%", which suggests the service is going up and down
erratically; however, the machine and its SSH service are working fine.
The command for this is:
define command {
    command_name    Check_SSH
    command_line    /usr/local/nagios/libexec/check_ssh -H $ARG1$ -p 3322
}
And the service definition:
define service {
    host_name               Perth,Sydney-1
    use                     Service_Template
    service_description     Encrypted Remote Access - SSH
    check_command           Check_SSH!$HOSTADDRESS$
}
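To make it clear what should end up being executed, here is a rough
sketch of how the "!"-separated argument in check_command fills $ARG1$
in command_line. It only imitates the substitution in plain shell and
is not Nagios's own code; 192.0.2.10 is a placeholder standing in for
whatever $HOSTADDRESS$ resolves to for the host.

```shell
#!/bin/sh
# Assumed values: the address is a documentation placeholder, not a real host.
check_command='Check_SSH!192.0.2.10'
command_line='/usr/local/nagios/libexec/check_ssh -H $ARG1$ -p 3322'

# Everything after the first "!" is the first argument ($ARG1$).
arg1=${check_command#*!}

# Replace the literal text $ARG1$ in the command line with that argument.
expanded=$(printf '%s\n' "$command_line" | sed 's/\$ARG1\$/'"$arg1"'/')

echo "$expanded"
```

The resulting line is what can be run by hand as the nagios user to
compare against what the web interface reports.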
The same goes for an HTTP service that is currently reading
"(No status!)" - running it manually works:
> [root@dns zones]# su -c '/usr/local/nagios/libexec/check_http -H www.andyshellam.eu -N -p 80 -A "Nagios/2.4/dns.mailnetwork.co.uk" -f follow -w 30 -c 60 -t 120' - nagios
> HTTP OK HTTP/1.1 200 OK - 1023 bytes in 0.006 seconds |time=0.006282s;30.000000;60.000000;0.000000 size=1023B;;;0
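Since the failures are intermittent, a single manual run may simply
not hit one. A rough sketch of repeating a check and tallying the
non-zero exit codes follows; run_repeatedly is a made-up helper name,
and "true" is a stand-in so the sketch runs anywhere (the real plugin
command line would go in its place).

```shell
#!/bin/sh
# run_repeatedly N CMD [ARGS...]
# Hypothetical helper: runs CMD N times and reports how many runs failed.
run_repeatedly() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard plugin output; only the exit status matters here.
        "$@" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -ne 0 ]; then
            failures=$((failures + 1))
            echo "run $i: exit status $rc"
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs returned non-zero"
}

# Stand-in command; substitute the actual check_* invocation.
summary=$(run_repeatedly 5 true)
echo "$summary"
```

Any run that reports 127 here would reproduce the "out of bounds"
error outside of Nagios.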
Andy.
Hugo van der Kooij wrote:
> On Sun, 27 Aug 2006, Andy Shellam wrote:
>
>
>> I've been using Nagios for around 5 months now with no problems. I've
>> recently added a new server onto my network, which has added somewhere
>> in the region of another 3 hosts and 12 services onto Nagios.
>>
>> Since then I now keep getting random errors in the "Status Information"
>> for services only.
>>
>> For example I've got a HTTP monitor which monitors
>> http://photos.andyshellam.eu:80, and this has started saying "Name or
>> service not known" or "(No output!)" and labelled with either an OK or
>> CRITICAL state (when the site is actually OK.)
>>
>
> I think you could improve the likelihood of getting help by providing:
> - host definition (+ template if needed)
> - service definition (+ template if needed)
> - checkcommand definition
> - Results of check command as user nagios from the commandline
>
> Hugo.
>
>