"UNKNOWN" problem, with a return code of 0 in Solaris 9.
Patrick Walentiny
Patrick.Walentiny at stellent.com
Thu Dec 11 17:37:31 CET 2003
I looked through a lot of the mailing list archives and didn't see this
problem reported by anyone else. This is a bit long-winded, but I figure
being verbose is better than not being verbose enough. So here goes.
We are using Nagios to monitor multiple UNIX systems, including
Solaris 2.6, 8, and 9. We do this over SSH, with public keys saved in
the "authorized_keys" file in the home directory on each Nagios client.
When our server runs "check_by_ssh" against the Solaris 9 clients, the
Nagios server reports a status of "UNKNOWN". I dug into the
documentation and found that the status is determined from the plugin's
return code, i.e. 0, 1, 2, 3, etc. So I ran check_by_ssh by hand against
the system in question to see for myself what the problem was. Run that
way, I only ever get a return code of 0 ("OK"); the syntax I used is
shown below. I can also get return codes of 1 and 2 if I intentionally
trigger a warning or critical condition. But when the same check is run
by the Nagios process itself, it shows up as "UNKNOWN" and occasionally
flaps to OK for brief periods, even though the status output shows the
disks are perfectly okay.
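For reference, the return-code convention mentioned above can be sketched as a small shell helper. This is only an illustration: state_name is a made-up name, not part of Nagios or its plugins, and the OUT_OF_RANGE label for any other code is the helper's own wording, not something Nagios prints.

```shell
#!/bin/sh
# state_name CODE
# Hypothetical helper: prints the state name that the standard plugin
# return codes 0-3 stand for. Any other value is labelled OUT_OF_RANGE
# (this label is an assumption made for this sketch).
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT_OF_RANGE" ;;
    esac
}
```

After running a plugin by hand, `state_name $?` would then print the state that corresponds to its return code.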
Here is the output from my tests:
[...]
$ /usr/lib/nagios/plugins/check_by_ssh -H 12.40.185.175 -C '/opt/nagios/libexec/check_disk -c 10% -w 20%'
$ echo $?
0
$
[...]
I even had this running in a continual loop to see whether maybe 1 run
out of 10 would go into an UNKNOWN state, but it never does. I will
paste the portions of my config that should matter for this output. I
really appreciate any help you guys can give me.
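A loop test of this kind could look roughly like the sketch below. It is not the exact loop I ran; tally_exit_codes is a hypothetical helper name, and it simply runs a command a given number of times and counts how often each return code comes back.

```shell
#!/bin/sh
# tally_exit_codes RUNS CMD [ARGS...]
# Hypothetical helper: runs CMD the given number of times, discards its
# output, and prints one "exit=<code> count=<n>" line per distinct code.
tally_exit_codes() {
    runs=$1
    shift
    i=0
    codes=""
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1
        codes="$codes $?"          # record this run's return code
        i=$((i + 1))
    done
    [ -n "$codes" ] || return 0    # nothing to report for zero runs
    # $codes is left unquoted on purpose so each code lands on its own line.
    printf '%s\n' $codes | sort -n | uniq -c |
        awk '{ print "exit=" $2 " count=" $1 }'
}
```

For example, `tally_exit_codes 10 /usr/lib/nagios/plugins/check_by_ssh -H 12.40.185.175 -C '/opt/nagios/libexec/check_disk -c 10% -w 20%'` would show whether any of the ten runs returned something other than 0.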
/*
* Command Definition
********************************/
# 'check_remote_disk' command definition
define command {
    command_name    check_remote_disk
    command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -C '/opt/nagios/libexec/check_disk -c 10% -w 20%'
}
/*
* Host Definition
********************************/
define host {
    use                     generic-host
    host_name               gondor
    alias                   Minneapolis Production Webserver (gondor)
    address                 12.40.185.175
    check_command           check-host-alive
    max_check_attempts      10
    notification_interval   10
    notification_period     24x7
    notification_options    d,u,r
    parents                 mspfw1
}
Thanks again for any help. If you need any more output than this, let me
know.
Patrick.