Warning - running nagios did not exit in time
Christian Roy
croy+nagios at infiniweb.ca
Fri Sep 15 16:45:17 CEST 2006
Hello,
Server where nagios is located:
Red Hat Linux release 7.2 (Enigma)
Apache 2.2.3
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)
Server that I want to monitor (nrpe installed):
Fedora Core release 2 (Tettnang)
Apache 2.2.3
gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
I have been using nagios v1.x for some time now on a different
server (and network).
I am doing a new installation of the 2.5 release.
I am starting fresh; I am not trying to import anything.
I installed an RPM version by rebuilding the src rpm.
When I start nagios (/etc/init.d/nagios start) I get no errors:
Starting network monitor: nagios
In the nagios.log I see no errors:
[1158259211] Nagios 2.5 starting... (PID=23923)
[1158259211] LOG VERSION: 2.0
[1158259211] Finished daemonizing... (New PID=25591)
The file /var/log/messages has similar lines.
nagios -s reports:
Nagios 2.5
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 07-13-2006
License: GPL
Projected scheduling information for host and service
checks is listed below. This information assumes that
you are going to start running Nagios with your current
config files.
HOST SCHEDULING INFORMATION
---------------------------
Total hosts: 1
Total scheduled hosts: 0
Host inter-check delay method: SMART
Average host check interval: 0.00 sec
Host inter-check delay: 0.00 sec
Max host check spread: 30 min
First scheduled check: N/A
Last scheduled check: N/A
SERVICE SCHEDULING INFORMATION
-------------------------------
Total services: 6
Total scheduled services: 6
Service inter-check delay method: SMART
Average service check interval: 300.00 sec
Inter-check delay: 50.00 sec
Interleave factor method: SMART
Average services per host: 6.00
Service interleave factor: 6
Max service check spread: 30 min
First scheduled check: Fri Sep 15 16:30:51 2006
Last scheduled check: Fri Sep 15 16:35:01 2006
CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval: 10 sec
Max concurrent service checks: Unlimited
PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
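The service figures above are at least internally consistent. As a
minimal sketch, assuming the SMART method spaces checks by dividing
the average check interval by the number of scheduled services (which
is how the Nagios documentation describes it), the arithmetic can be
reproduced in shell. The function names are made up for illustration.

```shell
# Hypothetical helpers; the names are not part of Nagios.

# Inter-check delay = average check interval / number of services.
inter_check_delay() {
    echo $(( $1 / $2 ))
}

# Time between the first and the last scheduled check when the
# services are spaced one delay apart.
check_spread() {
    echo $(( ($1 - 1) * $2 ))
}

delay=$(inter_check_delay 300 6)    # 300 sec average, 6 services
spread=$(check_spread 6 "$delay")
echo "delay=${delay}s spread=${spread}s"
```

A 50 second delay and a 250 second spread agree with the reported
inter-check delay and with the gap between the first (16:30:51) and
last (16:35:01) scheduled checks, so the schedule itself looks sane.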
I used the minimal.cfg file, renamed it and simply changed
the definitions for the services, hosts, contacts, etc. I
also changed the check_command entries to use check_nrpe,
like this:
define command{
        command_name    check_nrpe
        command_line    .../check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
define service {
        ...
        check_command   check_nrpe!check_disk1
        }
I have the nrpe daemon running on the remote host (under
xinetd) with the IP of the nagios server in the
allowed_hosts setting, and I have verified it is running
by doing "telnet localhost 5666" and getting a connection
on both servers (the nagios server and the server I want
to monitor).
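A telnet connection only shows that the port answers. Running the
plugin by hand from the nagios server exercises the same path the
check_command uses. The following is a sketch, not a definitive
procedure: nrpe_probe and the CHECK_NRPE variable are hypothetical
names, and CHECK_NRPE has to be set to wherever check_nrpe is
installed (the path is elided in the command definition above).
Called without -c, check_nrpe should simply report the NRPE version
of the remote daemon.

```shell
# Hypothetical helper, not part of Nagios or NRPE.
# CHECK_NRPE must hold the full path of the check_nrpe plugin.
nrpe_probe() {
    host=$1
    cmd=${2:-}

    if [ -z "${CHECK_NRPE:-}" ]; then
        echo "set CHECK_NRPE to the path of check_nrpe" >&2
        return 3
    fi

    status=0
    if [ -n "$cmd" ]; then
        # Same arguments as the command_line definition uses.
        "$CHECK_NRPE" -H "$host" -c "$cmd" || status=$?
    else
        # No -c: only asks the remote daemon to identify itself.
        "$CHECK_NRPE" -H "$host" || status=$?
    fi

    # Plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    echo "exit status: $status"
    return "$status"
}
```

Running nrpe_probe with the address of the monitored host, first
without a command and then with check_disk1, ideally as the nagios
user, shows whether the plugin works outside of the scheduler.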
At first, I had check_external_commands disabled. I
could use the web interface, but all services were PENDING
and stayed PENDING even after the time of the next
scheduled check had passed.
And when I was trying to stop nagios (/etc/init.d/nagios stop)
I would get an error (the only error I saw in all this mess):
Stopping network monitor: nagios
Waiting for nagios to exit . . . . . . . . . . .
Warning - running nagios did not exit in time
A leftover nagios process keeps running, and I have to
run "killall -9 nagios" to stop it; otherwise I end up
with multiple copies running.
I've since enabled check_external_commands and changed the
permissions of the "rw" directory based on the documentation
(http://nagios.sourceforge.net/docs/2_0/commandfile.html),
and now the web interface says nagios is not running, even
though I restarted apache and nagios.
The error I get is:
"Whoops! Error: Could not read host and service status
information!"
However nagios IS running.
(ps awux | grep nagios shows 3 processes)
Apache is running as "daemon", so I have added the secondary
group nagiocmd to both nagios and daemon.
The permissions of the directory are:
drwxrws--- nagios nagiocmd /var/log/nagios/rw/
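Group changes are easy to get subtly wrong, so it can help to
confirm what the system actually reports. A minimal sketch, assuming
"id -nG <user>" prints that user's group names separated by spaces;
has_group is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if group $2 appears in the
# space-separated group list $1.
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Example, using the user and group names from this post:
#   has_group "$(id -nG daemon)" nagiocmd && echo "daemon: ok"
#   has_group "$(id -nG nagios)" nagiocmd && echo "nagios: ok"
```

A running process keeps the groups it was started with, so the check
only says what a freshly started apache or nagios would get.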
When I run nagios -v /etc/nagios/nagios.cfg I get no
warnings and no errors (Things look okay).
I have 6 services, one host, one host group, one
contact, one contact group, 9 commands and one
timeperiod.
I've disabled state retention, as suggested in this
forum for services and hosts stuck in "PENDING", but
to no avail. I would very much like to force a
check using the CGI, but I am having trouble there too.
The log directory does not contain a status.log file;
I do not know if that's relevant.
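The missing file may well matter, since the CGIs build their pages
from the status file. Assuming a stock 2.x setup, that file is named
by the status_file directive in nagios.cfg, and the sample configs
call it status.dat rather than the status.log used by 1.x. Below is
a sketch for looking up the configured path; cfg_value is a
hypothetical helper, not a Nagios tool:

```shell
# Hypothetical helper: print the value of a key=value directive
# from a Nagios-style config file. Commented-out lines are skipped
# because the key must be the first thing on the line.
cfg_value() {
    # $1 = config file, $2 = directive name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1"
}

# Example (config path taken from this post):
#   f=$(cfg_value /etc/nagios/nagios.cfg status_file)
#   ls -l "$f"
```

If the file named there never appears, or the user apache runs as
cannot read it, that would be consistent with the "Could not read
host and service status information" error.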
I tried starting the daemon by hand
(nagios -d /etc/nagios/nagios.cfg) but
I got no error messages there either.
I've searched Google for the only error message
(the one in the title of this post) but only found
references to the init script.
I went over the parts of the documentation that seem
related to my problem, and I've looked through the nagios
FAQ and the forum.
So in summary: nagios cannot be stopped properly,
the services are never checked, a log file may be
missing, there are no really useful error messages,
and I am losing patience!
I think I have done everything I possibly can,
but I must be missing something that is too obvious
and too simple for me to see, or something that depends
on information I am not aware of.
Any pointers would be appreciated.
Thank you
Christian Roy