Warning - running nagios did not exit in time

Christian Roy croy+nagios at infiniweb.ca
Fri Sep 15 16:45:17 CEST 2006
Hello,

Server where nagios is located:
    Red Hat Linux release 7.2 (Enigma)
    Apache 2.2.3
    gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-98 )

Server that I want to monitor (nrpe installed):
    Fedora Core release 2 (Tettnang)
    Apache 2.2.3
    gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)

I have been using nagios v1.x for some time on a different
server (and network).
I am now doing a fresh installation of the 2.5 release and
am not trying to import anything.

I installed an RPM version by rebuilding the src rpm.
When I start nagios (/etc/init.d/nagios start) I get no errors:
   Starting network monitor: nagios
In the nagios.log I see no errors:
  [1158259211] Nagios 2.5 starting... (PID=23923)
  [1158259211] LOG VERSION: 2.0
  [1158259211] Finished daemonizing... (New PID=25591)
The file /var/log/messages has similar lines.
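
The bracketed numbers in nagios.log are Unix epoch seconds. As a minimal sketch (the helper name nagios_log_time is made up for illustration and is not part of Nagios), they can be converted to readable UTC time like this:

```python
from datetime import datetime, timezone


def nagios_log_time(line: str) -> str:
    """Return the leading [epoch] stamp of a nagios.log line as UTC text."""
    # Take everything before the first "]" and drop the opening "[".
    epoch = int(line.strip().split("]", 1)[0].lstrip("["))
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


# Log line copied from the excerpt above.
print(nagios_log_time("[1158259211] Nagios 2.5 starting... (PID=23923)"))
# -> 2006-09-14 18:40:11
```

The result is in UTC; the server's local time zone would shift it accordingly.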

nagios -s reports:

Nagios 2.5
Copyright (c) 1999-2006 Ethan Galstad (http://www.nagios.org)
Last Modified: 07-13-2006
License: GPL

Projected scheduling information for host and service
checks is listed below.  This information assumes that
you are going to start running Nagios with your current
config files.

HOST SCHEDULING INFORMATION
---------------------------
Total hosts:                     1
Total scheduled hosts:           0
Host inter-check delay method:   SMART
Average host check interval:     0.00 sec
Host inter-check delay:          0.00 sec
Max host check spread:           30 min
First scheduled check:           N/A
Last scheduled check:            N/A


SERVICE SCHEDULING INFORMATION
-------------------------------
Total services:                     6
Total scheduled services:           6
Service inter-check delay method:   SMART
Average service check interval:     300.00 sec
Inter-check delay:                  50.00 sec
Interleave factor method:           SMART
Average services per host:          6.00
Service interleave factor:          6
Max service check spread:           30 min
First scheduled check:              Fri Sep 15 16:30:51 2006
Last scheduled check:               Fri Sep 15 16:35:01 2006


CHECK PROCESSING INFORMATION
----------------------------
Service check reaper interval:      10 sec
Max concurrent service checks:      Unlimited


PERFORMANCE SUGGESTIONS
-----------------------
I have no suggestions - things look okay.
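
The service figures in this output agree with each other: the Nagios documentation describes the SMART inter-check delay as the average check interval divided by the number of services, and the first and last scheduled checks are then separated by one delay fewer than there are services. The sketch below only redoes that arithmetic with the numbers printed above; it is a simplification and not Nagios's own code.

```python
from datetime import datetime, timedelta

# Values copied from the "nagios -s" output above.
avg_check_interval = 300.0  # "Average service check interval", in seconds
total_services = 6          # "Total scheduled services"

# SMART inter-check delay: spread the checks evenly over one interval.
inter_check_delay = avg_check_interval / total_services

# Six checks are separated by five gaps.
first = datetime(2006, 9, 15, 16, 30, 51)  # "First scheduled check"
last = first + timedelta(seconds=inter_check_delay * (total_services - 1))

print(inter_check_delay, last.strftime("%H:%M:%S"))
# -> 50.0 16:35:01
```

So the scheduler's plan looks sane on paper; the open question is why the planned checks never run.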



I took the minimal.cfg file, renamed it and changed only
the definitions for the services, hosts, contacts, etc. I
also changed each check_command to use check_nrpe, like
this:
define command{
        command_name    check_nrpe
        command_line    .../check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

define service{
        ...
        check_command   check_nrpe!check_disk1
        }
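
To make the relationship between the two definitions concrete: the part of check_command before the first "!" names the command, and each following "!"-separated value fills $ARG1$, $ARG2$ and so on in its command_line. The sketch below imitates that substitution. The function expand_check_command and the address 192.0.2.10 are made up for illustration, the plugin path is left elided exactly as in the post, and real Nagios macro handling covers many more macros than this.

```python
def expand_check_command(check_command: str, command_line: str,
                         host_address: str) -> str:
    """Imitate how $HOSTADDRESS$ and $ARGn$ are filled in for one check."""
    # "check_nrpe!check_disk1" -> command name plus a list of arguments.
    _name, *args = check_command.split("!")
    expanded = command_line.replace("$HOSTADDRESS$", host_address)
    for number, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{number}$", value)
    return expanded


line = expand_check_command(
    "check_nrpe!check_disk1",
    ".../check_nrpe -H $HOSTADDRESS$ -c $ARG1$",  # path elided as in the post
    "192.0.2.10",  # hypothetical address of the monitored host
)
print(line)
# -> .../check_nrpe -H 192.0.2.10 -c check_disk1
```

Running the expanded command by hand as the nagios user, with the real path and address, is a quick way to see what the scheduler would get back from the plugin.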

I have the nrpe daemon running on the remote host (under
xinetd), with the IP of the nagios server in the
allowed_host config. I have verified it is listening
by running telnet localhost 5666 and getting a connection
from both servers (nagios server and the server I want
to monitor).
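
A telnet to localhost only shows that something is listening on the loopback interface of the machine where it is run. To exercise the path Nagios itself would use, the connection has to be opened from the Nagios server to the monitored host's address. Below is a minimal sketch of such a probe; the function name port_open is made up, and a successful TCP connect does not prove that NRPE will accept the request, since its host restrictions are applied after the connection is made.

```python
import socket


def port_open(host: str, port: int = 5666, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name resolution failed.
        return False
```

Calling port_open("192.0.2.10") on the Nagios server, with the hypothetical address replaced by the monitored host's real one, gives a yes/no answer for the network path only.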

At first, I had external command checking disabled. I
could use the web interface, but all services were PENDING
and stayed PENDING even after the time of the next
scheduled check had passed.

And when I tried to stop nagios (/etc/init.d/nagios stop)
I got an error (the only error I saw in all this mess):
   Stopping network monitor: nagios
   Waiting for nagios to exit . . . . . . . . . . .
   Warning - running nagios did not exit in time

A leftover nagios process keeps running and I have
to "killall -9 nagios" to stop it; otherwise I end up
with multiple copies running.

I've since enabled external command checking and changed the "rw"
directory's permissions based on the documentation
(http://nagios.sourceforge.net/docs/2_0/commandfile.html),
and now the web interface says nagios is not running even
though I restarted apache and nagios.
The error I get is:
"Whoops!  Error: Could not read host and service status
information!"
However, nagios IS running
(ps awux | grep nagios shows 3 processes).

Apache is running as "daemon", so I have added nagiocmd as a
secondary group for both nagios and daemon.
The permissions on the directory are:
drwxrws--- nagios nagiocmd /var/log/nagios/rw/
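
Read as an octal mode, this listing is 2770: full access for the owner and the group, the setgid bit on the directory, and nothing for anyone else, so the web server user can only reach it through membership of the group. The sketch below does that conversion for any "ls -l" style string; the helper name symbolic_to_octal is made up for illustration.

```python
def symbolic_to_octal(mode: str) -> str:
    """Convert an ls -l mode string such as 'drwxrws---' to octal digits."""
    perms = mode[1:10]  # drop the leading file-type character
    special = 0
    digits = []
    # Triplets are user, group, other; their special bits weigh 4, 2, 1
    # (setuid, setgid, sticky).
    for index, weight in enumerate((4, 2, 1)):
        r, w, x = perms[index * 3: index * 3 + 3]
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0)
        if x in "xst":   # lowercase s/t mean the execute bit is set too
            value += 1
        if x in "sStT":  # any s/t marks the special bit for this triplet
            special += weight
        digits.append(str(value))
    return f"{special}{''.join(digits)}"


print(symbolic_to_octal("drwxrws---"))
# -> 2770
```

Group membership added to an account only applies to processes started afterwards, which is why both apache and nagios need a restart after such a change.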

When I run nagios -v /etc/nagios/nagios.cfg I get no
warnings and no errors (Things look okay).
I have 6 services, one host, one host group, one 
contact, one contact group, 9 commands and one 
timeperiod.

I've disabled retention, as suggested in this
forum for services and hosts stuck in "PENDING", but
to no avail.  I would very much like to force a
check using the CGI, but I am having trouble there too.

The log directory does not contain a status.log file;
I do not know if that's relevant.
I tried starting the daemon by hand
(nagios -d /etc/nagios/nagios.cfg) but
I got no error messages there either.

I've searched Google for the only error message
(the one in the title of this post) but only found references
to the script.
I went over the parts of the documentation that seem
related to my problem, and I've looked through the nagios
FAQ and the forum.

So in summary: nagios cannot be stopped properly,
the services are never checked, a log file may
be missing, there are no really useful error
messages, and I am losing patience!

I think I have done everything I possibly can,
but I must be missing something that is too obvious
and too simple for me to see, or something that depends
on information I am not aware of.

Any pointers would be appreciated.

Thank you

Christian Roy

