Nagios hangs on startup
Eric Cables
ecables at gmail.com
Thu Jul 1 23:36:04 CEST 2010
Well, I tried to duplicate the command that is showing up in the 'ps -xw'
output, and it just hangs.
[nagios at psdbsd01 (~)]$ whoami
nagios
[nagios at psdbsd01 (~)]$ su - nagios -c touch /usr/local/nagios/var/nagios.log /usr/local/nagios/var/retention.dat
^^ hangs here.
In fact, if I just try to 'su - nagios' the process hangs as well.
Using su with other parameters works, however, so the binary seems to
function:
[nagios at psdbsd01 (~)]$ su -
Password:
[root at psdbsd01 (~)]#
And su - nagios from the root user appears to work fine.
[root at psdbsd01 (~)]# su - nagios
[nagios at psdbsd01 (~)]$
But su - nagios does not (as the nagios user):
[nagios at psdbsd01 (~)]$ su - nagios
^^ hangs
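
One way to tell a real hang from a merely slow command, without tying up
the terminal, is to run the command under a watchdog. The following is a
minimal sketch in POSIX sh. The helper name run_with_timeout, the 124
return code and the one-second polling interval are assumptions made for
illustration; none of it comes from the Nagios init script.

```sh
#!/bin/sh
# run_with_timeout SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD in the background and polls it once a second.
# Returns CMD's own exit status, or 124 if CMD was still running after
# SECONDS and had to be terminated.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    elapsed=0
    # kill -0 sends no signal; it only tests whether the PID still exists.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$limit" ]; then
            kill -TERM "$cmd_pid" 2>/dev/null || true
            wait "$cmd_pid" 2>/dev/null || true
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # The command finished on its own; hand back its exit status.
    wait "$cmd_pid"
}
```

Called as "run_with_timeout 5 su - nagios -c true", a return value of 124
would mean the command was still running after five seconds. Because of
the polling, even a command that exits immediately takes about a second
to report.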
Sorry for all the noise.
-- Eric Cables
On Thu, Jul 1, 2010 at 2:15 PM, Eric Cables <ecables at gmail.com> wrote:
> Here are a few more details I've been able to gather.
>
> Here's the output of a truss on the init script w/ the start statement:
> Starting nagios:write(1,"Starting nagios:",16) = 16 (0x10)
> fork(0x90,0xbfbfe9f8,0xa,0x8062a35,0x0,0x0) = 55445 (0xd895)
> getpgrp(0x0,0x0,0xd895,0x0,0x2831c0c0,0x0) = 55444 (0xd894)
> wait4(0xffffffff,0xbfbfe9d8,0x2,0x0,0x213,0x1) = 55445 (0xd895)
> stat("/sbin/su",0xbfbfe6f8) ERR#2 'No such file or directory'
> stat("/bin/su",0xbfbfe6f8) ERR#2 'No such file or directory'
> stat("/usr/sbin/su",0xbfbfe6f8) ERR#2 'No such file or directory'
> stat("/usr/bin/su",{ mode=-r-sr-xr-x ,inode=14512669,size=14496,blksize=4096 }) = 0 (0x0)
> fork(0x0,0x0,0x4b156e10,0x0,0x0,0x0) = 55446 (0xd896)
> getpgrp(0x0,0x0,0xd896,0x0,0x2831c0c0,0x0) = 55444 (0xd894)
>
> ^^^ This is where it hangs.
>
> ps -ax | grep nagios shows the following:
> 55443 6 I+ 0:00.02 truss /usr/local/etc/rc.d/nagios.sh start
> 55444 6 IX 0:00.01 /bin/sh /usr/local/etc/rc.d/nagios.sh start
> 55447 6 S 0:00.07 su - nagios -c touch /usr/local/nagios/var/nagios.log /usr/local/nagios/var/retention.dat
>
> Here is retention.dat (not sure why it would hang here):
> -rw------- 1 nagios nagios 2008435 Jul 1 12:26 retention.dat
>
> These are really the only clues I'm able to find at this point.
>
> -- Eric Cables
>
>
>
> On Thu, Jul 1, 2010 at 2:09 PM, Eric Cables <ecables at gmail.com> wrote:
>
>> Thanks for the reply. I ended up rebooting the box, which fixed the
>> problem temporarily, but it has resurfaced again. When I drill down into a
>> service check it says that the next check will be processed at a time that
>> has already passed.
>>
>> For example:
>> Last Check: 13:09
>> Next Check: 13:11
>>
>> The current time, however, is 14:02...
>>
>> When I try to stop the process via the init script I get the following:
>> [nagios at psdbsd01 (~/var)]$ /usr/local/etc/rc.d/nagios.sh stop
>> Stopping nagios: ..........
>> Warning - nagios did not exit in a timely manner
>>
>> The cmd file does not exist prior to attempting to start, after stopping,
>> but I am back to the problem where Nagios will not start and instead hangs
>> indefinitely when asked to start.
>>
>> [nagios at psdbsd01 (~/var)]$ /usr/local/etc/rc.d/nagios.sh start
>> Starting nagios: <-- hangs here
>>
>> I'm not sure about the lock file; this is a FreeBSD install from source,
>> and I don't see a /var/lock directory at all. Everything Nagios related is
>> installed in /usr/local/nagios as far as I can tell.
>>
>> There doesn't seem to be anything of interest in nagios.log, as the last
>> entry just reports a notification that was sent out prior to Nagios losing
>> its functionality.
>>
>> Any other tips? I'm not exactly sure why a reboot fixed this before, but
>> any speculation is appreciated.
>>
>> -- Eric Cables
>>
>>
>>
>> On Thu, Jul 1, 2010 at 6:05 AM, Jim Avery <jim at jimavery.me.uk> wrote:
>>
>>> On 1 July 2010 01:18, Eric Cables <ecables at gmail.com> wrote:
>>> > Sorry to bug the list, but my 3.2.1 installation of Nagios has all of a
>>> > sudden stopped starting. I noticed a lack of alerts over the last day, and
>>> > when I checked the GUI it indicated that the "next" scheduled check for a
>>> > service was in the past. I proceeded to stop/start Nagios, but both have
>>> > failed.
>>> >
>>> > Currently when I try to start Nagios using the init script it just hangs:
>>> > [nagios at psdbsd01 (~/etc)]$ /usr/local/etc/rc.d/nagios.sh start
>>> > Starting nagios:
>>> >
>>> > I've enabled debug logging (-1 level, 2 verbosity), but this is all that
>>> > shows up in nagios.debug when I issue the above start request (uid 1003 =
>>> > nagios):
>>> > [1277942532.270096] [001.0] [pid=46503] drop_privileges() start
>>> > [1277942532.270194] [004.0] [pid=46503] Original UID/GID: 1003/1003
>>> >
>>> > I can run nagios -v nagios.cfg, and it reports no errors.
>>> >
>>> > Here's the output if I run nagios nagios.cfg manually, without invoking
>>> > daemon mode:
>>> > [nagios at psdbsd01 (~/etc)]$ ../bin/nagios ./nagios.cfg
>>> >
>>> > Nagios Core 3.2.1
>>> > Copyright (c) 2009-2010 Nagios Core Development Team and Community Contributors
>>> > Copyright (c) 1999-2009 Ethan Galstad
>>> > Last Modified: 03-09-2010
>>> > License: GPL
>>> >
>>> > Website: http://www.nagios.org
>>> >
>>> > Any tips? I am not sure what the next steps are since both logging and
>>> > debugging aren't producing output, and Nagios has never taken more than a
>>> > few seconds to start in the past.
>>>
>>> What, if anything, shows up in your nagios.log file?
>>>
>>> Check you don't already have a nagios daemon running (ps -ef | grep
>>> nagios) before you start it again.
>>>
>>> Check that the lock file isn't there from the previous invocation (if
>>> you did a standard install from source tarballs the file is
>>> /var/lock/subsys/nagios).
>>>
>>> Check that the Nagios command file /usr/local/nagios/var/rw/nagios.cmd
>>> doesn't exist before you start nagios.
>>>
>>> Use full pathnames when attempting to verify your config, for example:
>>>
>>> /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
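
The checks listed above could be collected into a small pre-start script.
This is only a sketch under stated assumptions: the function name
prestart_check and the variable names are made up for illustration, the
lock file path is the one suggested above and may not exist on a FreeBSD
install, and the process match assumes the daemon was launched by its
full path.

```sh
#!/bin/sh
# Paths taken from the checklist above; override them for the local install.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
CMD_FILE=${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}
LOCK_FILE=${LOCK_FILE:-/var/lock/subsys/nagios}

# Hypothetical helper: prints one line per leftover that could block a clean
# start. Returns 0 if nothing was found, 1 otherwise.
prestart_check() {
    problems=0
    # Match only command lines that begin with the daemon's full path, so
    # the ps/awk pipeline itself is never counted.
    if ps -A -o args= | awk -v p="$NAGIOS_BIN" \
        'index($0, p) == 1 { found = 1 } END { exit !found }'; then
        echo "nagios already running: $NAGIOS_BIN"
        problems=1
    fi
    # -e is true for any file type, including the named pipe Nagios uses
    # as its external command file.
    if [ -e "$CMD_FILE" ]; then
        echo "stale command file: $CMD_FILE"
        problems=1
    fi
    if [ -e "$LOCK_FILE" ]; then
        echo "stale lock file: $LOCK_FILE"
        problems=1
    fi
    return "$problems"
}
```

Running prestart_check before the init script's start action and refusing
to continue on a non-zero return would surface these leftovers instead of
leaving them to be found by hand.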
>>>
>>>
>>> _______________________________________________
>>> Nagios-users mailing list
>>> Nagios-users at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>> ::: Please include Nagios version, plugin version (-v) and OS when
>>> reporting any issue.
>>> ::: Messages without supporting info will risk being sent to /dev/null
>>>
>>
>>
>