Nagios hangs on startup

Eric Cables ecables at gmail.com
Thu Jul 1 23:15:43 CEST 2010


Here are a few more details I've been able to gather.

Here's the output of a truss on the init script w/ the start statement:
Starting nagios:write(1,"Starting nagios:",16)                   = 16 (0x10)
fork(0x90,0xbfbfe9f8,0xa,0x8062a35,0x0,0x0)      = 55445 (0xd895)
getpgrp(0x0,0x0,0xd895,0x0,0x2831c0c0,0x0)       = 55444 (0xd894)
wait4(0xffffffff,0xbfbfe9d8,0x2,0x0,0x213,0x1)   = 55445 (0xd895)
stat("/sbin/su",0xbfbfe6f8)                      ERR#2 'No such file or directory'
stat("/bin/su",0xbfbfe6f8)                       ERR#2 'No such file or directory'
stat("/usr/sbin/su",0xbfbfe6f8)                  ERR#2 'No such file or directory'
stat("/usr/bin/su",{ mode=-r-sr-xr-x ,inode=14512669,size=14496,blksize=4096 }) = 0 (0x0)
fork(0x0,0x0,0x4b156e10,0x0,0x0,0x0)             = 55446 (0xd896)
getpgrp(0x0,0x0,0xd896,0x0,0x2831c0c0,0x0)       = 55444 (0xd894)

^^^ This is where it hangs.

ps -ax | grep nagios shows the following:
55443   6  I+     0:00.02 truss /usr/local/etc/rc.d/nagios.sh start
55444   6  IX     0:00.01 /bin/sh /usr/local/etc/rc.d/nagios.sh start
55447   6  S      0:00.07 su - nagios -c touch /usr/local/nagios/var/nagios.log /usr/local/nagios/var/retention.dat

Here is retention.dat (not sure why it would hang here):
-rw-------  1 nagios  nagios  2008435 Jul  1 12:26 retention.dat
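
Since the start script appears to stall in the "su - nagios -c touch ..." step, one quick thing to rule out is the state of the two files themselves. What follows is only a sketch and not part of the init script: check_touch_targets is a helper name made up for illustration, and it assumes it is run as the nagios user so that the writability test reflects what touch would see.

```shell
#!/bin/sh
# check_touch_targets FILE...
# Hypothetical helper (not part of Nagios or its init script).
# For each file, reports whether it is missing, not writable by the
# current user, or fine. Returns the number of files that exist but
# are not writable.
check_touch_targets() {
    bad=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            # Not counted as a problem: touch creates missing files.
            echo "missing (touch would create it): $f"
        elif [ ! -w "$f" ]; then
            echo "not writable: $f"
            bad=$((bad + 1))
        else
            # wc -c gives the size in bytes; tr strips the padding
            # that some wc implementations add.
            size=$(wc -c < "$f" | tr -d ' ')
            echo "ok: $f ($size bytes)"
        fi
    done
    return "$bad"
}
```

It could be called as the nagios user with the two paths from the ps output above, for example: check_touch_targets /usr/local/nagios/var/nagios.log /usr/local/nagios/var/retention.dat. If both come back ok, the files themselves are probably not what the step is waiting on.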

These are really the only clues I'm able to find at this point.

-- Eric Cables


On Thu, Jul 1, 2010 at 2:09 PM, Eric Cables <ecables at gmail.com> wrote:

> Thanks for the reply.  I ended up rebooting the box, which fixed the
> problem temporarily, but it has resurfaced again.  When I drill down into a
> service check it says that the next check will be processed at a time that
> has already passed.
>
> For example:
> Last Check: 13:09
> Next Check: 13:11
>
> The current time, however, is 14:02...
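
The gap in that example is easy to put a number on. As a rough sketch, assuming both times are given as HH:MM and fall on the same day, and using helper names (to_minutes, minutes_overdue) invented here for illustration:

```shell
#!/bin/sh
# to_minutes HH:MM
# Converts a clock time to minutes since midnight.
to_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so values like "08" are not read as octal.
    h=${h#0}
    m=${m#0}
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# minutes_overdue NEXT_CHECK NOW
# Prints how many minutes NOW is past NEXT_CHECK (negative if the
# check is still in the future). Assumes both times are on the same day.
minutes_overdue() {
    next=$(to_minutes "$1")
    now=$(to_minutes "$2")
    echo $(( now - next ))
}
```

With the values above, minutes_overdue 13:11 14:02 prints 51, so the check is 51 minutes late.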
>
> When I try to stop the process via the init script I get the following:
> [nagios at psdbsd01 (~/var)]$ /usr/local/etc/rc.d/nagios.sh stop
> Stopping nagios: ..........
> Warning - nagios did not exit in a timely manner
>
> The cmd file does not exist prior to attempting to start, after stopping,
> but I am back to the problem where Nagios will not start and instead hangs
> indefinitely when requested to start.
>
> [nagios at psdbsd01 (~/var)]$ /usr/local/etc/rc.d/nagios.sh start
> Starting nagios: <-- hangs here
>
> I'm not sure about the lock file; this is a FreeBSD install from source,
> and I don't see a /var/lock directory at all.  Everything Nagios related is
> installed in /usr/local/nagios as far as I can tell.
>
> There doesn't seem to be anything of interest in nagios.log, as the last
> entry just reports a notification that was sent out prior to Nagios losing
> its functionality.
>
> Any other tips?  I'm not exactly sure why a reboot fixed this before, but
> any speculation is appreciated.
>
> -- Eric Cables
>
>
>
> On Thu, Jul 1, 2010 at 6:05 AM, Jim Avery <jim at jimavery.me.uk> wrote:
>
>> On 1 July 2010 01:18, Eric Cables <ecables at gmail.com> wrote:
>> > Sorry to bug the list, but my 3.2.1 installation of Nagios has all of a
>> > sudden stopped starting.  I noticed a lack of alerts over the last day, and
>> > when I checked the GUI it indicated that the "next" scheduled check for a
>> > service was in the past.  I proceeded to stop/start Nagios, but both have
>> > failed.
>> >
>> > Currently when I try to start Nagios using the init script it just hangs:
>> > [nagios at psdbsd01 (~/etc)]$ /usr/local/etc/rc.d/nagios.sh start
>> > Starting nagios:
>> >
>> > I've enabled debug logging (-1 level, 2 verbosity), but this is all that
>> > shows up in nagios.debug when I issue the above start request (uid 1003 =
>> > nagios):
>> > [1277942532.270096] [001.0] [pid=46503] drop_privileges() start
>> > [1277942532.270194] [004.0] [pid=46503] Original UID/GID: 1003/1003
>> >
>> > I can run nagios -v nagios.cfg, and it reports no errors.
>> >
>> > Here's the output if I run nagios nagios.cfg manually, without invoking
>> > daemon mode:
>> > [nagios at psdbsd01 (~/etc)]$ ../bin/nagios ./nagios.cfg
>> >
>> > Nagios Core 3.2.1
>> > Copyright (c) 2009-2010 Nagios Core Development Team and Community Contributors
>> > Copyright (c) 1999-2009 Ethan Galstad
>> > Last Modified: 03-09-2010
>> > License: GPL
>> >
>> > Website: http://www.nagios.org
>> >
>> > Any tips?  I am not sure what the next steps are since both logging and
>> > debugging aren't producing output, and Nagios has never taken more than a
>> > few seconds to start in the past.
>>
>> What, if anything, shows up in your nagios.log file?
>>
>> Check you don't already have a nagios daemon running (ps -ef | grep
>> nagios) before you start it again.
>>
>> Check that the lock file isn't there from the previous invocation (if
>> you did a standard install from source tarballs the file is
>> /var/lock/subsys/nagios).
>>
>> Check that the Nagios command file /usr/local/nagios/var/rw/nagios.cmd
>> doesn't exist before you start nagios.
>>
>> Use full pathnames when attempting to verify your config, for example:
>>
>> /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
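
The lock file and command file checks in the list above can be sketched as a small shell function. This is an illustration only: prestart_check is a hypothetical name, both paths are passed in rather than assumed, and the lock file location in particular differs between installs.

```shell
#!/bin/sh
# prestart_check LOCKFILE CMDFILE
# Hypothetical helper (not shipped with Nagios). Prints one line per
# leftover file found and returns the number of leftovers, so a return
# value of 0 means neither file is present.
prestart_check() {
    lockfile=$1
    cmdfile=$2
    problems=0

    # A lock file left behind by a previous invocation.
    if [ -e "$lockfile" ]; then
        echo "lock file present: $lockfile"
        problems=$((problems + 1))
    fi

    # The command file should not exist before nagios is started.
    if [ -e "$cmdfile" ]; then
        echo "command file present: $cmdfile"
        problems=$((problems + 1))
    fi

    return "$problems"
}
```

With the paths mentioned above this would be run as: prestart_check /var/lock/subsys/nagios /usr/local/nagios/var/rw/nagios.cmd && echo "no leftovers". It says nothing about whether a daemon is still running, so the ps check is still needed separately.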
>>
>>
>> ------------------------------------------------------------------------------
>> This SF.net email is sponsored by Sprint
>> What will you do first with EVO, the first 4G phone?
>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when
>> reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>
>
>