ocsp/ochp zombie with restart?
Andreas Ericsson
ae at op5.se
Thu Mar 3 00:54:19 CET 2005
Percy Jahn wrote:
> Hello,
>
> if I restart nagios via the init.d script, it sometimes happens that nagios
> is not killed. I am using a CVS snapshot from mid-December 2004 on
> SuSE 9.0. (Killing is done by "kill <nagiospid>", as far as I can see.)
> It is also possible that this behavior happens on reload. We use an
> automated script to copy configurations to machines and restart/reload
> nagios, and I haven't figured out exactly when this happens. It is hard to
> debug because it happens rarely.
>
Actually, nagios isn't ignoring the kill; it sits and waits for the
worker threads to exit gracefully. You see this when nagios is reaping
check results. It's important that you don't try to start nagios again
before the worker threads are fired up again, or things will go haywire.
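
For the automated restart script, one way to avoid starting a second
instance too early is to poll until the old master pid is really gone
before launching the new one. A minimal sketch in Python, assuming a
restart (not a reload); pid_alive and wait_for_exit are hypothetical
helper names, not anything shipped with Nagios:

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, timeout=30.0, interval=0.5, alive=pid_alive):
    """Poll until pid is gone; return True if it went away within timeout."""
    deadline = time.monotonic() + timeout
    while alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

Note that a zombie still counts as "alive" for this probe, since its
pid exists until the parent reaps it. The timeout and interval values
are arbitrary defaults chosen for the example.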
> If I take a look at the running processes, every time I find one zombie
> process, named ocsp/ochp, started via nagios.
Is the state actually Z? If so, you might have a warped pthreads
implementation, or init might not be doing its job reaping orphaned
processes. It could also be that one of the worker threads is caught in
uninterruptible I/O while reading from the command pipe while the master
thread deletes or closes it. I don't think it's a terribly good idea for
Nagios to catch SIGPIPE, considering it handles one, but I'm not the
boss of that.
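
To check whether the state really is Z, and who the parent is, you can
read /proc/<pid>/stat directly. A rough sketch in Python, assuming a
Linux /proc filesystem; parse_stat and is_zombie are names made up for
this example:

```python
def parse_stat(line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    # After comm, the first field is the state and the second is the ppid.
    return int(pid_str), comm, fields[0], int(fields[1])


def is_zombie(pid):
    """Return True if the kernel reports state Z for this pid."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_stat(fh.read())[2] == "Z"
```

If the ppid of the zombie is 1, init has adopted it and should reap it;
if it is some other nagios process, that process is the one that has
not called wait() on it yet.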
> I suppose nagios was
> killed while making an ocsp/ochp check, and the parent process of the
> check exited without killing all its children. (The parent of the check
> process is the child of the process being killed by the kill command.)
>
This is weird. It indicates that the master pid isn't actually written
to the nagios pid file (which is weirdly named lock file everywhere in
documentation, configuration and code - weirdly because a lockfile is a
Red Hat invention placed in /var/lock/subsys/progname and always empty).
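
A quick way to test that theory is to compare the pid stored in the
lock file with the process you believe is the master. A small sketch in
Python; parse_pid and read_pid_file are hypothetical helpers, and the
path in the usage note is only an assumed example, so use whatever your
lock_file setting points to:

```python
def parse_pid(text):
    """Return the pid contained in text, or None if it is not a plain number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def read_pid_file(path):
    """Return the pid stored in the file at path, or None if unusable."""
    try:
        with open(path) as fh:
            return parse_pid(fh.read())
    except FileNotFoundError:
        return None
```

For example, read_pid_file("/usr/local/nagios/var/nagios.lock") could
be compared against the ppid of the stuck check process. If they
differ, the init script is sending its kill to something other than the
process that owns the children.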
> I've not found a bugfix for this problem in CVS, so I skipped checking
> the newest version.
>
> Is this a known issue? Or are there any other suggestions? Maybe for
> better debugging?
>
gdb is your friend.
> --
> Best regards
> Percy Jahn
>
>
>
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer