Fork again...

Andreas Ericsson ae at op5.se
Thu Feb 3 09:42:43 CET 2005


michael at optusnet.com.au wrote:
> Andreas Ericsson <ae at op5.se> writes:
> 
>>Oscar Paniagua wrote:
>>
>>>I have the following question:
>>>- Why does check.c call fork() twice?
>>>/* fork a child process */
>>>	pid=fork();
>>>/* fork again... */
>>>		pid=fork();
>>
>>So as to not create zombie processes in case something goes wrong with
>>plugin execution. Otherwise an uninterruptible IO request could quite
>>possibly hang the entire daemon.
>>
> 
> 
> That doesn't make much sense. How can an IO request in the
> child affect the parent? The parent should be doing non-blocking
> wait() requests to reap the child processes, which will clean up
> any zombies as they appear.
> 

The parent has to fork a process group leader first so it can reap the 
response from the right fork(). If it didn't fork twice, it would lead to 
a race condition on Linux, BSD and Solaris when the child exits while 
other checks are being run simultaneously.
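
For readers following along, the general double-fork pattern under discussion can be sketched roughly as below. This is only an illustration and not the actual code from check.c: the function name run_detached() and the use of /bin/sh to run the command are assumptions made for the example.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from check.c): run `cmd` in a grandchild.
 * Returns 0 once the intermediate child has been reaped, -1 on error.
 * The grandchild is orphaned when the intermediate child exits, so it
 * is reparented (normally to init) and reaped there, not by us.
 */
int run_detached(const char *cmd)
{
    int status;
    pid_t pid = fork();                 /* fork a child process */

    if (pid < 0)
        return -1;

    if (pid == 0) {
        pid_t pid2 = fork();            /* fork again... */

        if (pid2 < 0)
            _exit(EXIT_FAILURE);
        if (pid2 == 0) {
            /* grandchild: does the actual work */
            execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
            _exit(127);                 /* only reached if exec failed */
        }
        _exit(EXIT_SUCCESS);            /* intermediate child exits at once */
    }

    /* parent: blocks only until the short-lived intermediate child exits */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A call such as run_detached("exit 0") returns as soon as the intermediate child is gone, regardless of how long the grandchild keeps running.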

> Right now the code is mildly awful.

The indent program helps a great deal. It's a shame it can't handle 
Ethan's extra indentation of closing braces.

> The parent does a fork()
> followed by a blocking wait(). The child does fork() and exit.
> The grandchild does the check.
> 
> The much better thing to do would be to just fork() and in the
> main loop do a waitpid() with WNOHANG to clean up the children as
> they exit.
> 

No, the current procedure is considered best practice for executing 
commands. sshd does it the same way (actually it does fork(), setuid(), 
fork(), fork(), setpgid(), fork(), system(), but that's because it has to 
drop privileges as well).
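
For comparison, the single-fork alternative proposed in the quoted text could look something like the sketch below. Again, this is not Nagios code: spawn_check() and reap_children() are hypothetical names, and it assumes the main loop calls reap_children() on every iteration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `cmd` in a single child process.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_check(const char *cmd)
{
    pid_t pid = fork();

    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                     /* only reached if exec failed */
    }
    return pid;
}

/* Hypothetical helper: reap every child that has already exited,
 * without blocking. Returns the number of children reaped. */
int reap_children(void)
{
    int reaped = 0;
    int status;

    /* waitpid() returns 0 if children exist but none has exited yet,
     * and -1 (ECHILD) if there are no children left at all. */
    while (waitpid(-1, &status, WNOHANG) > 0)
        reaped++;

    return reaped;
}
```

With this approach the parent never blocks, but it has to remember which pid belongs to which check in order to match the exit status to the right result.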

> Nagios' number one problem at the moment is the terrible scalability,
> and the penchant to make things much more expensive than they should
> be doesn't help. :)
>

Nagios scales fairly well, although I agree on the expensiveness. 
However, while 2.0 is still in beta I feel it would be much more useful 
to make sure it is 100% stable before starting to experiment with 
speedups. We have a customer who gets a couple of core dumps a week 
(usually two). I've hacked up a small script to restart Nagios more 
or less instantly, but something isn't right and I haven't been able to 
find the cause in the last 10 weeks.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

