<DIV>
<DIV>Just for fun, you might try reproducing the problem to see how many forks</DIV>
<DIV>you *can* get, for example:</DIV>
<DIV> </DIV>
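<DIV>#!/usr/bin/perl</DIV>
<DIV>use strict;<BR>use warnings;<BR><BR>$| = 1; # unbuffered output so the counter updates in place<BR>my $c = 0;</DIV>
<DIV>while (1) {<BR> my $pid = fork();<BR> if (!defined $pid)<BR> {<BR> # fork() failed: report how far we got and why, then stop<BR> print "\nfork failed after $c children: $!\n";<BR> exit(1);<BR> }<BR> if ($pid)<BR> {<BR> # parent: count the child just created<BR> $c++;<BR> print "\rchildcount $c ";<BR> }<BR> else<BR> {<BR> # child: exit shortly; it stays defunct because the parent never waits<BR> sleep(1);<BR> exit(0);<BR> }<BR>}<BR></DIV>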
<DIV>This creates as many processes as it can, which tests your limit. You would</DIV>
<DIV>want to run it in the same environment the nagios process runs in</DIV>
<DIV>(same user, same limits).</DIV>
<DIV> </DIV>
<DIV>The children will all stay defunct until the parent process exits (which</DIV>
<DIV>happens when you hit the maximum number of processes you can create).</DIV>
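<DIV> </DIV>
<DIV>While the test runs, one way to watch the defunct children pile up is to count processes whose state begins with Z. This is only a minimal sketch: count_z_states is a made-up helper name, and it assumes your ps accepts "-eo stat=" (procps on Linux does; other systems may differ).</DIV>

```shell
# Count defunct (zombie) processes from a list of ps state codes on stdin.
# A state code that starts with "Z" marks a defunct process.
# count_z_states is a hypothetical helper name, not part of any tool.
count_z_states() {
    grep -c '^ *Z'
}

# Usage (assumes a ps that supports "-eo stat=", e.g. procps on Linux):
#   ps -eo stat= | count_z_states
```

<DIV>Run the usage line from another terminal while the fork test is going; the number should climb along with childcount.</DIV>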
<DIV> </DIV>
<DIV>The other thing you might try is to start nagios under</DIV>
<DIV>strace -f and send the output to a log. You can limit strace to</DIV>
<DIV>process-management calls (fork, exec, wait, exit), e.g., strace -f -e trace=process >/tmp/,log 2>&1 nagios ....</DIV>
<DIV> </DIV>
<DIV>That would give you a good handle on what is going on when the failure</DIV>
<DIV>occurs. Might slow nagios down a bit, but probably nothing significant.</DIV>
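<DIV> </DIV>
<DIV>Once you have a log, something like the following could pull out just the process-creation calls that failed. It is a rough sketch: failed_forks is a made-up helper name, and the pattern assumes strace's usual "= -1 ERRNO (message)" result format.</DIV>

```shell
# Print the fork/vfork/clone calls in an strace log that returned an error.
# failed_forks is a hypothetical helper name; $1 is the path to the log.
# Assumes strace's usual result format, e.g. "... = -1 EAGAIN (...)".
failed_forks() {
    grep -E '(fork|clone).*= -1 E[A-Z]+' "$1"
}

# Usage, with whatever log file strace was told to write:
#   failed_forks /path/to/strace.log
```

<DIV>The errno name on the matching lines (EAGAIN, ENOMEM, and so on) is the interesting part, since it tells you which limit fork() actually hit.</DIV>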
<DIV> </DIV>
<DIV>-FredC</DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV> </DIV>
<DIV><BR><BR><B><I>Terry <td3201@gmail.com></I></B> wrote:</DIV>
<BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid">I have a program that checks the logs by the minute and pages when the<BR>fork errors occur, so we are responding within minutes. I have looked<BR>at the resources every time it happens and we have plenty of<BR>resources. Is there a single plugin I can put into debugging mode so<BR>that when this happens I get more information as to why it is giving<BR>these errors? Here are a few facts:<BR>- the system is fine with memory all the time, never runs out (resident/paging)<BR>- there is not an unusual number of processes running, maybe around<BR>200 at a time, but nowhere near the ulimit setting<BR>- ulimit for the 'nagios' user matches that of root (unlimited). Here<BR>is the ulimit:<BR>core file size (blocks, -c) 0<BR>data seg size (kbytes, -d) unlimited<BR>file size (blocks, -f) unlimited<BR>max locked memory (kbytes, -l) 4<BR>max memory size (kbytes, -m) unlimited<BR>open files (-n) 1024<BR>pipe size (512 bytes, -p) 8<BR>stack size (kbytes, -s) 10240<BR>cpu time (seconds, -t) unlimited<BR>max user processes (-u) 7168<BR>virtual memory (kbytes, -v) unlimited<BR><BR>Thanks,<BR>Terry<BR><BR><BR><BR>On 9/1/05, Fred <F1216@YAHOO.COM> wrote:<BR>> My guess would be to look at the resource utilization on your system;<BR>> the most likely causes for fork() to fail are no more process slots, out of<BR>> memory, or exceeding some kind of per-user (non-root) limit. When this<BR>> occurs look at your system logs and ps output and see if you have *lots*<BR>> of processes hanging around. It could be that nagios has stopped reaping<BR>> its children (or another unrelated process has sucked up the resources)<BR>> and you have simply pushed your system to the edge. It might be that you<BR>> get to that situation and it backs off before you even notice it and you<BR>> are left with nagios having problems dealing with the aftermath.<BR>> <BR>> -FredC<BR>>
<BR>> --- Terry <TD3201@GMAIL.COM> wrote:<BR>> <BR>> > Hello,<BR>> ><BR>> > I have been having this issue for quite some time. For some unknown<BR>> > reason, nagios stops performing checks with these errors:<BR>> ><BR>> > [1125536952] Warning: The check of service 'PING' on host 'hostname'<BR>> > could not be performed due to a fork() error. The check will be<BR>> > rescheduled.<BR>> ><BR>> > All checks fail like this until nagios is restarted. When this<BR>> > problem is occurring I can run the service checks manually both as the<BR>> > nagios user and as the root user. There are no resource problems that<BR>> > I can see at the time. We do not appear to be hitting a limit with<BR>> > open files or anything like that either. The nagios user mirrors the root<BR>> > user in that area.<BR>> ><BR>> > What could be wrong?<BR>> ><BR>> > Thanks!<BR>> ><BR>>
><BR>> > _______________________________________________<BR>> > Nagios-users mailing list<BR>> > Nagios-users@lists.sourceforge.net<BR>> > https://lists.sourceforge.net/lists/listinfo/nagios-users<BR>> > ::: Please include Nagios version, plugin version (-v) and OS when reporting<BR>> > any issue.<BR>> > ::: Messages without supporting info will risk being sent to /dev/null<BR>> ><BR></BLOCKQUOTE></DIV>