Number of Nagios Processes Distributed Monitoring

Mike Benoit mikeb at netnation.com
Mon Aug 25 17:58:37 CEST 2003


I'm having the exact same problem with Nagios 1.1. There hasn't been an
official fix released for this yet, correct? It sure makes using passive
checks difficult. :(

On Fri, 2003-07-25 at 09:56, Mooney, Ryan wrote:
> I had a similar problem when doing lots of external checks.  The subprocess that
> gets forked to read the results from the .cmd pipe and then write them to the shared
> fd to the master process would block (forever) on the write call.  I never did figure
> out why, since the code appeared to be correct.  I ended up putting an alarm around
> the write call and timing it out if it hung too long.  I figured that losing a few
> passive checks was better than having memory fill up and the machine die.  Based on
> the behavior I saw, I'm not really convinced that the problem is 100% limited to the
> passive checks though, as a very similar set of routines is used by the active checks
> code.
> 
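To illustrate the "alarm around the write call" idea described above, here is a minimal sketch. It is in Python rather than the C that Nagios is written in, and it is not the change Ryan actually made; the names write_with_timeout and WriteTimeout are invented for the example. It assumes a Unix system and that it runs in the main thread, since signal handlers can only be installed there.

```python
import os
import signal


class WriteTimeout(Exception):
    """Raised by the SIGALRM handler when the write takes too long."""


def _on_alarm(signum, frame):
    # Raising here interrupts the blocked os.write() call.
    raise WriteTimeout()


def write_with_timeout(fd, data, seconds):
    """Write data to fd; return the byte count, or None if it timed out.

    Hypothetical helper: 'seconds' must be a whole number because
    signal.alarm() only accepts integers.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return os.write(fd, data)
    except WriteTimeout:
        # The result is dropped instead of blocking forever.
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
```

A caller would treat a None return as "this check result was lost" and carry on, which is the trade-off described above: a few dropped passive checks instead of a pile of blocked processes.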
> If you compile nagios with debugging (export "CFLAGS=-g"; ./configure --whatever-options-you-use;
> make; make install) and then watch the "ps aux" output, you'll notice
> that there is one really long running process that takes a fair bit of CPU (which is
> the good master), and then over time you'll start seeing some other processes that have
> a start time a fair bit in the past and that never die.  If you attach to one of these with
> a debugger (say "cd /wherever/you/compiled/nagios/; gdb base/nagios [pid]" where [pid]
> is the process ID of one of the processes that started more than an hour ago and is not the
> master process) and do a "bt", the resulting call trace would likely help
> determine where the processes are getting stuck.
> 
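Picking out the stuck processes by eye from "ps" output gets tedious with hundreds of them, so here is a rough sketch of automating that step. It is only an illustration: find_stale_children is a made-up name, the input is assumed to be text with three columns (PID, elapsed seconds, command), and the oldest matching process is assumed to be the master. On Linux with procps, "ps -eo pid,etimes,comm" should produce that layout, but check your own ps before relying on it.

```python
def find_stale_children(ps_text, name="nagios", min_age=3600):
    """Return PIDs of old 'name' processes, excluding the presumed master.

    Assumes each data line looks like: PID ELAPSED_SECONDS COMMAND.
    The oldest matching process is treated as the master (an assumption).
    """
    procs = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that does not parse.
        if len(fields) < 3 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, age, command = int(fields[0]), int(fields[1]), fields[2]
        if name in command:
            procs.append((age, pid))
    if not procs:
        return []
    master = max(procs)  # largest elapsed time
    return sorted(pid for age, pid in procs
                  if age >= min_age and (age, pid) != master)


sample = """\
  PID ELAPSED COMMAND
  100   22000 nagios
  250    7300 nagios
  300      40 nagios
  400    9000 sshd
"""
print(find_stale_children(sample))  # prints [250]
```

Each PID it returns is a candidate for the gdb attach and "bt" described above.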
> If you are having the same problem I was, you will likely see "process_passive_service_checks"
> and/or "check_for_external_commands" in the call trace
> (sometimes the stack looks munged, so the call stack may not be 100% accurate, leading me
> to believe that some corruption is what's causing the write to hang, but I wasn't able to
> figure out what was causing the corruption easily and had to "get things working").
> 
> I'd be curious to see if it's the same problem.
> 
> > >Jasmine 
> > I am pretty sure it was not Nagios itself; the memory ran out and the
> > server stalled.
> > At the moment I have a Nagios uptime of:
> > 
> > Total Running Time: 0d 6h 6m 15s 
> > And this... 
> > Check Command Output:  Nagios ok: located 1677 processes, status log
> > updated 170 seconds ago   
> > 
> > I am pretty sure this is not OK.
> > 
> > Any ideas?
> > 
> > I will let the server run over the weekend. If it crashes again, I
> > will send detailed information to the list.
> > 
> > 
> > 
> > -------------------------------------------------------
> > This SF.Net email sponsored by: Free pre-built ASP.NET sites including
> > Data Reports, E-commerce, Portals, and Forums are available now.
> > Download today and enter to win an XBOX or Visual Studio .NET.
> > http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet
> > _072303_01/01
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users
> > ::: Please include Nagios version, plugin version (-v) and OS 
> > when reporting any issue. 
> > ::: Messages without supporting info will risk being sent to /dev/null
> > 
> 
> 



