Number of Nagios Processes Distributed Monitoring
Mike Benoit
mikeb at netnation.com
Mon Aug 25 17:58:37 CEST 2003
I'm having the exact same problem with Nagios 1.1. There hasn't been an
official fix released for this yet, correct? It sure makes using passive
checks difficult. :(
On Fri, 2003-07-25 at 09:56, Mooney, Ryan wrote:
> I had a similar problem when doing lots of external checks. The subprocess that
> gets forked to read the results from the .cmd pipe and then write them to the shared
> fd to the master process would block (forever) on the write call. I never did figure
> out why, since the code appeared to be correct. I ended up putting an alarm around
> the write call and timing it out if it hung too long. I figured that losing a few
> passive checks was worth not having memory fill up and the machine die. Based on
> the behavior I saw, I'm not really convinced that the problem is 100% limited to the
> passive checks though, as a very similar set of routines is used by the active checks
> code.
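
For anyone wanting to see the "alarm around the write call" idea in isolation, here is a minimal sketch in Python rather than in the Nagios C source. It is not the actual patch; the names write_with_timeout and WriteTimeout are made up for illustration. It assumes a Unix system and that it runs in the main thread, since that is where Python delivers signals.

```python
import os
import signal


class WriteTimeout(Exception):
    """Raised by the SIGALRM handler when the guarded write takes too long."""


def _on_alarm(signum, frame):
    # Raising here interrupts the blocked os.write() call.
    raise WriteTimeout()


def write_with_timeout(fd, data, seconds):
    """Write data to fd, giving up after `seconds`.

    Returns the number of bytes written, or None if the write timed out.
    Hypothetical helper: a sketch of the technique, not Nagios code.
    """
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # arm the alarm
    try:
        return os.write(fd, data)
    except WriteTimeout:
        return None  # the result is dropped instead of hanging forever
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Calling write_with_timeout(fd, b"result", 0.5) on a pipe whose reader has stopped draining it returns None after half a second instead of blocking. The trade-off is the one described above: a timed-out result is lost, but the writer does not pile up in memory.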
>
> If you compile nagios with debugging (export "CFLAGS=-g"; ./configure --whatever-options-you-use;
> make; make install) and then watch the "ps aux" output, you'll notice
> that there is one really long-running process that takes a fair bit of CPU (which is
> the good master), and then over time you'll start seeing some other processes that have
> a start time a fair bit in the past that never die. If you attach to one of these with
> a debugger (say "cd /wherever/you/compiled/nagios/; gdb base/nagios [pid]" where [pid]
> is the process ID of one of the processes with a start time > 1hr ago that is not the
> master process) and do a "bt" to get a call trace out of it, that would likely help
> determine where the processes are getting stuck.
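
Picking the stuck processes out of the ps listing by eye gets tedious with hundreds of them, so here is a rough sketch of how the filtering could be scripted. It assumes a Linux procps-style ps and output captured from "ps -eo pid,etimes,comm" (PID, elapsed seconds, command name). The function name find_stale_processes is hypothetical, and the master PID has to be supplied by hand.

```python
def find_stale_processes(ps_output, master_pid, min_age_seconds=3600, name="nagios"):
    """Return PIDs of `name` processes older than min_age_seconds.

    ps_output is assumed to be the text printed by
    `ps -eo pid,etimes,comm` (an assumption, not something from the
    original report). The master process is excluded by PID.
    """
    stale = []
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid_text, age_text, command = fields
        if not (pid_text.isdigit() and age_text.isdigit()):
            continue  # skips the header line
        pid, age = int(pid_text), int(age_text)
        if command.strip() == name and pid != master_pid and age > min_age_seconds:
            stale.append(pid)
    return stale


# Made-up sample listing for illustration only.
sample = """  PID ELAPSED COMMAND
  101   22000 nagios
  245    7300 nagios
  390      12 nagios
  412    9000 sshd
"""
print(find_stale_processes(sample, master_pid=101))  # prints [245]
```

Each PID it returns is a candidate for the "gdb base/nagios [pid]" and "bt" steps described above.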
>
> If you are having the same problem I was, you will likely see "process_passive_service_checks"
> and/or "check_for_external_commands" in the call trace
> (sometimes the stack looks munged so the call stack may not be 100% accurate, leading me
> to believe that some corruption is what's causing the write to hang, but I wasn't able to
> figure out what was causing the corruption easily and had to "get things working").
>
> I'd be curious to see if it's the same problem.
>
> > >Jasmine
> > I am pretty sure it was not Nagios itself; memory ran out and the server
> > stopped responding.
> > At the moment I have a nagios uptime of :
> >
> > Total Running Time: 0d 6h 6m 15s
> > And this...
> > Check Command Output: Nagios ok: located 1677 processes, status log
> > updated 170 seconds ago
> >
> > I am pretty sure this is not OK.
> >
> > Any ideas?
> >
> > I will let the server run over the weekend; when it crashes again, I
> > will give detailed information to the list.
> >
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users
> > ::: Please include Nagios version, plugin version (-v) and OS
> > when reporting any issue.
> > ::: Messages without supporting info will risk being sent to /dev/null
> >