FW: hundreds of procs

DTerrell at Delphi-Tech.com DTerrell at Delphi-Tech.com
Wed Jun 18 17:51:36 CEST 2003


Just as I sent this one out...I noticed nagios forked again with these
stats:

 11:50am  up  1:37,  1 user,  load average: 0.22, 0.15, 0.13
52 processes: 50 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  0.1% user,  0.3% system,  0.0% nice, 99.4% idle
Mem:   124840K av,  110132K used,   14708K free,       0K shrd,   24412K buff
Swap:  262040K av,       0K used,  262040K free                   20844K cached

Any more thoughts?

-----Original Message-----
From: David A. Terrell 
Sent: Wednesday, June 18, 2003 11:50 AM
To: 'Williams, P. Lane'; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs


Unfortunately there are two threads running on this problem, this being
the more active one.

I'm using a distributed setup, two remote nagios boxes (actually one on the
same switch) with one central.  The box in question is the central box that
accepts 100% passive checks for 81 hosts and 91 services.  Each distributed
server is running ~half that.  Nagios has yet to loop since I
rebooted, though I do expect it to start soon.

-Dave

-----Original Message-----
From: Williams, P. Lane [mailto:Lane.Williams at jhuapl.edu]
Sent: Wednesday, June 18, 2003 11:47 AM
To: 'DTerrell at Delphi-Tech.com'; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs


I am sorry if I missed it, but how many of the following are you performing?

Active service checks?
Passive service checks?
Host checks?

Lane

-----Original Message-----
From: DTerrell at Delphi-Tech.com [mailto:DTerrell at Delphi-Tech.com]
Sent: Wednesday, June 18, 2003 11:39 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs


That top output isn't a depiction of how the box looks when nagios is
normally running.  Nagios continues to grow larger and larger until the
machine has so many procs the load average goes up.  This is a more typical
top output, and was taken after rebooting the machine and having nagios run
for ~1hr:

 11:36am  up  1:22,  1 user,  load average: 0.12, 0.16, 0.15
50 processes: 47 sleeping, 3 running, 0 zombie, 0 stopped
CPU states:  0.3% user,  0.1% system,  0.0% nice, 99.4% idle
Mem:   124840K av,  109744K used,   15096K free,       0K shrd,   20072K buff
Swap:  262040K av,       0K used,  262040K free                   25592K cached

At some point today I expect the box to trip and fall hard onto another
hundred-some-odd nagios procs, pushing the load higher and higher as it goes.
It seems to be a circular problem: when nagios gets slightly overloaded it
doesn't recover and the load gets higher; the next time it's congested (this
time at a smaller threshold) it does it again, until it really chokes out the
system.  I'm concerned nagios isn't capable of handling such a
load...perhaps this should be pointed out to the developers?
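
One way to confirm the pattern described above is to log how many nagios processes exist over time. The following is only a rough sketch: it assumes a ps that accepts "-eo comm" and that the processes show up under the command name "nagios". The helper names and the sample text are made up for illustration.

```python
import subprocess


def count_named(ps_output, name="nagios"):
    """Count lines of `ps -eo comm` style output whose command equals name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def current_count(name="nagios"):
    """Run ps and count matching processes (assumes `ps -eo comm` works here)."""
    result = subprocess.run(
        ["ps", "-eo", "comm"], capture_output=True, text=True, check=True
    )
    return count_named(result.stdout, name)


# Hypothetical sample output, not taken from the affected machine.
SAMPLE = "COMMAND\ninit\nnagios\nnagios\nsshd\nnagios\n"
print(count_named(SAMPLE))
```

Calling current_count() from cron every few minutes and appending the result to a file would show whether the count climbs steadily or jumps all at once.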

-Dave

-----Original Message-----
From: Williams, P. Lane [mailto:Lane.Williams at jhuapl.edu]
Sent: Wednesday, June 18, 2003 10:51 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs


I see the same thing.  But I think this is the way it should be.  Nagios is
a parallel application.  I typically run with an average of 50 - 70 nagios
procs a second and sometimes peaking at 300.  System load typically runs
between 3 and 4, which for a typical server would be high.  I have Sun
Enterprise application servers that run with a load average above 10 all day
and Sun Enterprise backup servers that run with a load of 6 or higher. 

I suggest running "top" and watching what's going on.  If your "iowait" is
0%, memory looks good, and the number of sleeping processes fluctuates with
the on-going processes, I'd say you're running just fine.  The fact that you
have a high load average may just mean you need a newer/faster server.  I run
with dual Xeons and a gig of RAM, and my load average sometimes peaks at 19.
The only problem I've noticed is the default "Sendmail" setting rejecting
requests when the load is above 12.  I just reset those settings to 70 and all
looks good.
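
For anyone who would rather record these numbers than watch them, here is a minimal sketch that pulls the load averages and process-state counts out of a top header. It assumes the header layout shown in the outputs pasted in this thread; other versions of top format these lines differently, and the function name is made up for illustration.

```python
import re


def parse_top_header(text):
    """Extract load averages and process-state counts from a top header."""
    stats = {}
    load = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", text)
    if load:
        # 1, 5 and 15 minute load averages, in that order
        stats["load"] = tuple(float(v) for v in load.groups())
    procs = re.search(r"(\d+) processes:", text)
    if procs:
        stats["processes"] = int(procs.group(1))
    for count, state in re.findall(r"(\d+) (sleeping|running|zombie|stopped)", text):
        stats[state] = int(count)
    return stats


# Header lines as posted in this thread for the overloaded box.
HEADER = """ 10:08am  up 29 days, 20:57,  1 user,  load average: 11.06, 14.66, 15.69
1001 processes: 1000 sleeping, 1 running, 0 zombie, 0 stopped"""

stats = parse_top_header(HEADER)
print(stats)
```

Feeding it the output of a batch-mode top run at regular intervals would give a simple history of load versus process count.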

If you're having problems with Nagios not completing checks in a timely
fashion, I recommend revisiting your configuration.  If you have a high
number of passive checks you may need to account for that as well.

Lane 

-----Original Message-----
From: DTerrell at Delphi-Tech.com [mailto:DTerrell at Delphi-Tech.com]
Sent: Wednesday, June 18, 2003 10:10 AM
To: thomas.blidung at philips.com; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs


By the way, this is from one day of Nagios being active without me
restarting it:

 10:08am  up 29 days, 20:57,  1 user,  load average: 11.06, 14.66, 15.69
1001 processes: 1000 sleeping, 1 running, 0 zombie, 0 stopped

-----Original Message-----
From: thomas.blidung at philips.com [mailto:thomas.blidung at philips.com]
Sent: Wednesday, June 18, 2003 2:53 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] hundreds of procs


Hi to everyone,

I was just reading the posting from Dave about his problem, "nagios looping -
hundreds of procs".

It seems that I have the same problem. So far there appears to be a
relationship between the frequency of checks and the occurrence of many
nagios tasks. But even if I set normal_check_interval to a long period
(10 minutes), it still happens that after one or two days there are up to
200 or more nagios tasks.
If this problem has already been solved, I would like to get the solution.

regards
   tom


Thomas Blidung
Philips Research Hamburg
Tel. 5078-2838

