hundreds of procs
DTerrell at Delphi-Tech.com
Wed Jun 18 17:49:56 CEST 2003
Unfortunately there are two threads running on this problem, this being
the more active one.
I'm using a distributed setup, two remote nagios boxes (actually one on the
same switch) with one central. The box in question is the central box that
accepts 100% passive checks for 81 hosts and 91 services. Each distributed
server is running ~half that. Nagios has yet to loop since I
rebooted, though I do expect it to start soon.
-Dave
-----Original Message-----
From: Williams, P. Lane [mailto:Lane.Williams at jhuapl.edu]
Sent: Wednesday, June 18, 2003 11:47 AM
To: 'DTerrell at Delphi-Tech.com'; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs
I am sorry if I missed it, but how many of each are you performing?
Active service checks?
Passive service checks?
Host checks?
Lane
-----Original Message-----
From: DTerrell at Delphi-Tech.com [mailto:DTerrell at Delphi-Tech.com]
Sent: Wednesday, June 18, 2003 11:39 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs
That top output isn't a depiction of how the box looks when nagios is
normally running. Nagios continues to grow larger and larger until the
machine has so many procs the load average goes up. This is a more typical
top output, and was taken after rebooting the machine and having nagios run
for ~1hr:
11:36am up 1:22, 1 user, load average: 0.12, 0.16, 0.15
50 processes: 47 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 0.3% user, 0.1% system, 0.0% nice, 99.4% idle
Mem: 124840K av, 109744K used, 15096K free, 0K shrd, 20072K buff
Swap: 262040K av, 0K used, 262040K free 25592K cached
At some point today I expect the box to trip and fall hard onto another
hundred-odd nagios procs, pushing the load higher and higher as it goes.
It seems to be a circular problem: when nagios gets slightly overloaded it
doesn't recover and the load gets higher; the next time it's congested (this
time at a smaller threshold) it does it again, until it really chokes out the
system. I'm concerned nagios isn't capable of handling such a
load... perhaps this should be pointed out to the developers?
-Dave
-----Original Message-----
From: Williams, P. Lane [mailto:Lane.Williams at jhuapl.edu]
Sent: Wednesday, June 18, 2003 10:51 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs
I see the same thing. But I think this is the way it should be. Nagios is
a parallel application. I typically run with an average of 50 - 70 nagios
procs a second and sometimes peaking at 300. System load typically runs
between 3 and 4, which for a typical server would be high. I have Sun
Enterprise application servers that run with a load average above 10 all day
and Sun Enterprise backup servers that run with a load of 6 or higher.
I suggest running "top" and watching what's going on. If your "iowait" is
0%, memory looks good, and the number of sleeping processes fluctuates with
the ongoing processes, I'd say you're running just fine. The fact that you
have a high load average may just mean you need a newer/faster server. I run
with dual Xeons and a gig of RAM, and sometimes peak my load average at 19.
The only problem I've noticed is with the default setting of "Sendmail"
rejecting requests when load is above 12. I just reset those settings to 70
and all looks good.
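For anyone who would rather log these numbers than watch "top" by hand, here
is a rough Python sketch that counts processes named "nagios" and reads the
load average. It is only an illustration, not part of Nagios: the function
names (count_procs, parse_load, snapshot) are made up for this example, and it
assumes the daemon's command name is exactly "nagios" and that your "ps"
accepts "-eo comm=".

```python
import os
import re
import subprocess


def count_procs(ps_output, name="nagios"):
    """Count lines of `ps -eo comm=` output whose command name equals `name`.

    Assumption: the daemon shows up with the command name "nagios".
    """
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def parse_load(header_line):
    """Pull the 1-, 5-, and 15-minute load averages out of an uptime/top header."""
    match = re.search(r"load average: ([\d.]+), ([\d.]+), ([\d.]+)", header_line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())


def snapshot(name="nagios"):
    """Take one live sample: (number of matching processes, 1-minute load)."""
    ps_output = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return count_procs(ps_output, name), os.getloadavg()[0]
```

Calling snapshot() once a minute (from cron, say) and appending the result to
a file should show whether the process count climbs steadily or rises and
falls with the check schedule.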
If you're having problems with Nagios not completing checks in a timely
fashion, I recommend revisiting your configuration. If you have a high
number of passive checks you may need to account for that as well.
Lane
-----Original Message-----
From: DTerrell at Delphi-Tech.com [mailto:DTerrell at Delphi-Tech.com]
Sent: Wednesday, June 18, 2003 10:10 AM
To: thomas.blidung at philips.com; nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] hundreds of procs
By the way, this is from one day of Nagios being active without me
restarting it:
10:08am up 29 days, 20:57, 1 user, load average: 11.06, 14.66, 15.69
1001 processes: 1000 sleeping, 1 running, 0 zombie, 0 stopped
-----Original Message-----
From: thomas.blidung at philips.com [mailto:thomas.blidung at philips.com]
Sent: Wednesday, June 18, 2003 2:53 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] hundreds of procs
Hi to everyone,
I was just reading the posting from Dave and his problem "nagios looping -
hundreds of procs".
It seems that I have the same problem. So far there appears to be a
relationship between the frequency of checks and the occurrence of many
nagios tasks. But even if I set normal_check_interval to a long period
(10 minutes), it still happens that after one or two days there are up to
200 or more nagios tasks.
If this problem has already been solved, I would like to get the solution.
regards
tom
Thomas Blidung
Philips Research Hamburg
Tel. 5078-2838