bizarre Nagios 2.12 memory leak
Giorgio Zarrelli
zarrelli at linux.it
Thu Apr 15 17:43:22 CEST 2010
Did you check for growth in zombie processes and for iowait on the CPU?
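
A minimal sketch of how both numbers could be sampled on a Linux host, assuming the standard /proc layout. The function names are illustrative only and nothing here is part of Nagios itself.

```python
import os
import time


def parse_state(stat_line):
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before reading the state field.
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_zombies(proc_root="/proc"):
    """Count processes currently in state 'Z' (zombie)."""
    zombies = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat_line = f.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        if parse_state(stat_line) == "Z":
            zombies += 1
    return zombies


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two 'cpu' lines of /proc/stat."""
    b = [int(x) for x in before.split()[1:]]
    a = [int(x) for x in after.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal (guest fields
    # are already counted in user time, so only the first eight are summed).
    deltas = [y - x for x, y in zip(b, a)][:8]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line, which is the first line of /proc/stat."""
    with open(path) as f:
        return f.readline()


def sample(interval=5):
    """Return (zombie count, iowait percent) measured over `interval` seconds."""
    first = read_cpu_line()
    time.sleep(interval)
    second = read_cpu_line()
    return count_zombies(), iowait_percent(first, second)
```

Calling sample() every few minutes and logging the result with a timestamp should show whether either value climbs before memory use takes off.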
Ciao,
Giorgio
On 15 Apr 2010, at 17:24, Jeremy <s6a9d6u9s at gmail.com> wrote:
> We have a large distributed setup running Nagios 2.12 with 20
> distributed servers sharing about 20000 checks against 2500 hosts.
> They are reporting into multiple master Nagios servers using a
> modified OCP_daemon that handles multiple master servers. Recently
> we nearly doubled our number of distributed servers. Our number of
> checks had grown so much that we were only doing about 20-30% per
> minute on some of our busiest distributed servers. Now we are doing
> 90% per minute.
>
> Ever since we increased the frequency of all the checks, our oldest
> Master server has started crashing randomly every so often. Nothing
> else has changed. Memory use goes through the roof until eventually
> there is 0 swap left and the server finally crashes and has to be
> rebooted. If we restart the Nagios service while the memory usage is
> going crazy, it drops back down to normal for quite a while, but
> days later it will happen again. I started restarting Nagios on that
> server once an hour but it hasn't helped. We tried upgrading to 16
> GB of RAM which has made this happen a bit less often, but it
> continues to happen sometimes.
>
> We are using NPCD to graph the performance data from all of our
> checks, but all the graph .RRD files are on a dedicated partition,
> and the crashing happens even when we disable graphing completely
> and disk I/O is near 0% on both the system partitions and the graph
> partition.
>
> So I was wondering how I could go about figuring out why Nagios is
> freaking out on our older server (Dell PowerEdge 1950). Our other
> Master server (a Dell PowerEdge R710) gets all the same checks
> reported to it, and handles it just fine, but it is using much newer
> Xeon CPUs, faster memory, etc. The old crashing server handles
> things just fine for days at a time until it randomly runs itself
> out of swap space and crashes.
>
> I know I really should get around to upgrading to Nagios 3.x but no
> time for that yet and it's going to be a pain to upgrade them all at
> once without being blind for a little bit, so pretend Nagios 3.x
> isn't an option just yet.
>
> Thanks for any insight!
> Jeremy
> ------------------------------------------------------------------------------
> Download Intel® Parallel Studio Eval
> Try the new software tools for yourself. Speed compiling, find bugs
> proactively, and fine-tune applications for parallel performance.
> See why Intel Parallel Studio got high marks during beta.
> http://p.sf.net/sfu/intel-sw-dev
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null