Nagios and Gearman - huge environment performance problem
Max Schubert
maxs at webwizarddesign.com
Wed Aug 24 01:06:20 CEST 2011
On Tue, Aug 23, 2011 at 5:48 PM, Mark Goldfinch
<mark.goldfinch at modicagroup.com> wrote:
> On this particular point, the overall system CPU statistics displayed at the top of "top" are an average across all CPUs. As previously mooted, Nagios core isn't multi-threaded, so it can only max a single core. 100% of 1/8 CPUs == 12.5% hence why you're seeing 87.5% idle time, 7 of your cores are not stressed out.
Nagios forks a new process to execute each check, so it will take
advantage of multiple cores as long as the kernel scheduler is working
properly :p - on our biggest pollers we see 300-400 checks running in
parallel at any given time during the polling cycle.
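To illustrate the point about one process per check, here is a minimal
Python sketch (not Nagios code). It starts each check as its own child
process and then collects the exit codes. FAKE_CHECK and timed_run are
hypothetical names, and the "plugin" is just a half-second sleep, so it
shows the checks overlapping in time rather than real CPU load:

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a check plugin: sleeps 0.5 s and exits 0 (OK).
FAKE_CHECK = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def run_checks_in_parallel(commands):
    """Start every command as its own child process, then collect exit codes."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in commands
    ]
    # All children are already running here; the kernel is free to
    # schedule them on whichever cores are available.
    return [proc.wait() for proc in procs]


def timed_run(count):
    """Run `count` fake checks at once; return (exit_codes, elapsed_seconds)."""
    start = time.monotonic()
    codes = run_checks_in_parallel([FAKE_CHECK] * count)
    return codes, time.monotonic() - start
```

If the children run concurrently, timed_run(4) should finish in well
under the 2 seconds that four back-to-back runs would need.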
Some blog posts I wrote about Nagios performance that might help (some
of the topics have been covered):
http://www.semintelligent.com/blog/?q=Performance
We found that changing the host and service inter-check delay method
to 'n' (no delay) made a big difference. Changing sleep time to 0.02
and compiling Nagios with nanosleep enabled helps a lot as well. We
also added a few patches to remove hard-coded sleep statements in the
code that were causing Nagios to sleep more than we wanted.
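As a rough aid for checking these settings, the Python sketch below
parses key=value lines from a main config file and reports which values
differ from the ones described above. I am assuming the settings map to
the nagios.cfg directives service_inter_check_delay_method,
host_inter_check_delay_method and sleep_time; verify the names against
your Nagios version. The SAMPLE text and the function names are made up
for illustration:

```python
# Values described in the text above; the directive names are an
# assumption about how they map onto nagios.cfg.
RECOMMENDED = {
    "service_inter_check_delay_method": "n",
    "host_inter_check_delay_method": "n",
    "sleep_time": "0.02",
}

# Hypothetical config fragment, not taken from a real installation.
SAMPLE = """
# scheduling settings
service_inter_check_delay_method=s
host_inter_check_delay_method=n
sleep_time=0.25
"""


def parse_main_config(text):
    """Return a dict of key=value settings, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def tuning_gaps(settings):
    """Map each differing directive to a (current, recommended) pair."""
    return {
        key: (settings.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if settings.get(key) != wanted
    }
```

Calling tuning_gaps(parse_main_config(SAMPLE)) flags the service delay
method and the sleep time, and leaves the host delay method alone.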
Right now on an HP DL385 (a quad core host with 8 GB of RAM) we max
out at about 10k checks (a combination of host and service checks) per
5 minutes, with a sustained service check latency of 2-3 seconds.
Our latency requirements are very specific to our environment: we
keep all pollers at less than 10 seconds of service latency at all
times.
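For a sense of scale, the figures above work out to roughly 33 checks
started per second. A small Python sketch of the arithmetic follows.
The per-core figure is only a back-of-the-envelope average that assumes
load spreads evenly over the four cores, and the names are illustrative:

```python
def checks_per_second(checks, window_seconds):
    """Average check rate over a polling window."""
    return checks / window_seconds


# About 10k checks per 5 minutes on a quad core host (figures from above).
rate = checks_per_second(10_000, 5 * 60)

# Assumes an even spread over 4 cores, which real workloads rarely achieve.
per_core = rate / 4

print(f"{rate:.1f} checks/sec overall, about {per_core:.1f} per core")
```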
- Max