Nagios and Gearman - huge environment performance problem

Daniel Wittenberg daniel.wittenberg.r0ko at statefarm.com
Fri Aug 19 18:31:39 CEST 2011


Well, look at your bi and bo columns, and then the wa column. It looks like you have some I/O wait (wa reaches 17-19% in many samples), which probably means the system is waiting on disk activity to get things done, and there is a lot of writing to disk (bo is often several thousand blocks per second, peaking near 14000). Have you looked at putting your checkresults directory, status.dat, and temp_file on a ramdisk? That should eliminate most of the heavy disk I/O from the Nagios perspective. Since it doesn't look like you are swapping (si and so stay at 0), you should be able to dedicate some memory to a ramdisk. You can probably start with 64MB and watch it; you might have to go higher depending on your workload.
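After relocating those files, it is worth confirming that each path really lives on a RAM-backed filesystem. The sketch below is a minimal, hedged example, not part of Nagios. It assumes Linux, where `/proc/mounts` lists one mount per line as `device mount_point fstype options ...`. It also assumes a hypothetical ramdisk at `/var/nagios/ramdisk`, mounted for example with `mount -t tmpfs -o size=64m tmpfs /var/nagios/ramdisk`, with the `check_result_path`, `status_file` and `temp_file` directives in nagios.cfg pointed into it. The helper name `fstype_for` is made up for illustration.

```python
from typing import Optional


def fstype_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is assumed to be in Linux /proc/mounts format:
    "device mount_point fstype options dump pass" per line. Octal escapes
    such as "\\040" (a space in a mount point) are not decoded here.
    Returns None if no mount point contains `path` (e.g. a relative path).
    """
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # "/" stays "/", "/var" becomes "/var/", so "/var2" won't match "/var".
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        # Longest matching mount point wins; ">=" lets a later mount stacked
        # on the same directory override an earlier one.
        if contains and len(mount_point) >= len(best_point):
            best_point, best_type = mount_point, fs_type
    return best_type
```

On a live system you would pass the contents of `/proc/mounts` as `mounts_text` and check each relocated path. Anything other than `tmpfs` means Nagios is still writing that file to disk. Because tmpfs is emptied on reboot, it suits transient files like these rather than state you need to keep across restarts.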

Dan

From: Rodney Ramos [mailto:rodneyra at gmail.com]
Sent: Friday, August 19, 2011 11:27 AM
To: Nagios Developers List
Subject: Re: [Nagios-devel] Nagios and Gearman - huge environment performance problem

Hi, Daniel,

As we can see below, I think it is not a hardware problem. The idle CPU is between 60 and 80%, which is very good.

Thank you very much.


$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  2  22092 3046788 189640 890940    0    0   295  1053    0    0  4  3 83 10  0
 1  2  22092 3032992 189664 904600    0    0  2733  7550 3498 7477 12  1 69 18  0
 1  2  22092 3018240 189668 918632    0    0  2720  4070 2484 5114 13  1 72 15  0
 1  0  22092 3008312 189668 930336    0    0  2332  1534 1932 3825 13  1 73 14  0
 1 18  22092 2979292 189724 945780    0    0  1486 13974 2460 8446 16  2 72 10  0
 1  2  22092 2965244 189736 959228    0    0  2570  9094 3290 7204 13  1 67 19  0
 1  2  22092 2949064 189748 973100    0    0  2820  3040 2798 6639 13  2 68 17  0
 1  6  22092 2936060 189768 987788    0    0  2894  3620 2474 5443 13  1 70 16  0
 1  1  22092 2923320 189780 999708    0    0  2377  2618 2285 4794 13  1 70 16  0
 1  0  22092 2923428 189780 999964    0    0     0  4575 1732 2317 12  1 86  1  0
 1  9  22092 2912192 189784 1005260    0    0   402  4544 1541 3889 14  1 82  3  0
 1  7  22092 2891692 189808 1023020    0    0  2534 13969 3232 9421 14  2 66 17  0
 3  2  22092 2868908 189836 1037064    0    0  2797  4115 3002 7055 30  2 54 14  0
 2  2  22092 2860712 189860 1050376    0    0  2646  3352 2448 5416 16  1 67 17  0
 1  8  22092 2847052 189872 1064036    0    0  2748  3970 2616 5487 13  1 69 17  0
 1  0  22092 3469576 189876 462624    0    0   825  1245 1379 2098 12  1 83  5  0
 1  0  22092 3469248 189884 462720    0    0     4  2631 1552 2599 13  0 86  0  0
 1 20  22092 3449816 189904 482192    0    0  2404  8454 2293 7764 15  2 70 12  0
 1 17  22092 3434856 189912 495636    0    0  2694  8955 3542 8039 13  2 65 19  0
 2  7  22092 3422204 189932 509376    0    0  2742  4059 2685 5826 13  1 68 19  0
 1 13  22092 3407532 189948 522508    0    0  2661  3613 6447 49867 12  4 66 17  0
 0  0  22092 3404484 189968 525964    0    0   669  3338 5317 43602 10  4 81  6  0
 1  0  22092 3402004 189984 525956    0    0     0    14 3637 12700 13  1 85  0  0
 1  0  22092 3398172 190012 526036    0    0     0  3318 3972 12401 14  1 85  0  0
 2  0  22092 3392628 190028 526048    0    0     0  9331 5347 16423 15  3 81  1  0
 4  0  22092 3391704 190048 526060    0    0     0  4270 5785 18736 16  2 80  1  0
 1  1  22092 3391652 190064 526056    0    0     0  4091 4746 14669 16  2 82  1  0
 1  0  22092 3392104 190068 526056    0    0     0  1562 4037 11849 16  1 83  0  0
 3  0  22092 3392304 190084 526168    0    0     1  2532 4618 16418 15  2 83  0  0
 1  7  22092 3386028 190112 531488    0    0   967   363 4194 14941 15  2 77  6  0
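The idle CPU figure alone does not show whether processes are blocked on disk. That is what the wa column captures, along with the b column (processes in uninterruptible sleep). To judge a longer vmstat run in aggregate, a small parser can average the columns Dan points to. This is a minimal sketch, assuming the default Linux procps `vmstat` layout shown above (17 columns, ending in `us sy id wa st`). `summarize_vmstat` is a hypothetical helper, not part of Nagios or Mod_Gearman. It skips the first data row by default because vmstat's first report is an average since boot rather than a sample of the interval.

```python
from statistics import mean

# Columns summarized: blocks read/written per second, % idle, % waiting on I/O.
SUMMARY_COLUMNS = ("bi", "bo", "id", "wa")


def summarize_vmstat(text: str, skip_first: bool = True) -> dict:
    """Summarize Linux procps `vmstat` output (assumed default 17-column layout).

    Returns {"bi": {"mean": ..., "max": ...}, ..., "swapped": total si + so}.
    """
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line: "r b swpd free buff cache si so bi bo ..."
            header = fields
            continue
        if header is None or not all(f.isdigit() for f in fields):
            # Skip the "procs ---memory---" banner and anything non-numeric.
            continue
        rows.append(dict(zip(header, map(int, fields))))

    # vmstat's first report averages everything since boot, not the interval.
    samples = rows[1:] if skip_first else rows
    if not samples:
        raise ValueError("no vmstat samples found")

    summary = {}
    for col in SUMMARY_COLUMNS:
        values = [row[col] for row in samples]
        summary[col] = {"mean": round(mean(values), 1), "max": max(values)}
    # Any non-zero si/so means the box is swapping, so RAM is not free to spare.
    summary["swapped"] = sum(row["si"] + row["so"] for row in samples)
    return summary
```

For example, save about two minutes of samples with `vmstat 5 24 > vmstat.txt` and pass the file's contents to `summarize_vmstat`. Sustained wa values in the teens alongside heavy bo, as in much of the output above, point to the disks rather than the CPU.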
On Fri, Aug 19, 2011 at 11:32 AM, Daniel Wittenberg <daniel.wittenberg.r0ko at statefarm.com> wrote:
>
> One simple thing that might help is to just run vmstat for a couple of minutes:
>
> vmstat 5
>
> That can help show whether you are hitting any bottlenecks. Are you using a lot of macros in your configs?
>
> Dan
>
> From: Rodney Ramos [mailto:rodneyra at gmail.com]
> Sent: Friday, August 19, 2011 9:30 AM
> To: Nagios Developers List
> Subject: [Nagios-devel] Nagios and Gearman - huge environment performance problem
>
> Hi everybody,
>
> I'm testing Nagios with Gearman / Mod_Gearman. I'd like to replace NSCA with this new approach, as it seems easier to configure and has a lot of advantages. Besides, NSCA and the Nagios freshness mechanism have some problems.
>
> Gearman and mod_gearman are working well. I have 30000 hosts and 60000 services, and the number is growing!
>
> Now I'm having a problem with Nagios performance: it eats 100% of the CPU, and the host and service latency is very high, around 300 seconds. I think this is a Nagios problem, as gearman_top shows the Jobs Waiting queue empty almost all the time. It seems that Nagios does not send active checks continuously; instead, once in a while it sends a burst of active checks.
>
> I have a physical central server running RHEL, with 4 GB of RAM and an Intel(R) Xeon(R) CPU E5504 @ 2.00GHz (8 CPUs). For the workers I have 9 virtual servers, also running RHEL.
>
> I've already applied the Nagios parameters recommended for large environments in the documentation, but it made no difference. Thanks.
>
> Nagios parameters for large environments:
>
> - use_large_installation_tweaks=1
> - enable_environment_macros=0
> - max_concurrent_checks=0
> - check_result_reaper_frequency=10
>
> Could someone help me? How can I improve Nagios performance to make active checks faster?
>
> Thank you very much.
>
>
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>