Increase the efficiency of passive checks

Andrew_Hoying at blm.gov Andrew_Hoying at blm.gov
Fri Apr 9 21:32:46 CEST 2004





Interestingly enough, I started with the setting of -1, but for no reason I
could figure out it would occasionally switch from checking as fast as it
could to checking once a minute, which really caused problems. I'll try
increasing status_update_interval to 30 seconds and see if it makes a
difference. Other than that, I think I have everything tuned for maximum
performance. I am using MySQL for everything except extended information,
which I assume would increase Nagios's speed, but that could be the code
that is causing the bottleneck.

Thanks for the advice,
Andrew



From:    "Marc Powell" <marc at ena.com>
Date:    04/09/2004 01:28 PM
To:      <Andrew_Hoying at blm.gov>, <nagios-users at lists.sourceforge.net>
Subject: RE: [Nagios-users] Increase the efficiency of passive checks




Andrew_Hoying at blm.gov wrote:
> Hello,
>
> I'm currently running a Nagios server which is processing a little
> over 2000 passive checks every 10 minutes. I know the server has the
> memory and processing power to handle that many checks in half that
> time, however the bottleneck seems to be the size of the named pipe
> file and the speed at which it is read. I have Nagios set to check
> the file every second, which works fine, and it processes around 5
> passive checks a second, however the server Nagios is running on is
> still only using around 4% of its processing power and its disk
> access is nominal. What can I do to increase the size of the named
> pipe, or move to shared memory, a Unix socket, or some other method
> of accepting passive checks that would speed it up without
> significantly rewriting Nagios? Does anyone have a patch for 1.2 that
> would solve this problem? Is 2.0 significantly better in this regard?
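
For readers unfamiliar with what travels through the named pipe discussed above: each passive result is one line in the Nagios external command format, PROCESS_SERVICE_CHECK_RESULT. The following is a minimal sketch, not code from either poster. The host name, service name, plugin output, and the command file path are illustrative assumptions; the real path is whatever command_file is set to in nagios.cfg.

```python
import time

# Assumption: a common default location for the command file; the actual path
# is whatever command_file is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_check(host, service, return_code, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external-command line."""
    if timestamp is None:
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(lines, command_file=COMMAND_FILE):
    """Write a batch of command lines to the named pipe in one write call.

    Not called in this sketch: opening the pipe blocks until Nagios is
    reading the other end.
    """
    with open(command_file, "w") as pipe:
        pipe.write("".join(lines))


# Hypothetical host, service, and plugin output, with a fixed timestamp.
line = format_passive_check(
    "web01", "HTTP", 0, "OK - hypothetical result", timestamp=1081539166
)
print(line, end="")
```

Batching several lines into one write reduces the number of open/close cycles on the pipe, but it does not change how quickly Nagios drains it; that is governed by command_check_interval on the Nagios side.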


FWIW, I'm receiving ~2600 passive checks every 5 minutes and Nagios
(1.1) is able to keep up just fine. The only modification I've made is
to set command_check_interval=-1 which tells Nagios to check the command
file as often as possible, not just every second. The command file is 0
length about 30-40% of the time and very rarely maxed out so I believe
that I can scale to at least double that number of checks and probably
more with no changes.
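
To put the numbers in this thread side by side, here is a rough calculation of the average rates involved. It uses only the figures quoted above (about 2000 checks per 10 minutes, about 2600 per 5 minutes, and roughly 5 checks processed per second with one-second polling). These are averages; bursts of submissions can still fill the pipe even when the average rate looks comfortable.

```python
def checks_per_second(checks, window_minutes):
    """Average arrival rate of passive checks over a reporting window."""
    return checks / (window_minutes * 60)


andrew_rate = checks_per_second(2000, 10)  # ~2000 checks every 10 minutes
marc_rate = checks_per_second(2600, 5)     # ~2600 checks every 5 minutes
observed_drain = 5.0                       # ~5 checks/second reported with 1-second polling

print(f"Andrew, average arrival rate: {andrew_rate:.2f} checks/s")
print(f"Marc, average arrival rate:   {marc_rate:.2f} checks/s")
print(f"Marc's load vs. a 5/s drain:  {marc_rate / observed_drain:.2f}x")
```

On these figures, the load handled with command_check_interval=-1 is well above what a drain rate of about 5 checks per second could sustain, which is consistent with the polling interval being the limiting factor rather than CPU or disk.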

Also possibly related are the status update intervals and retention
values (as they will take time away from nagios processing external
commands). Mine are set as follows --

aggregate_status_updates=1
status_update_interval=15 (this affects GUI 'freshness')
retain_state_information=1
retention_update_interval=5

The _interval times above could probably be increased to squeeze out
more performance and should be set to whatever your expectations for
freshness are. I haven't done any benchmarking to determine just how
much of an effect different values have, but my feeling from using
Nagios/Netsaint for a couple of years is that they do have a measurable
effect, especially if they are very low and you have a high number of
devices/services you are monitoring.

--
Marc




