Try lowering the max_check_result_reaper_time value. I had good luck tuning that setting. Thanks<br><br><div class="gmail_quote">On Tue, May 4, 2010 at 8:13 PM, Trisha Hoang <span dir="ltr"><<a href="mailto:trisha@rockyou.com">trisha@rockyou.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Hi,<br>The Nagios <b>master</b> has very high host latency and I'm not sure how to tune it. I ran the check_ping plugin on a handful of hosts and the rta averaged 0.2 second, so it's not the network.<br>
<br><u>Environment:</u><br>
- 565 hosts<br>- 6790 passive checks from the slaves<br>- not using event broker<br>- master server <b>actively</b> executes the host checks every 5 minutes and <b>passively</b> processes checks every 1 minute<br>- not collecting performance data<br>
<br><u>Nagiostats</u><br><br>Nagios Stats 3.2.1<br>Copyright (c) 2003-2008 Ethan Galstad (<a href="http://www.nagios.org" target="_blank">www.nagios.org</a>)<br>Last Modified: 03-09-2010<br>License: GPL<br><br>CURRENT STATUS DATA<br>
------------------------------------------------------<br>
Status File: /var/log/nagios/status.dat<br>Status File Age: 0d 0h 0m 23s<br>Status File Version: 3.2.1<br><br>Program Running Time: 0d 1h 32m 19s<br>
Nagios PID: 28282<br>Used/High/Total Command Buffers: 1316 / 3066 / 4096<br><br>Total Services: 7745<br>Services Checked: 7745<br>Services Scheduled: 1381<br>
Services Actively Checked: 955<br>Services Passively Checked: 6790<br>Total Service State Change: 0.000 / 9.740 / 0.007 %<br>Active Service Latency: 18.948 / 205.144 / 165.751 sec<br>
Active Service Execution Time: 0.007 / 9.051 / 0.055 sec<br>Active Service State Change: 0.000 / 5.460 / 0.006 %<br>Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 0<br>Passive Service Latency: 34.359 / 190.247 / 76.739 sec<br>
Passive Service State Change: 0.000 / 9.740 / 0.008 %<br>Passive Services Last 1/5/15/60 min: 0 / 3054 / 6774 / 6784<br>Services Ok/Warn/Unk/Crit: 7720 / 1 / 0 / 24<br>Services Flapping: 27<br>
Services In Downtime: 0<br><br>Total Hosts: 566<br>Hosts Checked: 566<br>Hosts Scheduled: 566<br>Hosts Actively Checked: 566<br>
Host Passively Checked: 0<br>Total Host State Change: 0.000 / 0.000 / 0.000 %<br>Active Host Latency: 0.000 / 3410.087 / 2413.051 sec<br>Active Host Execution Time: 0.007 / 10.010 / 0.063 sec<br>
Active Host State Change: 0.000 / 0.000 / 0.000 %<br>Active Hosts Last 1/5/15/60 min: 0 / 8 / 10 / 565<br>Passive Host Latency: 0.000 / 0.000 / 0.000 sec<br>Passive Host State Change: 0.000 / 0.000 / 0.000 %<br>
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0<br>Hosts Up/Down/Unreach: 563 / 3 / 0<br>Hosts Flapping: 1<br>Hosts In Downtime: 0<br><br>Active Host Checks Last 1/5/15 min: 5 / 32 / 75<br>
Scheduled: 0 / 0 / 0<br> On-demand: 5 / 32 / 75<br> Parallel: 1 / 11 / 23<br> Serial: 0 / 0 / 0<br> Cached: 4 / 21 / 52<br>
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0<br>Active Service Checks Last 1/5/15 min: 0 / 0 / 0<br> Scheduled: 0 / 0 / 0<br> On-demand: 0 / 0 / 0<br> Cached: 0 / 0 / 0<br>
Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455<br><br>External Commands Last 1/5/15 min: 1302 / 6063 / 20253<br><br><br><u>Nagios.cfg</u><br><br># EXTERNAL COMMAND CHECK INTERVAL<br># This is the interval at which Nagios should check for external commands.<br>
# This value works off the interval_length you specify later. If you leave<br># that at its default value of 60 (seconds), a value of 1 here will cause<br># Nagios to check for external commands every minute. If you specify a<br>
# number followed by an "s" (i.e. 15s), this will be interpreted to mean<br># actual seconds rather than a multiple of the interval_length variable.<br># Note: In addition to reading the external command file at regularly<br>
# scheduled intervals, Nagios will also check for external commands after<br># event handlers are executed.<br># NOTE: Setting this value to -1 causes Nagios to check the external<br># command file as often as possible.<br>
<br>#command_check_interval=15s<br>command_check_interval=-1<br><br># SERVICE INTER-CHECK DELAY METHOD<br># This is the method that Nagios should use when initially<br># "spreading out" service checks when it starts monitoring. The<br>
# default is to use smart delay calculation, which will try to<br># space all service checks out evenly to minimize CPU load.<br># Using the dumb setting will cause all checks to be scheduled<br># at the same time (with no delay between them)! This is not a<br>
# good thing for production, but is useful when testing the<br># parallelization functionality.<br># n = None - don't use any delay between checks<br># d = Use a "dumb" delay of 1 second between checks<br>
# s = Use "smart" inter-check delay calculation<br># x.xx = Use an inter-check delay of x.xx seconds<br><br>service_inter_check_delay_method=s<br><br># MAXIMUM SERVICE CHECK SPREAD<br># This variable determines the timeframe (in minutes) from the<br>
# program start time that an initial check of all services should<br># be completed. Default is 30 minutes.<br><br>max_service_check_spread=30<br><br># SERVICE CHECK INTERLEAVE FACTOR<br># This variable determines how service checks are interleaved.<br>
# Interleaving the service checks allows for a more even<br># distribution of service checks and reduced load on remote<br># hosts. Setting this value to 1 is equivalent to how versions<br># of Nagios previous to 0.0.5 did service checks. Set this<br>
# value to s (smart) for automatic calculation of the interleave<br># factor unless you have a specific reason to change it.<br># s = Use "smart" interleave factor calculation<br># x = Use an interleave factor of x, where x is a<br>
# number greater than or equal to 1.<br><br>service_interleave_factor=s<br><br># HOST INTER-CHECK DELAY METHOD<br># This is the method that Nagios should use when initially<br># "spreading out" host checks when it starts monitoring. The<br>
# default is to use smart delay calculation, which will try to<br># space all host checks out evenly to minimize CPU load.<br># Using the dumb setting will cause all checks to be scheduled<br># at the same time (with no delay between them)!<br>
# n = None - don't use any delay between checks<br># d = Use a "dumb" delay of 1 second between checks<br># s = Use "smart" inter-check delay calculation<br># x.xx = Use an inter-check delay of x.xx seconds<br>
<br>host_inter_check_delay_method=s<br><br><br># MAXIMUM HOST CHECK SPREAD<br># This variable determines the timeframe (in minutes) from the<br># program start time that an initial check of all hosts should<br># be completed. Default is 30 minutes.<br>
<br>max_host_check_spread=30<br><br><br># MAXIMUM CONCURRENT SERVICE CHECKS<br># This option allows you to specify the maximum number of<br># service checks that can be run in parallel at any given time.<br># Specifying a value of 1 for this variable essentially prevents<br>
# any service checks from being parallelized. A value of 0<br># will not restrict the number of concurrent checks that are<br># being executed.<br><br>max_concurrent_checks=0<br><br><br># HOST AND SERVICE CHECK REAPER FREQUENCY<br>
# This is the frequency (in seconds!) that Nagios will process<br># the results of host and service checks.<br><br>check_result_reaper_frequency=10<br><br># MAX CHECK RESULT REAPER TIME<br># This is the max amount of time (in seconds) that a single<br>
# check result reaper event will be allowed to run before<br># returning control back to Nagios so it can perform other<br># duties.<br><br>max_check_result_reaper_time=30<br><br><br># CHECK RESULT PATH<br># This is the directory where Nagios stores the results of host and<br>
# service checks that have not yet been processed.<br>#<br># Note: Make sure that only one instance of Nagios has access<br># to this directory!<br><br>check_result_path=/var/log/nagios/spool/checkresults<br><br><br># MAX CHECK RESULT FILE AGE<br>
# This option determines the maximum age (in seconds) which check<br># result files are considered to be valid. Files older than this<br># threshold will be mercilessly deleted without further processing.<br><br>max_check_result_file_age=3600<br>
<br><br># CACHED HOST CHECK HORIZON<br># This option determines the maximum amount of time (in seconds)<br># that the state of a previous host check is considered current.<br># Cached host states (from host checks that were performed more<br>
# recently than the timeframe specified by this value) can immensely<br># improve performance in regards to the host check logic.<br># Too high of a value for this option may result in inaccurate host<br># states being used by Nagios, while a lower value may result in a<br>
# performance hit for host checks. Use a value of 0 to disable host<br># check caching.<br><br>#cached_host_check_horizon=15<br>cached_host_check_horizon=60<br><br># CACHED SERVICE CHECK HORIZON<br># This option determines the maximum amount of time (in seconds)<br>
# that the state of a previous service check is considered current.<br># Cached service states (from service checks that were performed more<br># recently than the timeframe specified by this value) can immensely<br># improve performance in regards to predictive dependency checks.<br>
# Use a value of 0 to disable service check caching.<br><br>cached_service_check_horizon=15<br><br><br><br># ENABLE PREDICTIVE HOST DEPENDENCY CHECKS<br># This option determines whether or not Nagios will attempt to execute<br>
# checks of hosts when it predicts that future dependency logic test<br># may be needed. These predictive checks can help ensure that your<br># host dependency logic works well.<br># Values:<br># 0 = Disable predictive checks<br>
# 1 = Enable predictive checks (default)<br><br>enable_predictive_host_dependency_checks=1<br><br><br><br># ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS<br># This option determines whether or not Nagios will attempt to execute<br>
# checks of service when it predicts that future dependency logic test<br># may be needed. These predictive checks can help ensure that your<br># service dependency logic works well.<br># Values:<br># 0 = Disable predictive checks<br>
# 1 = Enable predictive checks (default)<br><br>enable_predictive_service_dependency_checks=1<br><br># AUTO-RESCHEDULING OPTION<br># This option determines whether or not Nagios will attempt to<br># automatically reschedule active host and service checks to<br>
# "smooth" them out over time. This can help balance the load on<br># the monitoring server.<br># WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE<br># PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY<br>
<br>auto_reschedule_checks=0<br><br><br><br># AUTO-RESCHEDULING INTERVAL<br># This option determines how often (in seconds) Nagios will<br># attempt to automatically reschedule checks. This option only<br># has an effect if the auto_reschedule_checks option is enabled.<br>
# Default is 30 seconds.<br># WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE<br># PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY<br><br>auto_rescheduling_interval=30<br><br><br><br># AUTO-RESCHEDULING WINDOW<br>
# This option determines the "window" of time (in seconds) that<br># Nagios will look at when automatically rescheduling checks.<br># Only host and service checks that occur in the next X seconds<br># (determined by this variable) will be rescheduled. This option<br>
# only has an effect if the auto_reschedule_checks option is<br># enabled. Default is 180 seconds (3 minutes).<br># WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE<br># PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY<br>
<br>auto_rescheduling_window=180<br><br><br><br># SLEEP TIME<br># This is the number of seconds to sleep between checking for system<br># events and service checks that need to be run.<br><br>sleep_time=0.25<br><br># TIMEOUT VALUES<br>
# These options control how much time Nagios will allow various<br># types of commands to execute before killing them off. Options<br># are available for controlling maximum time allotted for<br># service checks, host checks, event handlers, notifications, the<br>
# ocsp command, and performance data commands. All values are in<br># seconds.<br><br>service_check_timeout=60<br>host_check_timeout=30<br>event_handler_timeout=30<br>notification_timeout=30<br>ocsp_timeout=5<br>perfdata_timeout=5<br>
<br># AGGRESSIVE HOST CHECKING OPTION<br># If you don't want to turn on aggressive host checking features, set<br># this value to 0 (the default). Otherwise set this value to 1 to<br># enable the aggressive check option. Read the docs for more info<br>
# on what aggressive host check is or check out the source code in<br># base/checks.c<br><br>use_aggressive_host_checking=0<br><br><br><br># SERVICE CHECK EXECUTION OPTION<br># This determines whether or not Nagios will actively execute<br>
# service checks when it initially starts. If this option is<br># disabled, checks are not actively made, but Nagios can still<br># receive and process passive check results that come in. Unless<br># you're implementing redundant hosts or have a special need for<br>
# disabling the execution of service checks, leave this enabled!<br># Values: 1 = enable checks, 0 = disable checks<br><br>execute_service_checks=0<br><br><br><br># PASSIVE SERVICE CHECK ACCEPTANCE OPTION<br># This determines whether or not Nagios will accept passive<br>
# service checks results when it initially (re)starts.<br># Values: 1 = accept passive checks, 0 = reject passive checks<br><br>accept_passive_service_checks=1<br><br><br><br># HOST CHECK EXECUTION OPTION<br># This determines whether or not Nagios will actively execute<br>
# host checks when it initially starts. If this option is<br># disabled, checks are not actively made, but Nagios can still<br># receive and process passive check results that come in. Unless<br># you're implementing redundant hosts or have a special need for<br>
# disabling the execution of host checks, leave this enabled!<br># Values: 1 = enable checks, 0 = disable checks<br><br>execute_host_checks=1<br><br># PASSIVE HOST CHECK ACCEPTANCE OPTION<br># This determines whether or not Nagios will accept passive<br>
# host checks results when it initially (re)starts.<br># Values: 1 = accept passive checks, 0 = reject passive checks<br><br>accept_passive_host_checks=0<br><br># OBSESS OVER SERVICE CHECKS OPTION<br># This determines whether or not Nagios will obsess over service<br>
# checks and run the ocsp_command defined below. Unless you're<br># planning on implementing distributed monitoring, do not enable<br># this option. Read the HTML docs for more information on<br># implementing distributed monitoring.<br>
# Values: 1 = obsess over services, 0 = do not obsess (default)<br><br>obsess_over_services=0<br><br><br><br># OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND<br># This is the command that is run for every service check that is<br>
# processed by Nagios. This command is executed only if the<br># obsess_over_services option (above) is set to 1. The command<br># argument is the short name of a command definition that you<br># define in your host configuration file. Read the HTML docs for<br>
# more information on implementing distributed monitoring.<br><br>#ocsp_command=somecommand<br><br><br><br># OBSESS OVER HOST CHECKS OPTION<br># This determines whether or not Nagios will obsess over host<br># checks and run the ochp_command defined below. Unless you're<br>
# planning on implementing distributed monitoring, do not enable<br># this option. Read the HTML docs for more information on<br># implementing distributed monitoring.<br># Values: 1 = obsess over hosts, 0 = do not obsess (default)<br>
<br>obsess_over_hosts=0<br><br><br><br># OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND<br># This is the command that is run for every host check that is<br># processed by Nagios. This command is executed only if the<br># obsess_over_hosts option (above) is set to 1. The command<br>
# argument is the short name of a command definition that you<br># define in your host configuration file. Read the HTML docs for<br># more information on implementing distributed monitoring.<br><br>#ochp_command=somecommand<br>
<br># SERVICE FRESHNESS CHECK OPTION<br># This option determines whether or not Nagios will periodically<br># check the "freshness" of service results. Enabling this option<br># is useful for ensuring passive checks are received in a timely<br>
# manner.<br># Values: 1 = enabled freshness checking, 0 = disable freshness checking<br><br>check_service_freshness=1<br><br><br><br># SERVICE FRESHNESS CHECK INTERVAL<br># This setting determines how often (in seconds) Nagios will<br>
# check the "freshness" of service check results. If you have<br># disabled service freshness checking, this option has no effect.<br><br>#service_freshness_check_interval=60<br>service_freshness_check_interval=420<br>
<br><br><br># HOST FRESHNESS CHECK OPTION<br># This option determines whether or not Nagios will periodically<br># check the "freshness" of host results. Enabling this option<br># is useful for ensuring passive checks are received in a timely<br>
# manner.<br># Values: 1 = enabled freshness checking, 0 = disable freshness checking<br><br>check_host_freshness=0<br>#check_host_freshness=1<br><br><br><br># HOST FRESHNESS CHECK INTERVAL<br># This setting determines how often (in seconds) Nagios will<br>
# check the "freshness" of host check results. If you have<br># disabled host freshness checking, this option has no effect.<br><br>#host_freshness_check_interval=60<br>host_freshness_check_interval=420<br><br>
# ADDITIONAL FRESHNESS THRESHOLD LATENCY<br># This setting determines the number of seconds that Nagios<br># will add to any host and service freshness thresholds that<br># it calculates (those not explicitly specified by the user).<br>
<br>#additional_freshness_latency=15<br>additional_freshness_latency=180<br><br><br># LARGE INSTALLATION TWEAKS OPTION<br># This option determines whether or not Nagios will take some shortcuts<br># which can save on memory and CPU usage in large Nagios installations.<br>
# Read the documentation for more information on the benefits/tradeoffs<br># of enabling this option.<br># Values: 1 - Enabled tweaks<br># 0 - Disable tweaks (default)<br><br>use_large_installation_tweaks=1<br><br>
<br># CHILD PROCESS MEMORY OPTION<br># This option determines whether or not Nagios will free memory in<br># child processes (processes used to execute system commands and host/<br># service checks). If you specify a value here, it will override<br>
# program defaults.<br># Value: 1 - Free memory in child processes<br># 0 - Do not free memory in child processes<br><br>#free_child_process_memory=1<br><br># CHILD PROCESS FORKING BEHAVIOR<br># This option determines how Nagios will fork child processes<br>
# (used to execute system commands and host/service checks). Normally<br># child processes are fork()ed twice, which provides a very high level<br># of isolation from problems. Fork()ing once is probably enough and will<br>
# save a great deal on CPU usage (in large installs), so you might<br># want to consider using this. If you specify a value here, it will override<br># program defaults.<br># Value: 1 - Child processes fork() twice<br># 0 - Child processes fork() just once<br>
<br>#child_processes_fork_twice=1<br>child_processes_fork_twice=0<br><br><br># DEBUG LEVEL<br># This option determines how much (if any) debugging information will<br># be written to the debug file. OR values together to log multiple<br>
# types of information.<br># Values:<br># -1 = Everything<br># 0 = Nothing<br># 1 = Functions<br># 2 = Configuration<br># 4 = Process information<br># 8 = Scheduled events<br>
# 16 = Host/service checks<br># 32 = Notifications<br># 64 = Event broker<br># 128 = External commands<br># 256 = Commands<br># 512 = Scheduled downtime<br># 1024 = Comments<br>
# 2048 = Macros<br><br>debug_level=16<br><br><br># DEBUG VERBOSITY<br># This option determines how verbose the debug log output will be.<br># Values: 0 = Brief output<br># 1 = More detailed<br># 2 = Very detailed<br>
<br>debug_verbosity=1<br><br>Thanks in advance for your help.<br><font color="#888888">Trisha<br>
</font><br></blockquote></div><br><br clear="all"><br>-- <br>Cordially,<br>Shadhin Rahman<br>
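<br>A minimal sketch of the change suggested at the top of this message, as it might look in nagios.cfg. The current value of 30 comes from the quoted config; the new value of 10 is only an illustrative assumption, not a tested recommendation. It may be worth lowering it in steps and comparing the latency figures in nagiostats after each restart.<br>
<pre>
# MAX CHECK RESULT REAPER TIME
# Hypothetical lower value (assumption for illustration);
# the quoted config currently uses 30.
#max_check_result_reaper_time=30
max_check_result_reaper_time=10
</pre>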