high host latency on nagios master
Trisha Hoang
trisha at rockyou.com
Wed May 5 02:13:45 CEST 2010
Hi,
The nagios *master *got really high host latency and I'm not sure how to
tweak it. I ran the check_ping plugin on a handful of hosts and the rta
averaged at 0.2 second so it's not the network.
*Environment:*
- 565 hosts
- 6790 passive checks from the slaves
- not using event broker
- master server *actively* executes the hosts checks every 5 minutes
and *passively
*processes checks every 1 minute
- not doing performance data
*Nagiostats*
Nagios Stats 3.2.1
Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
Last Modified: 03-09-2010
License: GPL
CURRENT STATUS DATA
------------------------------------------------------
Status File: /var/log/nagios/status.dat
Status File Age: 0d 0h 0m 23s
Status File Version: 3.2.1
Program Running Time: 0d 1h 32m 19s
Nagios PID: 28282
Used/High/Total Command Buffers: 1316 / 3066 / 4096
Total Services: 7745
Services Checked: 7745
Services Scheduled: 1381
Services Actively Checked: 955
Services Passively Checked: 6790
Total Service State Change: 0.000 / 9.740 / 0.007 %
Active Service Latency: 18.948 / 205.144 / 165.751 sec
Active Service Execution Time: 0.007 / 9.051 / 0.055 sec
Active Service State Change: 0.000 / 5.460 / 0.006 %
Active Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
Passive Service Latency: 34.359 / 190.247 / 76.739 sec
Passive Service State Change: 0.000 / 9.740 / 0.008 %
Passive Services Last 1/5/15/60 min: 0 / 3054 / 6774 / 6784
Services Ok/Warn/Unk/Crit: 7720 / 1 / 0 / 24
Services Flapping: 27
Services In Downtime: 0
Total Hosts: 566
Hosts Checked: 566
Hosts Scheduled: 566
Hosts Actively Checked: 566
Host Passively Checked: 0
Total Host State Change: 0.000 / 0.000 / 0.000 %
Active Host Latency: 0.000 / 3410.087 / 2413.051 sec
Active Host Execution Time: 0.007 / 10.010 / 0.063 sec
Active Host State Change: 0.000 / 0.000 / 0.000 %
Active Hosts Last 1/5/15/60 min: 0 / 8 / 10 / 565
Passive Host Latency: 0.000 / 0.000 / 0.000 sec
Passive Host State Change: 0.000 / 0.000 / 0.000 %
Passive Hosts Last 1/5/15/60 min: 0 / 0 / 0 / 0
Hosts Up/Down/Unreach: 563 / 3 / 0
Hosts Flapping: 1
Hosts In Downtime: 0
Active Host Checks Last 1/5/15 min: 5 / 32 / 75
Scheduled: 0 / 0 / 0
On-demand: 5 / 32 / 75
Parallel: 1 / 11 / 23
Serial: 0 / 0 / 0
Cached: 4 / 21 / 52
Passive Host Checks Last 1/5/15 min: 0 / 0 / 0
Active Service Checks Last 1/5/15 min: 0 / 0 / 0
Scheduled: 0 / 0 / 0
On-demand: 0 / 0 / 0
Cached: 0 / 0 / 0
Passive Service Checks Last 1/5/15 min: 2 / 1455 / 1455
External Commands Last 1/5/15 min: 1302 / 6063 / 20253
*Nagios.cfg*
# EXTERNAL COMMAND CHECK INTERVAL
# This is the interval at which Nagios should check for external commands.
# This value works of the interval_length you specify later. If you leave
# that at its default value of 60 (seconds), a value of 1 here will cause
# Nagios to check for external commands every minute. If you specify a
# number followed by an "s" (i.e. 15s), this will be interpreted to mean
# actual seconds rather than a multiple of the interval_length variable.
# Note: In addition to reading the external command file at regularly
# scheduled intervals, Nagios will also check for external commands after
# event handlers are executed.
# NOTE: Setting this value to -1 causes Nagios to check the external
# command file as often as possible.
#command_check_interval=15s
command_check_interval=-1
# SERVICE INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# "spreading out" service checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all service checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)! This is not a
# good thing for production, but is useful when testing the
# parallelization functionality.
# n = None - don't use any delay between checks
# d = Use a "dumb" delay of 1 second between checks
# s = Use "smart" inter-check delay calculation
# x.xx = Use an inter-check delay of x.xx seconds
service_inter_check_delay_method=s
# MAXIMUM SERVICE CHECK SPREAD
# This variable determines the timeframe (in minutes) from the
# program start time that an initial check of all services should
# be completed. Default is 30 minutes.
max_service_check_spread=30
# SERVICE CHECK INTERLEAVE FACTOR
# This variable determines how service checks are interleaved.
# Interleaving the service checks allows for a more even
# distribution of service checks and reduced load on remote
# hosts. Setting this value to 1 is equivalent to how versions
# of Nagios previous to 0.0.5 did service checks. Set this
# value to s (smart) for automatic calculation of the interleave
# factor unless you have a specific reason to change it.
# s = Use "smart" interleave factor calculation
# x = Use an interleave factor of x, where x is a
# number greater than or equal to 1.
service_interleave_factor=s
# HOST INTER-CHECK DELAY METHOD
# This is the method that Nagios should use when initially
# "spreading out" host checks when it starts monitoring. The
# default is to use smart delay calculation, which will try to
# space all host checks out evenly to minimize CPU load.
# Using the dumb setting will cause all checks to be scheduled
# at the same time (with no delay between them)!
# n = None - don't use any delay between checks
# d = Use a "dumb" delay of 1 second between checks
# s = Use "smart" inter-check delay calculation
# x.xx = Use an inter-check delay of x.xx seconds
host_inter_check_delay_method=s
# MAXIMUM HOST CHECK SPREAD
# This variable determines the timeframe (in minutes) from the
# program start time that an initial check of all hosts should
# be completed. Default is 30 minutes.
max_host_check_spread=30
# MAXIMUM CONCURRENT SERVICE CHECKS
# This option allows you to specify the maximum number of
# service checks that can be run in parallel at any given time.
# Specifying a value of 1 for this variable essentially prevents
# any service checks from being parallelized. A value of 0
# will not restrict the number of concurrent checks that are
# being executed.
max_concurrent_checks=0
# HOST AND SERVICE CHECK REAPER FREQUENCY
# This is the frequency (in seconds!) that Nagios will process
# the results of host and service checks.
check_result_reaper_frequency=10
# MAX CHECK RESULT REAPER TIME
# This is the max amount of time (in seconds) that a single
# check result reaper event will be allowed to run before
# returning control back to Nagios so it can perform other
# duties.
max_check_result_reaper_time=30
# CHECK RESULT PATH
# This is directory where Nagios stores the results of host and
# service checks that have not yet been processed.
#
# Note: Make sure that only one instance of Nagios has access
# to this directory!
check_result_path=/var/log/nagios/spool/checkresults
# MAX CHECK RESULT FILE AGE
# This option determines the maximum age (in seconds) which check
# result files are considered to be valid. Files older than this
# threshold will be mercilessly deleted without further processing.
max_check_result_file_age=3600
# CACHED HOST CHECK HORIZON
# This option determines the maximum amount of time (in seconds)
# that the state of a previous host check is considered current.
# Cached host states (from host checks that were performed more
# recently that the timeframe specified by this value) can immensely
# improve performance in regards to the host check logic.
# Too high of a value for this option may result in inaccurate host
# states being used by Nagios, while a lower value may result in a
# performance hit for host checks. Use a value of 0 to disable host
# check caching.
#cached_host_check_horizon=15
cached_host_check_horizon=60
# CACHED SERVICE CHECK HORIZON
# This option determines the maximum amount of time (in seconds)
# that the state of a previous service check is considered current.
# Cached service states (from service checks that were performed more
# recently that the timeframe specified by this value) can immensely
# improve performance in regards to predictive dependency checks.
# Use a value of 0 to disable service check caching.
cached_service_check_horizon=15
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of hosts when it predicts that future dependency logic test
# may be needed. These predictive checks can help ensure that your
# host dependency logic works well.
# Values:
# 0 = Disable predictive checks
# 1 = Enable predictive checks (default)
enable_predictive_host_dependency_checks=1
# ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
# This option determines whether or not Nagios will attempt to execute
# checks of service when it predicts that future dependency logic test
# may be needed. These predictive checks can help ensure that your
# service dependency logic works well.
# Values:
# 0 = Disable predictive checks
# 1 = Enable predictive checks (default)
enable_predictive_service_dependency_checks=1
# AUTO-RESCHEDULING OPTION
# This option determines whether or not Nagios will attempt to
# automatically reschedule active host and service checks to
# "smooth" them out over time. This can help balance the load on
# the monitoring server.
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
auto_reschedule_checks=0
# AUTO-RESCHEDULING INTERVAL
# This option determines how often (in seconds) Nagios will
# attempt to automatically reschedule checks. This option only
# has an effect if the auto_reschedule_checks option is enabled.
# Default is 30 seconds.
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
auto_rescheduling_interval=30
# AUTO-RESCHEDULING WINDOW
# This option determines the "window" of time (in seconds) that
# Nagios will look at when automatically rescheduling checks.
# Only host and service checks that occur in the next X seconds
# (determined by this variable) will be rescheduled. This option
# only has an effect if the auto_reschedule_checks option is
# enabled. Default is 180 seconds (3 minutes).
# WARNING: THIS IS AN EXPERIMENTAL FEATURE - IT CAN DEGRADE
# PERFORMANCE, RATHER THAN INCREASE IT, IF USED IMPROPERLY
auto_rescheduling_window=180
# SLEEP TIME
# This is the number of seconds to sleep between checking for system
# events and service checks that need to be run.
sleep_time=0.25
# TIMEOUT VALUES
# These options control how much time Nagios will allow various
# types of commands to execute before killing them off. Options
# are available for controlling maximum time allotted for
# service checks, host checks, event handlers, notifications, the
# ocsp command, and performance data commands. All values are in
# seconds.
service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
# AGGRESSIVE HOST CHECKING OPTION
# If you don't want to turn on aggressive host checking features, set
# this value to 0 (the default). Otherwise set this value to 1 to
# enable the aggressive check option. Read the docs for more info
# on what aggressive host check is or check out the source code in
# base/checks.c
use_aggressive_host_checking=0
# SERVICE CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# service checks when it initially starts. If this option is
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in. Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of service checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks
execute_service_checks=0
# PASSIVE SERVICE CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# service checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks
accept_passive_service_checks=1
# HOST CHECK EXECUTION OPTION
# This determines whether or not Nagios will actively execute
# host checks when it initially starts. If this option is
# disabled, checks are not actively made, but Nagios can still
# receive and process passive check results that come in. Unless
# you're implementing redundant hosts or have a special need for
# disabling the execution of host checks, leave this enabled!
# Values: 1 = enable checks, 0 = disable checks
execute_host_checks=1
# PASSIVE HOST CHECK ACCEPTANCE OPTION
# This determines whether or not Nagios will accept passive
# host checks results when it initially (re)starts.
# Values: 1 = accept passive checks, 0 = reject passive checks
accept_passive_host_checks=0
# OBSESS OVER SERVICE CHECKS OPTION
# This determines whether or not Nagios will obsess over service
# checks and run the ocsp_command defined below. Unless you're
# planning on implementing distributed monitoring, do not enable
# this option. Read the HTML docs for more information on
# implementing distributed monitoring.
# Values: 1 = obsess over services, 0 = do not obsess (default)
obsess_over_services=0
# OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND
# This is the command that is run for every service check that is
# processed by Nagios. This command is executed only if the
# obsess_over_services option (above) is set to 1. The command
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on implementing distributed monitoring.
#ocsp_command=somecommand
# OBSESS OVER HOST CHECKS OPTION
# This determines whether or not Nagios will obsess over host
# checks and run the ochp_command defined below. Unless you're
# planning on implementing distributed monitoring, do not enable
# this option. Read the HTML docs for more information on
# implementing distributed monitoring.
# Values: 1 = obsess over hosts, 0 = do not obsess (default)
obsess_over_hosts=0
# OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND
# This is the command that is run for every host check that is
# processed by Nagios. This command is executed only if the
# obsess_over_hosts option (above) is set to 1. The command
# argument is the short name of a command definition that you
# define in your host configuration file. Read the HTML docs for
# more information on implementing distributed monitoring.
#ochp_command=somecommand
# SERVICE FRESHNESS CHECK OPTION
# This option determines whether or not Nagios will periodically
# check the "freshness" of service results. Enabling this option
# is useful for ensuring passive checks are received in a timely
# manner.
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
check_service_freshness=1
# SERVICE FRESHNESS CHECK INTERVAL
# This setting determines how often (in seconds) Nagios will
# check the "freshness" of service check results. If you have
# disabled service freshness checking, this option has no effect.
#service_freshness_check_interval=60
service_freshness_check_interval=420
# HOST FRESHNESS CHECK OPTION
# This option determines whether or not Nagios will periodically
# check the "freshness" of host results. Enabling this option
# is useful for ensuring passive checks are received in a timely
# manner.
# Values: 1 = enabled freshness checking, 0 = disable freshness checking
check_host_freshness=0
#check_host_freshness=1
# HOST FRESHNESS CHECK INTERVAL
# This setting determines how often (in seconds) Nagios will
# check the "freshness" of host check results. If you have
# disabled host freshness checking, this option has no effect.
#host_freshness_check_interval=60
host_freshness_check_interval=420
# ADDITIONAL FRESHNESS THRESHOLD LATENCY
# This setting determines the number of seconds that Nagios
# will add to any host and service freshness thresholds that
# it calculates (those not explicitly specified by the user).
#additional_freshness_latency=15
additional_freshness_latency=180
# LARGE INSTALLATION TWEAKS OPTION
# This option determines whether or not Nagios will take some shortcuts
# which can save on memory and CPU usage in large Nagios installations.
# Read the documentation for more information on the benefits/tradeoffs
# of enabling this option.
# Values: 1 - Enabled tweaks
# 0 - Disable tweaks (default)
use_large_installation_tweaks=1
# CHILD PROCESS MEMORY OPTION
# This option determines whether or not Nagios will free memory in
# child processes (processed used to execute system commands and host/
# service checks). If you specify a value here, it will override
# program defaults.
# Value: 1 - Free memory in child processes
# 0 - Do not free memory in child processes
#free_child_process_memory=1
# CHILD PROCESS FORKING BEHAVIOR
# This option determines how Nagios will fork child processes
# (used to execute system commands and host/service checks). Normally
# child processes are fork()ed twice, which provides a very high level
# of isolation from problems. Fork()ing once is probably enough and will
# save a great deal on CPU usage (in large installs), so you might
# want to consider using this. If you specify a value here, it will
# program defaults.
# Value: 1 - Child processes fork() twice
# 0 - Child processes fork() just once
#child_processes_fork_twice=1
child_processes_fork_twice=0
# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file. OR values together to log multiple
# types of information.
# Values:
# -1 = Everything
# 0 = Nothing
# 1 = Functions
# 2 = Configuration
# 4 = Process information
# 8 = Scheduled events
# 16 = Host/service checks
# 32 = Notifications
# 64 = Event broker
# 128 = External commands
# 256 = Commands
# 512 = Scheduled downtime
# 1024 = Comments
# 2048 = Macros
debug_level=16
# DEBUG VERBOSITY
# This option determines how verbose the debug log out will be.
# Values: 0 = Brief output
# 1 = More detailed
# 2 = Very detailed
debug_verbosity=1
Thanks in advance for your help.
Trisha
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.monitoring-lists.org/archive/users/attachments/20100504/53dcef43/attachment.html>
-------------- next part --------------
------------------------------------------------------------------------------
-------------- next part --------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null
More information about the Users
mailing list