Tweaking Nagios Performance (Checks/Notifications)
Mirza Dedic
mirde at oppy.com
Tue Oct 6 23:57:09 CEST 2009
I recently finished moving Nagios from a virtual machine to bare-metal hardware: a retired PowerEdge (dual-core, 4 GB RAM, RAID-5 on 10k RPM disks). My goal is a window of at most 1 minute between a host or service going down and my receiving the notification that it is down.
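As a rough sketch of the arithmetic behind that goal: in a simplified model, the worst-case delay before a problem reaches a hard state (and a notification can go out) is one normal check interval plus one retry interval for each remaining check attempt. The function name and the example values below are assumptions for illustration only; the per-object check_interval, retry_interval and max_check_attempts values are not shown in this post, and the model ignores check latency, result reaping and notification delivery time.

```python
def worst_case_detection_seconds(check_interval, retry_interval,
                                 max_check_attempts, interval_length=60,
                                 execution_time=0.0):
    """Simplified worst-case time from failure to hard state, in seconds.

    check_interval and retry_interval are expressed in units of
    interval_length seconds (interval_length=60 in the nagios.cfg below).
    Hypothetical helper: ignores latency, reaper frequency and
    notification delivery time.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    # The failure may happen right after a successful check, so a full
    # check_interval can pass before the first failing result. Each further
    # attempt needed to reach a hard state adds one retry_interval.
    units = check_interval + (max_check_attempts - 1) * retry_interval
    return units * interval_length + execution_time


# Hypothetical object settings, not taken from the actual configuration.
print(worst_case_detection_seconds(1, 1, 3))  # three attempts, 1-minute steps
print(worst_case_detection_seconds(1, 1, 1))  # hard state on first failure
```

Under this model, a 1-minute window is only reachable if a single failed check is enough to produce a hard state, or if the intervals are shorter than one interval_length unit.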
We are monitoring a total of 347 Services and 82 Hosts, mainly using the plug-ins below:
- check_by_ssh
- check_nt (NSClient++ for Win32)
- check_http
- check_ping
- check_esx3
- check_mysql
Below is the "Performance Info" output for the current setup:

Services Actively Checked

Time Frame           Services Checked
<= 1 minute          65 (18.7%)
<= 5 minutes         300 (86.5%)
<= 15 minutes        347 (100.0%)
<= 1 hour            347 (100.0%)
Since program start  347 (100.0%)

Metric                Min.      Max.       Average
Check Execution Time  0.01 sec  21.91 sec  1.603 sec
Check Latency         0.00 sec  0.00 sec   0.164 sec
Percent State Change  0.00%     0.00%      0.00%

Services Passively Checked

Time Frame           Services Checked
<= 1 minute          0 (0.0%)
<= 5 minutes         0 (0.0%)
<= 15 minutes        0 (0.0%)
<= 1 hour            0 (0.0%)
Since program start  0 (0.0%)

Metric                Min.   Max.   Average
Percent State Change  0.00%  0.00%  0.00%

Hosts Actively Checked

Time Frame           Hosts Checked
<= 1 minute          0 (0.0%)
<= 5 minutes         78 (95.1%)
<= 15 minutes        82 (100.0%)
<= 1 hour            82 (100.0%)
Since program start  82 (100.0%)

Metric                Min.      Max.      Average
Check Execution Time  0.29 sec  4.03 sec  2.483 sec
Check Latency         0.15 sec  0.78 sec  0.565 sec
Percent State Change  0.00%     0.00%     0.00%

Hosts Passively Checked

Time Frame           Hosts Checked
<= 1 minute          0 (0.0%)
<= 5 minutes         0 (0.0%)
<= 15 minutes        0 (0.0%)
<= 1 hour            0 (0.0%)
Since program start  0 (0.0%)

Metric                Min.   Max.   Average
Percent State Change  0.00%  0.00%  0.00%

When I restart Nagios and monitor the box, total CPU usage does not spike past 10%, so I would like to tighten the check schedule to make use of the spare capacity.
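To put the available headroom in rough numbers, the following back-of-the-envelope sketch estimates how many checks would be running at once, on average, if every service and host were checked once per minute. It uses the counts and average execution times from the performance info above. The function name is hypothetical, the once-per-minute period is an assumption about the target rather than the current schedule, and the estimate says nothing about CPU cost per check, since most of the execution time is likely spent waiting on the network.

```python
def average_concurrency(num_checks, avg_execution_sec, period_sec):
    """Average number of checks in flight (Little's law: rate * duration).

    Hypothetical helper for a rough estimate; it assumes checks are spread
    evenly over the period and ignores bursts and the slowest checks.
    """
    return num_checks * avg_execution_sec / period_sec


# Figures from the performance info: 347 services averaging 1.603 sec,
# 82 hosts averaging 2.483 sec. The 60-second period is the assumed target.
services = average_concurrency(347, 1.603, 60)
hosts = average_concurrency(82, 2.483, 60)

print(f"services in flight on average: {services:.2f}")
print(f"hosts in flight on average:    {hosts:.2f}")
print(f"total:                         {services + hosts:.2f}")
```

Note that the slowest service check took 21.91 sec, so individual checks can occupy a slot far longer than the average suggests.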
Below is my nagios.cfg for the current setup. Can anyone suggest changes that would help me achieve this?
# MERLIN BROKER MODULE
broker_module=/usr/local/nagios/addons/merlin/merlin.so /usr/local/nagios/addons/merlin/merlin.conf
log_file=/usr/local/nagios/var/nagios.log
# localhost
cfg_file=/usr/local/nagios/etc/localhost.cfg
# Locations
cfg_dir=/usr/local/nagios/etc/locations
# Devices
cfg_dir=/usr/local/nagios/etc/devices
# Objects
cfg_dir=/usr/local/nagios/etc/objects
# OBJECT CACHE FILE
object_cache_file=/usr/local/nagios/var/objects.cache
# PRE-CACHED OBJECT FILE
precached_object_file=/usr/local/nagios/var/objects.precache
# RESOURCE FILE
resource_file=/usr/local/nagios/etc/resource.cfg
# STATUS FILE
status_file=/usr/local/nagios/var/status.dat
# STATUS FILE UPDATE INTERVAL
status_update_interval=5
# NAGIOS USER
nagios_user=nagios
# NAGIOS GROUP
nagios_group=nagios
# EXTERNAL COMMAND OPTION
check_external_commands=1
# EXTERNAL COMMAND CHECK INTERVAL
command_check_interval=-1
# EXTERNAL COMMAND FILE
command_file=/usr/local/nagios/var/rw/nagios.cmd
# EXTERNAL COMMAND BUFFER SLOTS
external_command_buffer_slots=4096
# LOCK FILE
lock_file=/usr/local/nagios/var/nagios.lock
# TEMP FILE
temp_file=/usr/local/nagios/var/nagios.tmp
# TEMP PATH
temp_path=/tmp
# EVENT BROKER OPTIONS
event_broker_options=-1
# LOG ROTATION METHOD
log_rotation_method=d
# LOG ARCHIVE PATH
log_archive_path=/usr/local/nagios/var/archives
# LOGGING OPTIONS
use_syslog=1
# NOTIFICATION LOGGING OPTION
log_notifications=1
# SERVICE RETRY LOGGING OPTION
log_service_retries=1
# HOST RETRY LOGGING OPTION
log_host_retries=1
# EVENT HANDLER LOGGING OPTION
log_event_handlers=1
# INITIAL STATES LOGGING OPTION
log_initial_states=1
# EXTERNAL COMMANDS LOGGING OPTION
log_external_commands=1
# PASSIVE CHECKS LOGGING OPTION
log_passive_checks=1
# SERVICE INTER-CHECK DELAY METHOD
service_inter_check_delay_method=s
# MAXIMUM SERVICE CHECK SPREAD
max_service_check_spread=5
# SERVICE CHECK INTERLEAVE FACTOR
service_interleave_factor=s
# HOST INTER-CHECK DELAY METHOD
host_inter_check_delay_method=s
# MAXIMUM HOST CHECK SPREAD
max_host_check_spread=3
# MAXIMUM CONCURRENT SERVICE CHECKS
max_concurrent_checks=0
# HOST AND SERVICE CHECK REAPER FREQUENCY
check_result_reaper_frequency=10
# MAX CHECK RESULT REAPER TIME
max_check_result_reaper_time=30
# CHECK RESULT PATH
check_result_path=/usr/local/nagios/var/spool/checkresults
# MAX CHECK RESULT FILE AGE
max_check_result_file_age=3600
# CACHED HOST CHECK HORIZON
cached_host_check_horizon=10
# CACHED SERVICE CHECK HORIZON
cached_service_check_horizon=10
# ENABLE PREDICTIVE HOST DEPENDENCY CHECKS
enable_predictive_host_dependency_checks=1
# ENABLE PREDICTIVE SERVICE DEPENDENCY CHECKS
enable_predictive_service_dependency_checks=1
# SOFT STATE DEPENDENCIES
soft_state_dependencies=0
#time_change_threshold=900
# AUTO-RESCHEDULING OPTION
auto_reschedule_checks=0
# AUTO-RESCHEDULING INTERVAL
auto_rescheduling_interval=30
# AUTO-RESCHEDULING WINDOW
auto_rescheduling_window=180
# SLEEP TIME
sleep_time=0.25
# TIMEOUT VALUES
service_check_timeout=30
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
# RETAIN STATE INFORMATION
retain_state_information=0
# STATE RETENTION FILE
state_retention_file=/usr/local/nagios/var/retention.dat
# RETENTION DATA UPDATE INTERVAL
retention_update_interval=5
# USE RETAINED PROGRAM STATE
use_retained_program_state=0
# USE RETAINED SCHEDULING INFO
use_retained_scheduling_info=0
# This mask determines what host attributes are not retained
retained_host_attribute_mask=0
# This mask determines what service attributes are not retained
retained_service_attribute_mask=0
# These two masks determine what process attributes are not retained.
# There are two masks, because some process attributes have host and service
# options. For example, you can disable active host checks, but leave active
# service checks enabled.
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
# These two masks determine what contact attributes are not retained.
# There are two masks, because some contact attributes have host and
# service options. For example, you can disable host notifications for
# a contact, but leave service notifications enabled for them.
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
# INTERVAL LENGTH
interval_length=60
# CHECK FOR UPDATES
check_for_updates=0
# BARE UPDATE CHECK
bare_update_check=0
# AGGRESSIVE HOST CHECKING OPTION
use_aggressive_host_checking=0
# SERVICE CHECK EXECUTION OPTION
execute_service_checks=1
# PASSIVE SERVICE CHECK ACCEPTANCE OPTION
accept_passive_service_checks=1
# HOST CHECK EXECUTION OPTION
execute_host_checks=1
# PASSIVE HOST CHECK ACCEPTANCE OPTION
accept_passive_host_checks=1
# NOTIFICATIONS OPTION
enable_notifications=1
# EVENT HANDLER USE OPTION
enable_event_handlers=1
# PROCESS PERFORMANCE DATA OPTION
process_performance_data=1
# HOST AND SERVICE PERFORMANCE DATA PROCESSING COMMANDS
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
# HOST AND SERVICE PERFORMANCE DATA FILES
host_perfdata_file=/tmp/host-perfdata
service_perfdata_file=/tmp/service-perfdata
# HOST AND SERVICE PERFORMANCE DATA FILE TEMPLATES
host_perfdata_file_template=[HOSTPERFDATA]\t$TIMET$\t$HOSTNAME$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$
service_perfdata_file_template=[SERVICEPERFDATA]\t$TIMET$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$
# HOST AND SERVICE PERFORMANCE DATA FILE MODES
host_perfdata_file_mode=a
service_perfdata_file_mode=a
# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING INTERVAL
host_perfdata_file_processing_interval=0
service_perfdata_file_processing_interval=0
# HOST AND SERVICE PERFORMANCE DATA FILE PROCESSING COMMANDS
host_perfdata_file_processing_command=process-host-perfdata-file
service_perfdata_file_processing_command=process-service-perfdata-file
# OBSESS OVER SERVICE CHECKS OPTION
obsess_over_services=0
# OBSESSIVE COMPULSIVE SERVICE PROCESSOR COMMAND
#ocsp_command=somecommand
# OBSESS OVER HOST CHECKS OPTION
obsess_over_hosts=0
# OBSESSIVE COMPULSIVE HOST PROCESSOR COMMAND
#ochp_command=somecommand
# TRANSLATE PASSIVE HOST CHECKS OPTION
translate_passive_host_checks=0
# PASSIVE HOST CHECKS ARE SOFT OPTION
passive_host_checks_are_soft=0
# ORPHANED HOST/SERVICE CHECK OPTIONS
check_for_orphaned_services=1
check_for_orphaned_hosts=1
# SERVICE FRESHNESS CHECK OPTION
check_service_freshness=1
# SERVICE FRESHNESS CHECK INTERVAL
service_freshness_check_interval=60
# HOST FRESHNESS CHECK OPTION
check_host_freshness=1
# HOST FRESHNESS CHECK INTERVAL
host_freshness_check_interval=60
# ADDITIONAL FRESHNESS THRESHOLD LATENCY
additional_freshness_latency=15
# FLAP DETECTION OPTION
enable_flap_detection=1
# FLAP DETECTION THRESHOLDS FOR HOSTS AND SERVICES
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
# DATE FORMAT OPTION
date_format=us
# TIMEZONE OFFSET
#use_timezone=US/Mountain
#use_timezone=Australia/Brisbane
# P1.PL FILE LOCATION
p1_file=/usr/local/nagios/bin/p1.pl
# EMBEDDED PERL INTERPRETER OPTION
enable_embedded_perl=0
# EMBEDDED PERL USAGE OPTION
use_embedded_perl_implicitly=0
# ILLEGAL OBJECT NAME CHARACTERS
illegal_object_name_chars='
# ILLEGAL MACRO OUTPUT CHARACTERS
illegal_macro_output_chars='
# REGULAR EXPRESSION MATCHING
use_regexp_matching=0
# "TRUE" REGULAR EXPRESSION MATCHING
use_true_regexp_matching=0
# ADMINISTRATOR EMAIL/PAGER ADDRESSES
admin_email=mirde at oppy.com
admin_pager=mirde at oppy.com
# DAEMON CORE DUMP OPTION
daemon_dumps_core=1
# LARGE INSTALLATION TWEAKS OPTION
use_large_installation_tweaks=0
# ENABLE ENVIRONMENT MACROS
enable_environment_macros=1
# CHILD PROCESS MEMORY OPTION
#free_child_process_memory=1
# CHILD PROCESS FORKING BEHAVIOR
#child_processes_fork_twice=1
# DEBUG LEVEL
debug_level=-1
# DEBUG VERBOSITY
debug_verbosity=1
# DEBUG FILE
debug_file=/usr/local/nagios/var/nagios.debug
# MAX DEBUG FILE SIZE
max_debug_file_size=1000000
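Since the file above is long, a small script can pull out just the directives that bear on check scheduling. This is a minimal sketch, assuming the main config uses plain key=value lines with # comments as shown above; the function name, the list of keys to report and the embedded sample are choices made for illustration, not a complete list of tuning options.

```python
from collections import defaultdict

# Directives from the config above that relate to check scheduling and
# result processing (an illustrative selection, not an exhaustive one).
SCHEDULING_KEYS = (
    "interval_length",
    "max_service_check_spread",
    "max_host_check_spread",
    "max_concurrent_checks",
    "check_result_reaper_frequency",
    "max_check_result_reaper_time",
    "sleep_time",
    "use_large_installation_tweaks",
)


def parse_nagios_cfg(text):
    """Parse key=value lines into {key: [values]}, skipping comments.

    Values are kept in a list because directives such as cfg_dir can
    appear more than once.
    """
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first '=' only
        settings[key.strip()].append(value.strip())
    return dict(settings)


# A few lines copied from the config above, so the sketch runs standalone.
SAMPLE = """\
cfg_dir=/usr/local/nagios/etc/locations
cfg_dir=/usr/local/nagios/etc/devices
# MAXIMUM SERVICE CHECK SPREAD
max_service_check_spread=5
check_result_reaper_frequency=10
#time_change_threshold=900
sleep_time=0.25
interval_length=60
"""

parsed = parse_nagios_cfg(SAMPLE)
for key in SCHEDULING_KEYS:
    if key in parsed:
        print(f"{key} = {parsed[key][-1]}")
```

To run it against the real file, read /usr/local/nagios/etc/nagios.cfg (or wherever the main config lives on your system) and pass its contents to parse_nagios_cfg in place of SAMPLE.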