Service Check Timed Out in nagios 2.0b1
Robert Drake
rdrake at stayonline.net
Fri Jan 21 22:51:48 CET 2005
I'm seeing a lot of these for some reason. Recently we got rid of all
our slow service checks, so we changed the check interval from 20
minutes to 5.
We've got 4600 services (4300 hosts).
Currently we've got 42 hosts down and 4 unreachable. We've got 64 service
check failures (90% of those being Service Check Timed Out).
The problem is that a lot of these timeouts are false positives. Even
though it takes 2 seconds to check a host (even when down), Nagios times
them out.
The box isn't overloaded, usually the load average is around 1.0.
With that many service checks spread over 5 minutes, you should see about
15-16 processes/sec running. Instead it fluctuates up and down:
sometimes you'll only see 2, sometimes you'll see 30. It's running
behind schedule too, which is weird. It's always run behind schedule,
but I don't understand why it can't make a schedule it can roughly
keep to.
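For reference, the 15-16 per second figure can be reproduced with a quick calculation. This is only a sketch of the arithmetic using the numbers quoted in this message (4600 services, a 5-minute interval, about 2 seconds per check); the variable names are made up for illustration, and the "running at once" estimate assumes checks are spread evenly over the interval.

```python
# Figures taken from this message; names are illustrative only.
services = 4600
check_interval_s = 5 * 60      # 5-minute check interval
check_duration_s = 2           # typical time for one check, per the post

# Average rate at which checks must start to stay on schedule.
checks_per_sec = services / check_interval_s

# Average number of checks in flight at once (rate * duration),
# assuming an even spread across the interval.
avg_in_flight = checks_per_sec * check_duration_s

print(f"{checks_per_sec:.1f} checks/sec, ~{avg_in_flight:.0f} running at once")
# prints "15.3 checks/sec, ~31 running at once"
```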
Here's my nagios.cfg:
log_file=/usr/local/nagios/var/nagios.log
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/et_list.cfg
cfg_file=/usr/local/nagios/etc/dev_list.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/dyn_hostgroups.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/misccommands.cfg
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/var/cache/status.log
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=1
command_file=/usr/local/nagios/var/nagios.cmd
comment_file=/usr/local/nagios/var/comment.log
downtime_file=/usr/local/nagios/var/downtime.log
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/var/nagios.tmp
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
host_inter_check_delay_method=s
service_inter_check_delay_method=s
service_interleave_factor=s
max_concurrent_checks=60
service_reaper_frequency=1
sleep_time=0.25
service_check_timeout=60
host_check_timeout=60
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5
retain_state_information=1
state_retention_file=/usr/local/nagios/var/status.sav
retention_update_interval=300
use_retained_program_state=1
use_retained_scheduling_info=1
interval_length=60
use_agressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=0
obsess_over_services=0
check_for_orphaned_services=0
check_service_freshness=1
freshness_check_interval=60
aggregate_status_updates=1
status_update_interval=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=iso8601
admin_email=sos at stayonline.net
admin_pager=sos at stayonline.net
illegal_macro_output_chars=`~$^&"|'<>;
broker_module=/usr/local/nagios/bin/inserter.o
event_broker_options=-1
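One way to sanity-check the settings above against the workload is to read the key=value lines and compare throughput bounds. The following is a rough sketch, not a diagnosis: parse_main_config is a hypothetical helper (not part of Nagios), the sample text is a short excerpt of the config above, and the arithmetic assumes max_concurrent_checks acts as a hard cap on checks running in parallel.

```python
from collections import defaultdict

def parse_main_config(text):
    """Parse key=value lines; directives such as cfg_file may repeat."""
    settings = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            settings[key.strip()].append(value.strip())
    return settings

# Excerpt of the config above (hypothetical helper input).
sample = """\
max_concurrent_checks=60
service_check_timeout=60
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
"""

cfg = parse_main_config(sample)
max_checks = int(cfg["max_concurrent_checks"][-1])
timeout_s = int(cfg["service_check_timeout"][-1])

services, interval_s, typical_check_s = 4600, 300, 2  # figures from the post

required_rate = services / interval_s          # checks/sec needed to keep up
best_case_rate = max_checks / typical_check_s  # if every check takes 2 s
worst_case_rate = max_checks / timeout_s       # if every check runs to the timeout

print(f"required {required_rate:.1f}/s, ceiling {best_case_rate:.0f}/s, "
      f"floor {worst_case_rate:.0f}/s")
# prints "required 15.3/s, ceiling 30/s, floor 1/s"
```

Under these assumptions the required rate sits between the two bounds, so how many checks actually run for the full timeout would matter a great deal.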
It seems like this question has been asked a few times before, but
nobody seems to have a good answer for it. Is there a magic command I'm
missing that makes things better?
Thanks,
Robert