2.0b5 initial host/service checks delayed after start (not present in 2.0b3)
Eli Stair
estair at ilm.com
Thu Dec 8 22:27:06 CET 2005
FYI, while waiting for nagios to start polling anything after startup,
I decided to look at what it's doing...
This would explain why it starts up and sits around, not consuming any
cycles but not polling either. Sleep left in the code? The two EINTR
entries in the log below each come after a few minutes (119 and 175
seconds after the preceding entry).
This is running on 2.0b6, x86_64 arch, compiled from source with perlcache.
/eli
###FILE: nagios.log:
[1134076786] Finished daemonizing... (New PID=11914)
[1134076905] service_result_worker_thread(): poll(): EINTR (impossible)
[1134077080] service_result_worker_thread(): poll(): EINTR (impossible)
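
The 119 and 175 second gaps can be read straight off the epoch timestamps in the excerpt above. A minimal sketch in Python (the helper name entry_gaps is made up for illustration; the log lines are copied verbatim from the excerpt):

```python
# Log lines copied from the nagios.log excerpt above.
log_lines = [
    "[1134076786] Finished daemonizing... (New PID=11914)",
    "[1134076905] service_result_worker_thread(): poll(): EINTR (impossible)",
    "[1134077080] service_result_worker_thread(): poll(): EINTR (impossible)",
]


def entry_gaps(lines):
    """Return the seconds elapsed between consecutive log entries."""
    # Each line starts with "[<epoch seconds>]"; take the text inside the brackets.
    stamps = [int(line[1:line.index("]")]) for line in lines]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


print(entry_gaps(log_lines))  # [119, 175]
```
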
### GDB info:
Attaching to program: /usr/local/nagios/bin/nagios, process 11914
Reading symbols from /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/libperl.so...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/perl5/5.8.5/x86_64-linux-thread-multi/CORE/libperl.so
Reading symbols from /lib64/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libnsl.so.1
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/tls/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libm.so.6
Reading symbols from /lib64/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /lib64/libutil.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/tls/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 182894164416 (LWP 11914)]
[New Thread 1094719840 (LWP 11917)]
[New Thread 1084229984 (LWP 11915)]
Loaded symbols for /lib64/tls/libpthread.so.0
Reading symbols from /lib64/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/tls/libc.so.6
Reading symbols from /usr/lib64/libltdl.so.3...(no debugging symbols found)...done.
Loaded symbols for /usr/lib64/libltdl.so.3
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000364700b9c5 in __nanosleep_nocancel () from /lib64/tls/libpthread.so.0
(gdb) where
#0 0x000000364700b9c5 in __nanosleep_nocancel () from /lib64/tls/libpthread.so.0
#1 0x00000000004209aa in event_execution_loop ()
#2 0x000000000040efa0 in main ()
(gdb) info registers
rax 0xfffffffffffffdfc -516
rbx 0x861bb0 8788912
rcx 0xffffffffffffffff -1
rdx 0x2 2
rsi 0x0 0
rdi 0x7fbffff450 548682069072
rbp 0x0 0x0
rsp 0x7fbffff410 0x7fbffff410
r8 0x0 0
r9 0x2e8a 11914
r10 0x7fbffff301 548682068737
r11 0x202 514
r12 0x7fbffff450 548682069072
r13 0xffffffff 4294967295
r14 0xffffffff 4294967295
r15 0x7fbffffa08 548682070536
rip 0x364700b9c5 0x364700b9c5 <__nanosleep_nocancel+60>
eflags 0x202 514
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
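
For anyone reading the register dump above: gdb prints each value both as raw hex and as a number, and the signed reading of rax is just the two's complement of the 64-bit value. A small Python sketch of that conversion (to_signed64 is an illustrative helper, not part of gdb or nagios). The -516 may correspond to the Linux kernel's internal ERESTART_RESTARTBLOCK code (516), which would fit a sleep that was interrupted when the debugger attached, but that reading is an assumption and not something the trace confirms.

```python
def to_signed64(value):
    """Interpret an unsigned 64-bit register value as two's complement."""
    # If the top bit is set, the value represents a negative number.
    return value - (1 << 64) if value & (1 << 63) else value


print(to_signed64(0xfffffffffffffdfc))  # -516, matching the rax line
print(to_signed64(0x2e8a))              # 11914, the PID shown in nagios.log (r9)
```
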
Fred wrote:
> I do the same thing with check_icmp except that I use sudo and create
> a simple sudo entry like (see the CHECK_ICMP):
>
> Cmnd_Alias CHECKALLSSHKEYS = /opt/hptc/nagios/libexec/check_keys # HP-HPTC-KeySync
> Cmnd_Alias CHECKSYSLOGALERTS = /opt/hptc/nagios/libexec/check_syslogalerts # HP-HPTC-SysLog
> Cmnd_Alias CHECKSFS = /opt/hptc/nagios/libexec/check_sfs # HP-HPTC-SysLog
> Cmnd_Alias CHECKLSF = /opt/hptc/nagios/libexec/check_lsf # HP-HPTC-CheckLSF
> Cmnd_Alias CHECKICMP = /opt/hptc/nagios/libexec/check_icmp # HP-HPTC-CheckICMP
> nagios ALL = NOPASSWD: CHECKALLSSHKEYS,CHECKSYSLOGALERTS,CHECKSFS,CHECKLSF,CHECKICMP # HP-HPTC-Nagios
>
> I just built the 2.0b5 and hope to give it a try in the next few days on a
> 700+ node system ... I am hoping that this *solves* the delay problem
> that existed in the previous releases.
>
> -FredC
>
>
> */Eli Stair <estair at ilm.com>/* wrote:
>
>
> I'm running a fresh build of 2.0b5 on x86_64. After an initial start of
> nagios, it can take up to 10 minutes for the first host or service
> checks to begin. There is no CPU load by the nagios process during this
> time. I have over 1000 hosts to check, and have reduced the max
> host/service check spread in order to ensure that it is not "evening"
> out the time.
>
> This problem is NOT occurring on a 2.0b3 build with the exact same
> configuration.
>
> After the checks DO start, it can take hours to finish. I've changed
> the user to root so that I can have the host check be
> check_icmp -t 1 -p 1.
>
> Unfortunately, even with this situation, having anywhere between 4 and
> 64 hosts go down can make the "monitoring" aspect effectively useless.
>
> Any suggestions on the problem of startup lag?
> Any ways to further speed up the host check runs, aside from using
> check_icmp?
>
> Thanks,
>
> /eli
>
> ### inline nagios.cfg:
>
>
> [root at monitor02 etc]# cat nagios.cfg | egrep -v "^#|^$"
> log_file=/var/log/nagios/nagios.log
> cfg_file=/usr/local/nagios/etc/checkcommands.cfg
> cfg_file=/usr/local/nagios/etc/misccommands.cfg
> cfg_dir=/usr/local/nagios/etc/config
> cfg_file=/usr/local/nagios/etc/timeperiods.cfg
> cfg_file=/usr/local/nagios/etc/contacts.cfg
> cfg_file=/usr/local/nagios/etc/contactgroups.cfg
> cfg_file=/usr/local/nagios/etc/hosts.cfg
> cfg_file=/usr/local/nagios/etc/hostgroups.cfg
> cfg_file=/usr/local/nagios/etc/customcommands.cfg
> cfg_file=/usr/local/nagios/etc/services.cfg
> object_cache_file=/usr/local/nagios/var/objects.cache
> resource_file=/usr/local/nagios/etc/resource.cfg
> status_file=/usr/local/nagios/var/status.dat
> nagios_user=root
> nagios_group=root
> check_external_commands=1
> command_check_interval=-1
> command_file=/usr/local/nagios/var/rw/nagios.cmd
> comment_file=/usr/local/nagios/var/comments.dat
> downtime_file=/usr/local/nagios/var/downtime.dat
> lock_file=/usr/local/nagios/var/nagios.lock
> temp_file=/usr/local/nagios/var/nagios.tmp
> event_broker_options=-1
> log_rotation_method=d
> log_archive_path=/var/log/nagios/archives
> use_syslog=1
> log_notifications=1
> log_service_retries=1
> log_host_retries=1
> log_event_handlers=1
> log_initial_states=0
> log_external_commands=1
> log_passive_checks=1
> service_inter_check_delay_method=s
> max_service_check_spread=15
> service_interleave_factor=s
> host_inter_check_delay_method=s
> max_host_check_spread=10
> max_concurrent_checks=0
> service_reaper_frequency=15
> auto_reschedule_checks=0
> auto_rescheduling_interval=30
> auto_rescheduling_window=180
> sleep_time=0.25
> service_check_timeout=60
> host_check_timeout=30
> event_handler_timeout=30
> notification_timeout=30
> ocsp_timeout=5
> perfdata_timeout=5
> retain_state_information=1
> state_retention_file=/usr/local/nagios/var/retention.dat
> retention_update_interval=0
> use_retained_program_state=1
> use_retained_scheduling_info=0
> interval_length=60
> use_aggressive_host_checking=0
> execute_service_checks=1
> accept_passive_service_checks=0
> execute_host_checks=1
> accept_passive_host_checks=1
> enable_notifications=1
> enable_event_handlers=1
> process_performance_data=0
> obsess_over_services=0
> check_for_orphaned_services=0
> check_service_freshness=1
> service_freshness_check_interval=60
> check_host_freshness=1
> host_freshness_check_interval=60
> aggregate_status_updates=1
> status_update_interval=15
> enable_flap_detection=0
> low_service_flap_threshold=5.0
> high_service_flap_threshold=20.0
> low_host_flap_threshold=5.0
> high_host_flap_threshold=20.0
> date_format=iso8601
> illegal_object_name_chars=`~!$%^&*|'"<>?,()=
> illegal_macro_output_chars=`~$&|'"<>
> use_regexp_matching=0
> use_true_regexp_matching=0
> admin_email=nagios
> admin_pager=pagenagios
> daemon_dumps_core=0
>
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through
> log files
> for problems? Stop! Download the new AJAX search engine that makes
> searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
>
>
>
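
A rough sanity check on the spread settings in the quoted nagios.cfg: if max_host_check_spread and max_service_check_spread work as documented (all initial checks scheduled within that many minutes of startup), the spacing between consecutive initial checks is bounded by the spread divided by the number of objects. The sketch below is only that arithmetic, in Python; the function name is made up, and the count of exactly 1000 hosts is an assumption (the original message says "over 1000"). It says nothing about what the 2.0b5 scheduler actually does.

```python
def max_inter_check_delay(spread_minutes, object_count):
    """Upper bound, in seconds, on the spacing between initial checks
    if every check must be scheduled within the configured spread."""
    return spread_minutes * 60.0 / object_count


# max_host_check_spread=10 is from the quoted nagios.cfg;
# 1000 hosts is an assumed round figure.
print(max_inter_check_delay(10, 1000))  # 0.6
```

By this estimate, spreading alone would put the first host check well under a second after startup, so a wait of several minutes before any check runs would have to come from somewhere else.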