nagios 3 host checks logic problem on some kernels/distros
Thomas Stolle
it0a60 at retail-sc.com
Tue Sep 18 10:11:00 CEST 2007
From: SCHAER Frederic <frederic.schaer <at> cea.fr>
Subject: nagios 3 host checks logic problem on some kernels/distros
Newsgroups: gmane.network.nagios.devel
Date: 2007-09-10 16:17:30 GMT (1 week, 15 hours and 23 minutes ago)
Hi,
I think I identified a problem (but not the solution) in the nagios 3
source tree.
I tried with both the 3.0b3 and cvs HEAD source files and could not get
rid of the problem.
I'm running a 2.4.21 kernel on a RHEL3 box.
What happens is that as soon as I start nagios 3, it starts eating all of
the CPU.
Stracing the nagios process shows this (and almost only this):
gettimeofday({1189419621, 161574}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 183742}, NULL) = 0
gettimeofday({1189419621, 183780}, NULL) = 0
gettimeofday({1189419621, 183814}, NULL) = 0
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184172}, NULL) = 0
gettimeofday({1189419621, 184326}, NULL) = 0
time([1189419621]) = 1189419621
time([1189419621]) = 1189419621
gettimeofday({1189419621, 184734}, NULL) = 0
gettimeofday({1189419621, 184861}, NULL) = 0
I tried stracing nagios on an Ubuntu Feisty (7.04) box, and the output is
very different: there are nanosleep calls.
I tried activating and deactivating nanosleeps at nagios compile time, but
this did not solve my problem.
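To compare strace output between the two systems, the syscalls can be tallied instead of read by eye. The following is a minimal sketch, not part of Nagios; the function names and the "no sleeping call seen" heuristic are assumptions made for illustration, and it expects plain strace lines without timestamp prefixes.

```python
import re
from collections import Counter

# Syscall name at the start of an strace line, with an optional "[pid N]" prefix.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")

def tally_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

def looks_like_busy_loop(counts, sleep_calls=("nanosleep", "select", "poll")):
    """Heuristic (an assumption): syscalls were seen, but none of them sleeps."""
    return sum(counts.values()) > 0 and not any(counts[name] for name in sleep_calls)

# A few lines taken from the strace output quoted above.
SAMPLE = [
    "gettimeofday({1189419621, 161574}, NULL) = 0",
    "time([1189419621]) = 1189419621",
    "time([1189419621]) = 1189419621",
    "gettimeofday({1189419621, 183742}, NULL) = 0",
]

if __name__ == "__main__":
    counts = tally_syscalls(SAMPLE)
    for name, number in counts.most_common():
        print(name, number)
    print("busy loop suspected:", looks_like_busy_loop(counts))
```

Feeding it the full output of both boxes should make the missing nanosleep calls on the RHEL3 side obvious.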
With full debugging enabled, I get this kind of output at nagios startup:
[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()
[1189438977.881665] [016.0] [pid=18234] ** Running async check of host 'wn010'...
[1189438977.881678] [001.0] [pid=18234] check_host_check_viability_3x()
[1189438977.881691] [001.0] [pid=18234] check_time_against_period()
[1189438977.881712] [001.0] [pid=18234] check_host_dependencies()
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment...
[1189438977.881739] [016.1] [pid=18234] Unable to run scheduled host check at this time
If I run nagios for just 2 seconds and then hit CTRL+C, I still see this:
>grep "A check of this host is already being executed" /var/log/nagios/nagios.debug | wc -l
971
>grep "Attempting to run scheduled check of host 'wn010'" /var/log/nagios/nagios.debug | wc -l
971
>grep "Attempting to run scheduled check of host" /var/log/nagios/nagios.debug | wc -l
971
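The three grep counts can also be obtained per host in one pass, which shows directly whether a single host accounts for every attempt. This is a minimal sketch that assumes the debug log keeps each message on one line in the format shown above; the function names are made up for illustration.

```python
import re
from collections import Counter

# Host name inside the "Attempting to run scheduled check" debug message.
ATTEMPT_RE = re.compile(r"Attempting to run scheduled check of host '([^']+)'")

def count_check_attempts(lines):
    """Return a Counter mapping host name -> number of scheduled check attempts."""
    counts = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def count_from_file(path):
    """Same as count_check_attempts, reading the debug log at `path`."""
    with open(path, errors="replace") as handle:
        return count_check_attempts(handle)

# Two messages in the format of the debug output quoted above.
SAMPLE = [
    "[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check "
    "of host 'wn010': check options=0, latency=0.874000",
    "[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x()",
    "[1189438977.881990] [016.0] [pid=18234] Attempting to run scheduled check "
    "of host 'wn010': check options=0, latency=0.874000",
]

if __name__ == "__main__":
    for host, attempts in count_check_attempts(SAMPLE).most_common():
        print(host, attempts)
```

On the affected box, count_from_file("/var/log/nagios/nagios.debug") would be expected to list wn010 alone, matching the identical grep totals.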
I have 53 hosts defined; I don't understand why nagios keeps checking the
same host over and over, and why this does not happen on all systems.
De-activating host checks magically "solves" the problem.
I just found out that commenting out a host's "check_command" causes this
behaviour (with host_checks_enabled=true), and that defining a correct
check_command prevents nagios from being so CPU hungry.
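To find which hosts are affected, the object configuration can be scanned for host definitions that have no check_command of their own. The sketch below is only a rough helper under stated assumptions: it handles the usual "define host{ ... }" block layout, treats ';' as a comment marker, and does not resolve templates, so a host that inherits check_command through "use" is still reported. The sample host names and addresses are hypothetical.

```python
import re

# Start of a host definition block, e.g. "define host{" or "define host {".
DEFINE_HOST_RE = re.compile(r"^define\s+host\s*\{")

def hosts_without_check_command(config_text):
    """List host_name values of host blocks that define no check_command."""
    missing = []
    in_host = False
    directives = {}
    for raw in config_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if not line or line.startswith("#"):
            continue
        if not in_host:
            if DEFINE_HOST_RE.match(line):
                in_host, directives = True, {}
            continue
        if line.startswith("}"):
            # Blocks without host_name (templates) are skipped.
            if "host_name" in directives and "check_command" not in directives:
                missing.append(directives["host_name"])
            in_host = False
            continue
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return missing

# Hypothetical configuration: wn010 has its check_command commented out.
SAMPLE = """
define host{
    host_name      wn010
    address        192.0.2.10
    ;check_command check-host-alive
    }
define host{
    host_name      wn011
    address        192.0.2.11
    check_command  check-host-alive
    }
"""

if __name__ == "__main__":
    print(hosts_without_check_command(SAMPLE))
```

Any host it reports is worth checking by hand before concluding that it triggers the CPU loop.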
Hope I helped.
Cheers
Dear List,
I can confirm the problem Frederic reported.
I am using Nagios 3.0b3 on CentOS 4.4.
After starting nagios, the process takes nearly 100% CPU (see the top
output below).
Disabling host checks lets the process return to normal values.
As far as I can remember, the problem did not occur with nagios 3.0a (but
I cannot verify that at the moment).
Tasks: 89 total, 3 running, 86 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.0% us, 1.3% sy, 0.0% ni, 72.6% id, 0.0% wa, 0.1% hi, 0.0% si
Mem: 4041580k total, 1373844k used, 2667736k free, 60200k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1137348k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28617 nagios 25 0 29756 10m 1056 R 96 0.3 17:12.48 nagios
1 root 16 0 4752 552 460 S 0 0.0 0:02.75 init
2 root RT 0 0 0 0 S 0 0.0 0:00.04 migration/0
Thomas
--
RSC Commercial Services OHG
Wanheimer Strasse 70, D-40468 Duesseldorf
Registergericht: Duesseldorf, HRA 12655