Nagios freezes for long periods
Detrak
detrak at caere.fr
Wed Oct 24 10:34:01 CEST 2007
Hi,
We run Nagios 2.9 with NDO 1.4b5 and perf2rrd on several servers (Nagios
writes performance data to a named pipe and perf2rrd stores it in RRD
files). The servers run RHEL4 with packages from dag.wieers.com.
This server monitors 80 hosts and 420 services.
We see large gaps in our graphs. perf2rrd itself works fine; my first
investigation turned up these messages in the nagios.log file:
[1193178252] ndomod: Error writing to data sink! Some output may get
lost...
[1193178268] ndomod: Successfully reconnected to data sink! 0 items lost,
240 queued items to flush.
[1193178269] ndomod: Successfully flushed 240 queued items to data sink.
[1193187298] Warning: A system time change of 8729 seconds (forwards in
time) has been detected. Compensating...
[1193190553] Warning: A system time change of 3255 seconds (forwards in
time) has been detected. Compensating...
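
The bracketed numbers in nagios.log are Unix epoch seconds, so they are hard
to compare by eye with the wall-clock times in the debug output. Here is a
minimal sketch for converting them. It assumes the server is on CEST (UTC+2),
as the date of this mail suggests; the helper name parse_log_line is made up
for illustration and is not part of Nagios.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the server clock is on CEST (UTC+2).
CEST = timezone(timedelta(hours=2), "CEST")

# A nagios.log line looks like "[epoch] message".
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_log_line(line):
    """Return (local datetime, message) for a '[epoch] message' line, else None."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    stamp = datetime.fromtimestamp(int(match.group(1)), tz=CEST)
    return stamp, match.group(2)


# Log lines from above, with the mail's line wrapping undone.
sample = [
    "[1193178252] ndomod: Error writing to data sink! Some output may get lost...",
    "[1193187298] Warning: A system time change of 8729 seconds (forwards in time) has been detected. Compensating...",
    "[1193190553] Warning: A system time change of 3255 seconds (forwards in time) has been detected. Compensating...",
]

for line in sample:
    stamp, message = parse_log_line(line)
    print(stamp.strftime("%Y-%m-%d %H:%M:%S %Z"), "|", message[:45])
```

Under that assumption the ndomod error falls at 00:24:12 and the two
time-change warnings at 02:54:58 and 03:49:13 on Oct 24.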
We have recompiled Nagios with debugging enabled:
--enable-DEBUG2 shows warning messages
--enable-DEBUG3 shows scheduled events
We don't use DEBUG0 because it generates too much information and the
log file grows too fast.
Here is what I found in the debug output around the most recent gap:
*** Event Check Loop ***
Current time: Wed Oct 24 00:29:29 2007
Next High Priority Event Time: Wed Oct 24 00:29:30 2007
Next Low Priority Event Time: Wed Oct 24 00:29:29 2007
Current/Max Outstanding Service Checks: 19/65
*** Event Details ***
Event time: Wed Oct 24 00:29:29 2007
Event type: 0 (service check)
Service Description: LOAD_AVERAGE at LOADAVERAGE
Associated Host: SGBD1
Checking service 'LOAD_AVERAGE at LOADAVERAGE' on host 'SGBD1'...
*** Event Check Loop ***
Current time: Wed Oct 24 00:29:29 2007
Next High Priority Event Time: Wed Oct 24 00:29:30 2007
Next Low Priority Event Time: Wed Oct 24 00:29:29 2007
Current/Max Outstanding Service Checks: 20/65
*** Event Details ***
Event time: Wed Oct 24 00:29:29 2007
Event type: 0 (service check)
Service Description: LOAD_AVERAGE at LOADAVERAGE
Associated Host: INTEG
Checking service 'LOAD_AVERAGE at LOADAVERAGE' on host 'INTEG'...
Warning: A system time change of 8729 seconds (forwards in time) has been
detected. Compensating...
*** Event Check Loop ***
Current time: Wed Oct 24 02:54:58 2007
Next High Priority Event Time: Wed Oct 24 02:54:59 2007
Next Low Priority Event Time: Wed Oct 24 02:54:58 2007
Current/Max Outstanding Service Checks: 21/65
*** Event Details ***
Event time: Wed Oct 24 02:54:58 2007
Event type: 0 (service check)
Service Description: MONITOR_TELNET_SUIVI_PS
Associated Host: PREPROD1
Checking service 'MONITOR_TELNET_SUIVI_PS' on host 'PREPROD1'...
Warning: A system time change of 3255 seconds (forwards in time) has been
detected. Compensating...
*** Event Check Loop ***
Current time: Wed Oct 24 03:49:13 2007
Next High Priority Event Time: Wed Oct 24 03:49:14 2007
Next Low Priority Event Time: Wed Oct 24 03:49:13 2007
Current/Max Outstanding Service Checks: 22/65
*** Event Details ***
Event time: Wed Oct 24 03:49:13 2007
Event type: 0 (service check)
Service Description: MONITOR_TELNET_SUIVI_PS
Associated Host: BIDS15
Checking service 'MONITOR_TELNET_SUIVI_PS' on host 'BIDS15'...
We can see the jumps
00:29:29 to 02:54:58
and 02:54:58 to 03:49:13
with no activity in Nagios in between! I don't understand this!
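
As a cross-check, the two jumps can be measured directly from the
"Current time" lines of the event check loop. This is only a sketch: the
helper name gap_seconds is made up for illustration, and parsing the day and
month names with strptime assumes an English (C) locale.

```python
from datetime import datetime

# Format of the "Current time" lines in the debug output,
# e.g. "Wed Oct 24 00:29:29 2007". Assumption: English/C locale.
FORMAT = "%a %b %d %H:%M:%S %Y"


def gap_seconds(before, after):
    """Seconds elapsed between two 'Current time' stamps."""
    start = datetime.strptime(before, FORMAT)
    end = datetime.strptime(after, FORMAT)
    return int((end - start).total_seconds())


first = gap_seconds("Wed Oct 24 00:29:29 2007", "Wed Oct 24 02:54:58 2007")
second = gap_seconds("Wed Oct 24 02:54:58 2007", "Wed Oct 24 03:49:13 2007")
print(first, second)  # 8729 3255
```

The results, 8729 and 3255 seconds, are exactly the values in the "system
time change" warnings. So the warnings appear to report how long the event
loop was stalled between two iterations, not necessarily a real change of
the system clock.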
Can you give me some help to make this Nagios server more stable? I
don't know how to reproduce this bug. When the gaps occurred, the
server clock was correct.
We get more than one gap per day on this server!
best regards,
Olivier