Hi,<br><br><br>We run Nagios 2.9 with NDO 1.4b5 and perf2rrd on several servers (Nagios writes performance data to a pipe file and perf2rrd stores it in RRD files). The servers run RHEL4 with packages from <a href="http://dag.wieers.com">
dag.wieers.com</a>.<br><br>This server monitors 80 hosts and 420 services.<br><br><br><br>We see some huge gaps in our graphs. perf2rrd itself works fine; my first investigation turned up these messages in the nagios.log file:<br>[1193178252] ndomod: Error writing to data sink! Some output may get lost...
<br>[1193178268] ndomod: Successfully reconnected to data sink! 0 items lost, 240 queued items to flush.<br>[1193178269] ndomod: Successfully flushed 240 queued items to data sink.<br>[1193187298] Warning: A system time change of 8729 seconds (forwards in time) has been detected. Compensating...
<br>[1193190553] Warning: A system time change of 3255 seconds (forwards in time) has been detected. Compensating...<br><br><br><br>We recompiled Nagios with debug output enabled:<br>--enable-DEBUG2 shows warning messages<br>
--enable-DEBUG3 shows scheduled events<br><br>We don't use DEBUG0 because it generates too much information and the log file grows too fast.<br><br><br>I then found these messages in the debug output around the last gap:
<br><br><br>*** Event Check Loop ***<br> Current time: Wed Oct 24 00:29:29 2007<br> Next High Priority Event Time: Wed Oct 24 00:29:30 2007<br> Next Low Priority Event Time: Wed Oct 24 00:29:29 2007
<br>Current/Max Outstanding Service Checks: 19/65<br>*** Event Details ***<br> Event time: Wed Oct 24 00:29:29 2007<br> Event type: 0 (service check)<br> Service Description: LOAD_AVERAGE@LOADAVERAGE
<br> Associated Host: SGBD1<br> Checking service 'LOAD_AVERAGE@LOADAVERAGE' on host 'SGBD1'...<br><br>*** Event Check Loop ***<br> Current time: Wed Oct 24 00:29:29 2007
<br> Next High Priority Event Time: Wed Oct 24 00:29:30 2007<br> Next Low Priority Event Time: Wed Oct 24 00:29:29 2007<br>Current/Max Outstanding Service Checks: 20/65<br>*** Event Details ***<br> Event time: Wed Oct 24 00:29:29 2007
<br> Event type: 0 (service check)<br> Service Description: LOAD_AVERAGE@LOADAVERAGE<br> Associated Host: INTEG<br> Checking service 'LOAD_AVERAGE@LOADAVERAGE' on host 'INTEG'...
<br>Warning: A system time change of 8729 seconds (forwards in time) has been detected. Compensating...<br><br>*** Event Check Loop ***<br> Current time: Wed Oct 24 02:54:58 2007<br> Next High Priority Event Time: Wed Oct 24 02:54:59 2007
<br> Next Low Priority Event Time: Wed Oct 24 02:54:58 2007<br>Current/Max Outstanding Service Checks: 21/65<br>*** Event Details ***<br> Event time: Wed Oct 24 02:54:58 2007<br> Event type: 0 (service check)
<br> Service Description: MONITOR_TELNET_SUIVI_PS<br> Associated Host: PREPROD1<br> Checking service 'MONITOR_TELNET_SUIVI_PS' on host 'PREPROD1'...<br>Warning: A system time change of 3255 seconds (forwards in time) has been detected. Compensating...
<br><br>*** Event Check Loop ***<br> Current time: Wed Oct 24 03:49:13 2007<br> Next High Priority Event Time: Wed Oct 24 03:49:14 2007<br> Next Low Priority Event Time: Wed Oct 24 03:49:13 2007<br>Current/Max Outstanding Service Checks: 22/65
<br>*** Event Details ***<br> Event time: Wed Oct 24 03:49:13 2007<br> Event type: 0 (service check)<br> Service Description: MONITOR_TELNET_SUIVI_PS<br> Associated Host: BIDS15
<br> Checking service 'MONITOR_TELNET_SUIVI_PS' on host 'BIDS15'...<br><br><br>We can see the time jump<br> from 00:29:29 to 02:54:58<br>and from 02:54:58 to 03:49:13<br><br>with no activity in Nagios in between! I don't understand this!
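<br><br>As a cross-check, the two jumps can be compared with the offsets reported in the warnings. This is only a minimal Python sketch: the helper name gaps_in_seconds is illustrative, and the timestamps are the "Current time" values copied from the debug output above.

```python
from datetime import datetime

# Format of the "Current time" lines in the Nagios debug output.
# Note: %a and %b assume the default (English) C locale.
FMT = "%a %b %d %H:%M:%S %Y"

# "Current time" values copied from the debug output above.
current_times = [
    "Wed Oct 24 00:29:29 2007",
    "Wed Oct 24 02:54:58 2007",
    "Wed Oct 24 03:49:13 2007",
]


def gaps_in_seconds(stamps):
    """Return the difference in seconds between consecutive timestamps."""
    parsed = [datetime.strptime(s, FMT) for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(current_times)
print(gaps)  # [8729, 3255]
```

The differences come out to 8729 and 3255 seconds, exactly the values in the two "system time change" warnings, so each gap in the debug output corresponds to one detected time jump.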
<br><br><br>Can you give me some help making this Nagios server more stable? I don't know how to reproduce this bug. At the time each gap occurred, the server time was correct.<br><br>We get more than one gap per day on this server!
<br><br><br><br>best regards,<br>Olivier<br>