Bug in Performance Data
Lawrence Findley
larryfindley at yahoo.com
Fri Aug 6 00:45:17 CEST 2010
Thank you, Ethan, for your response.
The CGI reads status.dat.
Here are some lines from one of the blocks:
servicestatus {
host_name=wtf3a
service_description=check_ses
modified_attributes=0
check_command=check_ddnfaults_ses
check_period=24x7
notification_period=24x7
check_interval=5.000000
retry_interval=1.000000
event_handler=
has_been_checked=1
should_be_scheduled=1
check_execution_time=46912587.078
check_latency=0.399
check_type=0
current_state=0
last_hard_state=0
last_event_id=87848
current_event_id=87862
current_problem_id=0
last_problem_id=43437
current_attempt=1
max_attempts=3
state_type=1
last_state_change=1280725323
last_hard_state_change=1280082023
last_time_ok=1281047224
last_time_warning=0
last_time_unknown=1280725253
last_time_critical=1280081723
plugin_output=CHECK_DDN_ENCLOSURE OK - No errors were found.
***
Please note that check_execution_time is more than a year, expressed in seconds.
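To list every service that carries a value like this, something along the following lines could scan status.dat directly. This is only a minimal sketch: it assumes each block opens with `servicestatus {`, closes with `}`, and holds plain key=value lines as in the excerpt above. The one-hour threshold, the function names, and the second sample block (`example_host`) are hypothetical and not from our installation.

```python
def parse_service_blocks(text):
    """Yield one dict per `servicestatus { ... }` block of key=value lines."""
    block = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            block = {}
        elif line == "}" and block is not None:
            yield block
            block = None
        elif block is not None and "=" in line:
            key, _, value = line.partition("=")
            block[key] = value


def suspicious_services(text, threshold=3600.0):
    """Return (host, service, seconds) tuples above the threshold, largest first.

    The threshold of one hour is an arbitrary, hypothetical cut-off.
    """
    hits = []
    for b in parse_service_blocks(text):
        try:
            secs = float(b.get("check_execution_time", "0"))
        except ValueError:
            continue  # skip blocks with a malformed value
        if secs > threshold:
            hits.append((b.get("host_name"), b.get("service_description"), secs))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# First block is abbreviated from the excerpt above; the second is made up.
SAMPLE = """\
servicestatus {
host_name=wtf3a
service_description=check_ses
check_execution_time=46912587.078
check_latency=0.399
}
servicestatus {
host_name=example_host
service_description=example_service
check_execution_time=0.512
check_latency=0.100
}
"""

if __name__ == "__main__":
    for host, svc, secs in suspicious_services(SAMPLE):
        print(f"{host}\t{svc}\t{secs:.3f}")
```

Pointing `suspicious_services` at the contents of the real status.dat, instead of `SAMPLE`, would show whether the bad values cluster on particular hosts or check commands.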
I don't see anything related to time changes in the logs. After I filter out
host/service alerts and notifications, only auto-saves and start/stop
information remain, as follows:
[1281028702] Auto-save of retention data completed successfully.
[1281028706] Caught SIGTERM, shutting down...
[1281028706] Successfully shutdown... (PID=6509)
[1281028706] Event broker module '/usr/local/nagios/modules/dnxServer.so' deinitialized successfully.
[1281028727] Nagios 3.2.1 starting... (PID=28820)
[1281028727] Local time is Thu Aug 05 10:18:47 PDT 2010
[1281028727] LOG VERSION: 2.0
[1281028727] Event broker module '/usr/local/nagios/modules/dnxServer.so' initialized successfully.
[1281028728] Finished daemonizing... (New PID=28821)
[1281028763] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;nagios06;check_app_java_cluster;1281028760
[1281029029] Auto-save of retention data completed successfully.
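For anyone who wants to repeat the log search, here is a rough sketch of the filter. It assumes every log entry starts with `[epoch]` as in the lines above, and it assumes that a time-change warning would contain the phrase "time change"; that wording is my assumption, so adjust `needle` if your version logs it differently. The function name is hypothetical.

```python
import re
from datetime import datetime, timezone

# Matches "[1281028727] message text"
LOG_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def time_change_entries(lines, needle="time change"):
    """Return (utc_datetime, message) for entries whose message mentions `needle`.

    `needle` is an assumed phrase, matched case-insensitively.
    """
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # wrapped continuation lines carry no timestamp
        epoch, message = int(m.group(1)), m.group(2)
        if needle.lower() in message.lower():
            hits.append((datetime.fromtimestamp(epoch, tz=timezone.utc), message))
    return hits


# A few of the entries quoted above.
SAMPLE = [
    "[1281028706] Caught SIGTERM, shutting down...",
    "[1281028727] Nagios 3.2.1 starting... (PID=28820)",
    "[1281028727] Local time is Thu Aug 05 10:18:47 PDT 2010",
    "[1281029029] Auto-save of retention data completed successfully.",
]

if __name__ == "__main__":
    matches = time_change_entries(SAMPLE)
    print(f"{len(matches)} matching entries")
    for when, message in matches:
        print(when.isoformat(), message)
```

On the entries quoted above it finds nothing, which matches what I see by hand.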
Here is the detail from the following page:
https://nagios06.internal.shutterfly.com/nagios/cgi-bin/extinfo.cgi?type=4
Metric                 Min.      Max.             Average
Check Execution Time:  0.00 sec  46912714.32 sec  26955114.505 sec
Check Latency:         0.00 sec  3.40 sec         0.277 sec
Percent State Change:  0.00%     37.43%           0.33%
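The average gives a rough hint of how many services are affected. As a back-of-the-envelope sketch, assume every affected service reports about the maximum value and every healthy one reports about 0 seconds; both are simplifying assumptions, not measurements. The service count is taken from the earlier "Monitoring Performance" report and may have changed since.

```python
def estimated_affected_fraction(average, bogus_value, normal_value=0.0):
    """Solve average = f * bogus_value + (1 - f) * normal_value for f."""
    return (average - normal_value) / (bogus_value - normal_value)


TOTAL_SERVICES = 18342      # from "# Active Host / Service Checks: 2159 / 18342"
MAX_SECONDS = 46912714.32   # reported maximum execution time

REPORTS = (
    ("earlier report", 30610220.908),
    ("this report", 26955114.505),
)

if __name__ == "__main__":
    for label, avg in REPORTS:
        f = estimated_affected_fraction(avg, MAX_SECONDS)
        print(f"{label}: ~{f:.1%} of services, roughly {round(f * TOTAL_SERVICES)} checks")
```

Under those assumptions, well over half of the service checks would be carrying the bogus value, so this does not look like a handful of outliers.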
If I stop Nagios, remove retention.dat and status.dat, and restart fresh,
Nagios looks normal for about 2 minutes and then reports the 1.5-year execution
time again.
Any ideas on how to investigate and fix this bug?
Thank you!
-Larry Findley
Sr. Systems Engineer
Shutterfly
lfindley at shutterfly.com
________________________________
From: Ethan Galstad <egalstad at nagios.org>
To: Nagios Developers List <nagios-devel at lists.sourceforge.net>
Sent: Wed, August 4, 2010 6:22:10 PM
Subject: Re: [Nagios-devel] Bug in Performance Data
Are there any messages in the Nagios log file that relate to detected
time changes?
The (stated) execution time for these checks is approx. 542 days, which
is strange. Most time issues would show an offset of just a few hours, not
about a year and a half.
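As a quick sanity check on that figure, the conversion can be worked through as below. This is just arithmetic on the reported check_execution_time value; the 365.25-day year is an assumption used for the conversion.

```python
SECONDS_PER_DAY = 86400
DAYS_PER_YEAR = 365.25  # assumption: average year length


def seconds_to_days_and_years(seconds):
    """Convert a duration in seconds to (days, years)."""
    days = seconds / SECONDS_PER_DAY
    return days, days / DAYS_PER_YEAR


if __name__ == "__main__":
    days, years = seconds_to_days_and_years(46912587.078)
    print(f"{days:.1f} days, {years:.2f} years")  # prints "543.0 days, 1.49 years"
```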
What times are reflected in the status.dat file? Are you sure your CGI
script is reading/processing the correct values from that file?
- Ethan Galstad
Lawrence Findley wrote:
> Hello Folks,
> This info is from a cgi script that we use to show execution times.
> These are obviously incorrect. None of the checks actually take more
> than a few seconds to complete.
> Any ideas? Thank you.
> -Larry Findley
>
>
>
>
> Wed Aug 4 17:00:01 PDT 2010
>
>
> Top 10 Service Check Execution Times
>
> HOST           SERVICE               TIME (sec)
> im477          check_rdf_content     46912687.968
> vividpics104   check_lab_min_procs   46912684.576
> vividpics158b  check-win-mem         46912683.695
> vividpics110e  check-win-cpu         46912683.695
> grf133         check_all_local_disk  46912683.695
> vividpics147c  check-win-mem         46912683.695
> vividpics162e  check-win-disk        46912683.695
> vividpics156d  check-win-disk        46912683.576
> vividpics144e  check-win-disk        46912683.576
> vividpics165b  check-win-mem         46912683.576
>
>
> ------------------------------------------------------------------------
> *From:* Lawrence Findley <larryfindley at yahoo.com>
> *To:* Nagios Developers List <nagios-devel at lists.sourceforge.net>
> *Sent:* Wed, August 4, 2010 4:35:20 PM
> *Subject:* Re: [Nagios-devel] Bug in Performance Data
>
> Yes, Benny,
> we run ntp to keep everything correct.
> We use 4 satellites with DNX and I also verified that the time is
> correct on all of the satellites too.
>
> Wed Aug 4 16:26:13 PDT 2010
> nagios at nagios06 /usr/local/nagios/etc/objects $
> The Nagios server is not a VM.
>
> Thank you for taking a look at this!
> -Larry Findley
>
> Shutterfly
>
>
>
>
> ------------------------------------------------------------------------
> *From:* C. Bensend <benny at bennyvision.com>
> *To:* nagios-devel at lists.sourceforge.net
> *Sent:* Wed, August 4, 2010 12:15:27 PM
> *Subject:* Re: [Nagios-devel] Bug in Performance Data
>
>
> Is the time synced properly on your Nagios host?
>
> Is this a VM?
>
> Benny
>
>
> > I found a bug where the performance stats do not reflect execution times
> > accurately.
> >
> > Version 3.2.1 shows execution times in the millions of seconds.
> > Any ideas?
> > Thank you.
> > -Larry Findley
> >
> > Monitoring Performance
> > Service Check Execution Time: 0.00 / 46912714.32 / 30610220.908 sec
> > Service Check Latency: 0.00 / 3.40 / 0.242 sec
> > Host Check Execution Time: 0.01 / 8.02 / 1.077 sec
> > Host Check Latency: 0.00 / 1.13 / 0.403 sec
> > # Active Host / Service Checks: 2159 / 18342
> > # Passive Host / Service Checks: 0 / 0
> >
> >
> > ________________________________
> > From: Lawrence Findley <larryfindley at yahoo.com>
> > To: Nagios Developers List <nagios-devel at lists.sourceforge.net>
> > Sent: Tue, August 3, 2010 12:42:12 PM
> > Subject: [Nagios-devel] 1.5 year execution time?
> >
> >
> > Here is the text:
> >
> >
> > Monitoring Performance
> > Service Check Execution Time: 0.00 / 46912714.32 / 30610220.908 sec
> > Service Check Latency: 0.00 / 3.40 / 0.242 sec
> > Host Check Execution Time: 0.01 / 8.02 / 1.077 sec
> > Host Check Latency: 0.00 / 1.13 / 0.403 sec
> > # Active Host / Service Checks: 2159 / 18342
> > # Passive Host / Service Checks: 0 / 0
> > I have tried removing status.dat and retention files. Within a few
> > minutes, it
> > goes back to these numbers.
> >
> > Does anyone have an idea?
> > Thank you.
> > -Larry Findley
> >
> >
> >
> > _______________________________________________
> > Nagios-devel mailing list
> > Nagios-devel at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-devel
> >
>
>
> --
> "Something's going on in this house - last night, I saw a face!"
> "Did it have a nose?"
> "Yes!"
> "That sounds like a face all right."
> -- Scary Movie 4
>