Event Profiler Timers?
Andreas Ericsson
ae at op5.se
Wed Jun 10 10:47:24 CEST 2009
Steven D. Morrey wrote:
>>> Would it be better (i.e. worth the overhead), to use clock_t =
>>> clock() to get the entry and exit?
>> Could be. But then we might as well use gettimeofday() from the
>> start so the wrap-around case at least gets exercised a lot instead
>> of every 72 minutes.
>
> I don't follow you on that, can you elaborate on what you mean by the
> wrap-around case and it being exercised, because I am not aware of it
> and therefore have not taken it into consideration. My primary
> reasons for wanting to use clock instead of gettimeofday were speed
> and precision.
>
Righto. clock(3) returns a value that represents seconds in 1/1000000
"scale". This means that, for 32-bit architectures, it will wrap around
every 71m 34.96s. Consider this:
start = clock();
do_stuff();
end = clock();
if (end < start) {
/* wrapped around */
delta = (end + (1 << (sizeof(end) * CHAR_BIT)) - start;
} else {
delta = end - start;
}
This might *look* right, but it really isn't. Some compilers can't
store the intermittent value from the first case (which *will* wrap
around once again), so it will get the result wrong, causing delta
to end up as a huge number (clock_t can't be negative) and severely
skewing the performance data. Since this will happen only for some
compilers on some archs, and only every ~72 minutes, it will be
pure hell to track down.
Otoh, it might be worth doing something like this:
clock_t start;
double delta;
start = clock();
do_stuff();
delta = clock_delta(start, clock());
static double clock_delta(clock_t start, clock_t end)
{
if (end > start) {
/* normal case */
delta = (double)(end - start);
}
if (end < start) {
/* wrap-around */
delta = (double)(end + (1 << (sizeof(end) * CHAR_BIT)));
delta -= (float)start;
}
return delta / CLOCKS_PER_SEC;
}
which should always provide the value as a fraction of seconds,
which is almost certainly how you want to display it anyways.
On a side-note, try to use arrays of doubles to store the total
delta. It's much neater than having multiple global variables.
The clock_delta function is also quite easily testable which
alleviates the worries about the wrap-around case not being tested
much.
> With gettimeofday we are talking about how much calendar time has
> been spent, whereas with clock we are talking about how many ticks
> were used by the program and then dividing that by clocks_per_second
> to get a process time. I believe that clock would be more accurate
> since it filters out the cost of other processes running on the
> system.
>
True, but the meaning of it would be hard to correlate against latency,
since some events spend an inordinate amount of time switching contexts
while others are fairly cpu-intensive but achieve a lot in wallclock
time. Parsing passive check-results is CPU-intensive, but in wallclock
time we can easily parse 50 or 100 of them in the same time it takes to
fork and fire up a plugin. I'm not sure which of the values would be
preferrable here, and I'm also not sure which of the processes gets the
CPU-time attributed to it.
> Another consideration would be the amount of overhead involved. It
> appears to me that
>
> elapsed_time = (double) (end - start) / CLOCKS_PER_SECOND; total_time
> += elapsed_time
>
But this is in-correct once every 72 minutes when end < start, as
explained above.
> Would involve significantly less over head than
>
> elapsed_time = (end.tv_usec - start.tv_usec) + ((end.tv_sec*1000)
> -(start.tv_sec*1000)); total_time += (double) (elapsed_time / 1000);
>
This is potentially incorrect for every check, but errors in this
algorithm would quite quickly be spotted, and tv_usec doesn't really
wrap; It just resets, which means we can do all the math with ints
and longs instead of off-loading it to the mcpu. Otoh, Nagios
really makes no use of the mcpu at all, so we could actually gain
performance by *not* using integer math. Some testing would be nice.
1 billion iterations should show some measurable delta, I think.
Note that this will be different per architecture, but I think it's
safe to assume that x86 is by far the most common architecture, so
whatever runs fastest on that one should probably be the one we use.
> Am I wrong or I missing something?
Only the wrap-around, but that sort of makes or breaks the algo when
using clock(3) since the wrap-around makes such an enormous difference
there.
> Thanks for the information by the
> way, it's extremely helpful.
>
Yw. Thanks for sending patches :-)
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
------------------------------------------------------------------------------
Crystal Reports - New Free Runtime and 30 Day Trial
Check out the new simplified licensing option that enables unlimited
royalty-free distribution of the report engine for externally facing
server and web deployment.
http://p.sf.net/sfu/businessobjects
More information about the Developers
mailing list