More performance optimizations for 3.2.0, this time optimizing startup time

Jonathan Kamens jkamens at advent.com
Thu Jan 28 11:45:42 CET 2010


On 01/28/2010 03:49 AM, Ton Voon wrote:
> I'm wondering if there is a better way of doing this. Rather than each 
> caller having to work out the next comment id value and making sure it 
> is still unique, I think the next_comment_id should be manipulated 
> within xrddefault.c instead. If you call add_new_comment with a 
> comment_id of 0, it uses next_comment_id and ensures it gets incremented.
The right place to control the uniqueness of comment IDs is at the layer 
where comments are abstracted, i.e., comments.c.

Xrddefault.c is the wrong place to do this because xrddefault.c is only 
one of several possible comment storage layers.  Furthermore, it is not 
the only place where comments are added; there's also xsddefault.c.

The API exported by comments.c needs to represent and support the fact 
that there are actually two different comment-creation cases -- adding a 
comment with a known comment ID (i.e., read in from previous state), or 
adding a newly created comment that needs to be assigned a comment ID.  
The code in comments.c should do the assigning in that case and return 
the new comment ID to the caller.

The API syntax for distinguishing these two cases could be as simple as 
always passing in a pointer to the comment ID, where the value in said 
pointer is 0 if comments.c should assign the next ID, or non-zero if it 
is already assigned and should not be changed by the caller.

However, the disadvantage of that simple syntax is that it will allow 
bugs to creep into the code, the most likely one being a situation in 
which the caller forgets to initialize the comment_id value to 0 before 
calling the comment add API, thus causing a comment to be created with a 
previously used comment ID.  To avoid this, either separate entry 
point(s) should be provided for creating comments with predefined 
comments ID, or there should be a flag in comments.c that the caller has 
to be set which indicates whether the operation currently in progress is 
reading in old state or generating new state; when the former is in 
progress, the comment ID must always be specified, whereas when the 
latter is in progress, it can never be.

This will get a little complicated, e.g., when generating flapping 
comments dynamically when reading in retention.dat.  You'd have code 
that looks like:

    ...
    set_reuse_comment_ids();
    ... code to read and parse retention.dat ...
    if (... existing logic for determining flapping ...) {
       set_new_comment_ids();
       ... create new comment about the flapping ...
       set_reuse_comment_ids();
    }
    ... finish reading and parsing retention.dat ...
    set_new_comment_ids();
    ...

All of this is the "right" way to fix the brokenness in this code, but I 
was trying to make my changes as minimal as possible to make it easier 
to verify their correctness and therefore more likely that someone would 
be able to review and commit them.  Furthermore, the changes I've 
described above are somewhat architectural in nature, and I don't feel 
that I'm an experienced enough Nagios developer to be in a position to 
make architectural changes.  And finally, I do have a job I'm supposed 
to be doing instead of hacking on Nagios :-), and I can't really justify 
spending more time on the cleaner implementation described above.
> I guess on the initial load of retention.dat you have to validate that 
> the next_comment_id is unique,
No, you don't, really.  Retention.dat is not supposed to be edited by 
the user.  It should be safe to assume that when it was saved, the 
comment IDs saved to it were unique.  If someone does need to edit it 
for some reason, then it is their responsibility to preserve the 
uniqueness of comment IDs when doing so.  Considering how large 
retention.dat can get (ours is 27MB) and how expensive it is to read and 
parse it, this is a reasonable optimization which decreases the time 
taken to read and process the file.
> Can you extend the tests in t-tap/ to time the performance increase?
>
> So I'm thinking that the patch needs a bit more work to be applied.
You may choose to accept the patch as is or not, but I will point out 
that although I clearly agree with you that there's a better way to 
optimize this code, the changes I made certainly do not make the code 
any worse than is now, and I would therefore argue that my changes are a 
net gain regardless of whether the other changes described above are 
implemented.

As for the testing, my goal here was to solve a specific problem for my 
organization, and to contribute the work I did on that solution (finding 
the bottleneck and optimizing it) back to the Nagios development team.  
As noted above, this is not really how I'm supposed to be spending my 
time at work, so I'd prefer if someone else added the test case.

Thanks,

   jik

--
Jonathan Kamens
Operations Manager

Advent Tamale RMS <http://www.advent.com/solutions/by-product/tamale-rms>

201 South Street, Suite 300, Boston, MA  02111
Phone: +1 617 261 0264 ext. 133 | Mobile : +1 617 417 8989 | 
Fax: + 1 617 812 0330
jkamens at advent.com <mailto:jkamens at advent.com> | www.advent.com 
<http://www.advent.com/>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20100128/5654ab6d/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tamale.jpg
Type: image/jpeg
Size: 6255 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20100128/5654ab6d/attachment.jpg>
-------------- next part --------------
------------------------------------------------------------------------------
The Planet: dedicated and managed hosting, cloud storage, colocation
Stay online with enterprise data centers and the best network in the business
Choose flexible plans and management services without long-term contracts
Personal 24x7 support from experience hosting pros just a phone call away.
http://p.sf.net/sfu/theplanet-com
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel


More information about the Developers mailing list