Was: Nagios arch to improve performance Re: Re: Nagios-devel digest, Vol 1 #807 - 8 msgs
Andreas Ericsson
ae at op5.se
Mon May 23 17:21:07 CEST 2005
Ben wrote:
>>
>> That said, the current bottleneck in Nagios appears to be the fact
>> that it runs checks in chunks rather than as standalone units which
>> can be picked up as they become eligible for checking. If that
>> little snag could be overcome, I'm confident that the aforementioned
>> average check latency of 25 seconds could be done away with.
>>
>
>
> This is misleading. In my experience, Nagios doesn't run checks in
> chunks. It *does* kick off as many concurrent checks as you tell it
> (assuming there are things that need to be checked), but, if the
> results come in while it's still trying to kick off more checks, it
> stops doing that so it can process the new results. Because similar
> checks tend to be started at similar times and take similarly long to
> run, that means that it *appears* as if Nagios kicks off a batch of checks,
> then waits a while, then kicks off some more. In actuality, it's
> processing the results of the first batch before it does anything else,
> and the batch size is defined by how long it takes from the first check
> to be started until the first result comes in.
>
> One possible way to speed this up is to trade in the rather simple
> current model of "we can't initiate checks if we've got pending
> results, because those results might alter what we need to check" for
> the much more complex (but scalable and possibly more correct) model
> of "we can't send more checks that depend on the results of what we
> currently have outstanding checks for, but if we want to check
> unrelated services, not a problem."
>
> It seems to me that would help an awful lot, assuming it was bug-free,
> but it's also a pretty fundamental change to Nagios' scheduler.
>
Not only to the scheduler: to be implemented efficiently, it also requires
a fairly fundamental change in how Nagios structures its memory (i.e.
checks depending on other checks must be linked directly to those checks).
Otherwise Nagios will just spend its time in hashfunc1() and hashfunc2()
instead, looking for services that may or may not be dependents or
dependees of the eligible check.
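
A rough sketch of what such linking might look like, assuming each check
carries direct pointers to the checks it depends on, so the scheduler can
decide eligibility without any hash lookups. The struct, its fields and the
function names are hypothetical and are not Nagios' actual data structures:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 4  /* arbitrary limit, for illustration only */

/* Hypothetical check record; not Nagios' real service struct. */
struct check {
    const char *name;
    int outstanding;                  /* 1 while a result is still pending */
    struct check *parents[MAX_DEPS];  /* checks this one depends on */
    size_t n_parents;
};

/* Link child to parent once, e.g. at config-load time.
 * Returns 0 on success, -1 if the child has no free slot left. */
static int add_dependency(struct check *child, struct check *parent)
{
    assert(child != NULL && parent != NULL);
    if (child->n_parents >= MAX_DEPS)
        return -1;
    child->parents[child->n_parents++] = parent;
    return 0;
}

/* A check may be started if it is not already running and none of the
 * checks it depends on has a pending result. Cost is proportional to
 * the number of parents, with no hashing involved. */
static int can_dispatch(const struct check *c)
{
    size_t i;

    assert(c != NULL);
    if (c->outstanding)
        return 0;
    for (i = 0; i < c->n_parents; i++) {
        if (c->parents[i]->outstanding)
            return 0;
    }
    return 1;
}
```

With an "http" check linked to a "ping" check via add_dependency(), and
ping marked outstanding, can_dispatch() refuses http but still allows an
unrelated "disk" check, which is the behaviour Ben describes above.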
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer