[PATCH] Re: alternative scheduler
Jochen Bern
Jochen.Bern at LINworks.de
Thu Dec 2 10:03:09 CET 2010
On 12/01/2010 08:55 PM, Adam Augustine wrote:
> While DNX and mod_gearman do implement that specific functionality,
> they are still subject to the scheduler/reaper bottlenecks. We (the
> institution that started the DNX project) have played around with the
> check scheduling parameters quite a bit over the years and even with
> our best scheduling parameters and DNX actually executing the plugins,
> we still see checks scheduled such that we have a large number of
> checks scheduled to execute in a single second with several seconds
> (3-5) of nothing scheduled to execute between.
Agreed. That's also the reason why I don't use either so far; I don't
have a problem (yet ...) with the short-term scheduling (scheduling "due
now" checks onto executors), but I see unnecessary churn in the mid-term
scheduling (schedule next due time of checks just completed).
Unless I *really* need new glasses, there are only three different kinds
of such rescheduling code in the 3.2.x Nagios core:
1. Reschedule *exactly* check_interval / retry_interval from last due
time (iff check_period allows this) - e.g., base/checks.c::1301ff :
    if(reschedule_check==TRUE)
        next_service_check=(time_t)(temp_service->last_check
            +(temp_service->check_interval*interval_length));
2. Reschedule to the *very first second* permitted by check_period -
e.g., base/checks.c::278ff :
    /* make sure we rescheduled the next service check at a valid time */
    get_next_valid_time(preferred_time,
        &next_valid_time,svc->check_period_ptr);
    [...]
    svc->next_check=next_valid_time;
3. Special (error) cases falling back to some hardcoded "check interval"
(five minutes, one week, ...).
None of these cases even *looks* at the list of already-scheduled check
executions around the target time, much less does any smoothing.
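To illustrate why that matters, here is a minimal sketch (not Nagios
code) that applies the case 1 arithmetic to a handful of checks and
measures the worst per-second clump before and after. The function names
are made up for this example, and it assumes all checks share the same
check_interval; under that assumption the rescheduling is a pure shift
in time, so a clump of N checks in one second stays a clump of N.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Same arithmetic as the case 1 excerpt, pulled out into a standalone
 * function. Parameter names mirror the Nagios fields; the function
 * itself is hypothetical. */
time_t fixed_interval_next_check(time_t last_check, double check_interval,
                                 int interval_length)
{
    return (time_t)(last_check + (check_interval * interval_length));
}

/* Hypothetical helper: largest number of entries that share one second.
 * Quadratic on purpose, to keep the sketch short. */
size_t peak_checks_per_second(const time_t *times, size_t n)
{
    size_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (times[j] == times[i])
                count++;
        if (count > peak)
            peak = count;
    }
    return peak;
}

/* Reschedule every check in place with the fixed-interval rule and
 * return the peak per-second load of the new schedule. */
size_t reschedule_all(time_t *times, size_t n, double check_interval,
                      int interval_length)
{
    assert(times != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        times[i] = fixed_interval_next_check(times[i], check_interval,
                                             interval_length);
    return peak_checks_per_second(times, n);
}
```

With three checks at second 100 and one at second 104, a check_interval
of 5 and an interval_length of 60 moves them to 400 and 404; the peak is
three checks per second both before and after.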
(For the sake of completeness: a smoothing algorithm IMHO should treat
the first two cases as follows.
Case 1: *Decrease* next_check by at most a certain percentage of
check_interval/retry_interval, so as to avoid consecutive faults in
freshness checks and performance data processing (in the case of RRDs,
violation of xff).
Case 2: *Increase* next_check, so as to stay within the check_period.
Determining a max increment that both smooths out the (potentially
MANY) affected checks and avoids pushing the chain of subsequent
processing (retry_interval / max_check_attempts if found non-OK,
running event handlers, ...) *beyond* the valid timeframe is definitely
nontrivial, though.)
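A rough sketch of what the Case 1 idea could look like, purely as an
illustration and not as a patch against the core: starting from the
preferred time, look back by at most a given percentage of the interval
and take the second with the fewest checks already planned. The names
smooth_next_check, checks_due_at and max_shift_pct are invented here,
and the flat array of planned times is a stand-in for whatever structure
the real event queue would offer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper, not part of Nagios: count how many entries of
 * `scheduled` (an unsorted array of already-planned check times) fall
 * exactly on second `t`. */
size_t checks_due_at(const time_t *scheduled, size_t n, time_t t)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (scheduled[i] == t)
            count++;
    return count;
}

/* Hypothetical Case 1 smoothing: pick the least-loaded second in
 * [preferred - max_shift, preferred], where max_shift is max_shift_pct
 * percent of the interval. On equal load the candidate closest to
 * `preferred` wins, so an uncontended check is not moved at all. */
time_t smooth_next_check(time_t preferred, long interval_secs,
                         int max_shift_pct,
                         const time_t *scheduled, size_t n)
{
    assert(interval_secs > 0);
    assert(max_shift_pct >= 0 && max_shift_pct <= 100);

    long max_shift = interval_secs * max_shift_pct / 100;
    time_t best = preferred;
    size_t best_load = checks_due_at(scheduled, n, preferred);

    /* Stop early once an empty second has been found. */
    for (long shift = 1; shift <= max_shift && best_load > 0; shift++) {
        time_t candidate = preferred - shift;
        size_t load = checks_due_at(scheduled, n, candidate);
        if (load < best_load) {
            best = candidate;
            best_load = load;
        }
    }
    return best;
}
```

The check is only ever moved earlier, never later, which is the point of
Case 1. Case 2 would need the opposite direction plus a bound derived
from the check_period, which this sketch does not attempt.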
Kind regards,
J. Bern
--
Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel