[PATCH] Re: alternative scheduler

Fredrik Thulin ft at it.su.se
Wed Dec 1 15:40:36 CET 2010


On Wed, 2010-12-01 at 15:14 +0100, Andreas Ericsson wrote:
...
> > Host checks were still being scheduled, and every time a host check was
> > found at the front of event_list_low, Nagios would log "We're not
> > executing host checks right now, so we'll skip this event." and then
> > sleep for sleep_time seconds (0.25 in my setup, based on the Ubuntu
> > defaults) (!!!).
>  
> 
> This should only happen if you've set a check_interval for hosts but
> have disabled them globally, either via nagios.cfg or via an external
> command. It seems weird that we run usleep() instead of just issuing
> a sched_yield() or something though, which would be virtually a no-op
> unless other processes are waiting to run.

Guilty of setting a check_interval for hosts, even on slave servers,
yes.

IMNSHO, if that is an unsupported configuration in combination with
execute_host_checks=0, Nagios should refuse to load the configuration. 
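For illustration, here is a minimal sketch of the kind of load-time check I mean. The host_cfg struct and count_conflicting_hosts() are hypothetical and model only the one field that matters. They are not Nagios's real object structures or its config verification code (what `nagios -v` runs). Depending on taste, the result could be a warning or a hard refusal to start.

```c
#include <stdio.h>

/* Hypothetical, heavily simplified host record: only the field relevant
 * to this check is modelled. */
struct host_cfg {
    const char *name;
    double check_interval; /* 0 means the host is never scheduled */
};

/* Returns how many hosts would still be scheduled although host checks
 * are globally disabled, printing one warning per offending host. */
int count_conflicting_hosts(int execute_host_checks,
                            const struct host_cfg *hosts, int n)
{
    int conflicts = 0;

    if (execute_host_checks)
        return 0; /* host checks enabled: nothing to complain about */
    for (int i = 0; i < n; i++) {
        if (hosts[i].check_interval > 0.0) {
            fprintf(stderr,
                    "Warning: host '%s' has check_interval %.1f but "
                    "execute_host_checks=0\n",
                    hosts[i].name, hosts[i].check_interval);
            conflicts++;
        }
    }
    return conflicts;
}
```

A loader would call this after parsing and abort (or just warn) when the count is non-zero.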

> > I made the attached minimalistic patch to not sleep if the next event in
> > the event list is already due.
> > 
> 
> Seems sensible, but I think it can be improved, such as issuing either
> a sched_yield() or, if sched_yield() is not available, running usleep(10)
> every 100 skipped items or so. That would avoid pinning the cpu but would
> still be a lot faster than what we have today.

What is sched_yield? I can't find that function anywhere in the Nagios source
code. Feel free to improve the patch; as I've said before, C isn't
my game.
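For reference: sched_yield() is a POSIX function declared in <sched.h>. It gives up the CPU so that other runnable processes get a turn, and it returns immediately if nobody else is waiting. Below is a minimal sketch of the skip logic as I understand the suggestion. It never sleeps for sleep_time on a skipped host check, it only pauses every 100 skips, and it stops at the first event that is not yet due. The event struct, skip_disabled_host_checks() and the NO_SCHED_YIELD build switch are all hypothetical, not Nagios's real event loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <time.h>

/* Hypothetical, stripped-down event record. */
enum event_type { EVENT_SERVICE_CHECK, EVENT_HOST_CHECK };

struct event {
    enum event_type type;
    time_t run_time;
};

#define YIELD_EVERY 100 /* skipped events between pauses, as suggested */

/* Briefly give up the CPU. NO_SCHED_YIELD is an assumed build switch for
 * systems without sched_yield(); there we sleep 10 microseconds instead
 * (the usleep(10) idea, written with nanosleep()). */
static void brief_pause(void)
{
#ifndef NO_SCHED_YIELD
    sched_yield();
#else
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 10000L };
    nanosleep(&ts, NULL);
#endif
}

/* Walk the due prefix of an event list sorted by run_time. Disabled host
 * checks are skipped without the fixed sleep_time sleep; the loop pauses
 * only every YIELD_EVERY skips. It stops at the first event that is not
 * yet due, which is where a real main loop would sleep. Returns the number
 * of pauses; *ran receives the number of events that would have run. */
int skip_disabled_host_checks(const struct event *events, int n,
                              time_t now, int execute_host_checks, int *ran)
{
    int skipped = 0, pauses = 0;

    *ran = 0;
    for (int i = 0; i < n && events[i].run_time <= now; i++) {
        if (events[i].type == EVENT_HOST_CHECK && !execute_host_checks) {
            if (++skipped % YIELD_EVERY == 0) {
                brief_pause();
                pauses++;
            }
            continue;
        }
        (*ran)++; /* a real scheduler would execute the event here */
    }
    return pauses;
}
```

With 250 due host check events and host checks disabled, this pauses twice instead of sleeping 250 times.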

> > This cured the total lack of performance in my installation, but
> > service reaping is still slowly killing me on my virtual development
> > server.
> 
> How come?

I currently reap every 10 seconds, and crude empirical observations made
by tailing the log file say that reaping takes 3-4 seconds on my
virtual machine (< 1 second on the production server). This is *with*
the following things on a RAM disk:

  object_cache_file
  precached_object_file
  status_file
  temp_file
  temp_path
  check_result_path
  state_retention_file
  debug_file

and with the tiniest C program that appends results to a file as
ocsp_command.
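To give an idea of what such a tiny ocsp_command can look like, here is a minimal sketch. It assumes the results are appended as Nagios external command lines (PROCESS_SERVICE_CHECK_RESULT). The function names and the line-size limit are made up for illustration and are not my actual program. A main() would simply pass its arguments to append_result().

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format one service result as an external command line. Only the first
 * line of plugin output is kept so every result stays on one line.
 * Returns what snprintf() returns (the untruncated length). */
int format_result(char *buf, size_t len, time_t when, const char *host,
                  const char *svc, int state, const char *output)
{
    int first_line = (int)strcspn(output, "\n");

    return snprintf(buf, len,
                    "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%.*s\n",
                    (long)when, host, svc, state, first_line, output);
}

/* Append one result to the spool file; returns 0 on success, -1 on error. */
int append_result(const char *path, time_t when, const char *host,
                  const char *svc, int state, const char *output)
{
    char line[8192];
    int n = format_result(line, sizeof line, when, host, svc, state, output);

    if (n < 0 || (size_t)n >= sizeof line)
        return -1; /* formatting failed or the line would be truncated */
    FILE *fp = fopen(path, "a"); /* "a": every write goes to end of file */
    if (fp == NULL)
        return -1;
    int ok = fputs(line, fp) != EOF;
    ok = (fclose(fp) == 0) && ok;
    return ok ? 0 : -1;
}
```

The matching command definition would pass macros such as $HOSTNAME$, $SERVICEDESC$, $SERVICESTATEID$ and $SERVICEOUTPUT$ as arguments. The program path and argument order are up to you.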

I'll try changing the reaping interval to every 2 seconds as per your
advice, but I guess reaping will still take 30-40% of the total time.

> ... Still though, reaping more frequently means the cache
> would more often be hot and reaping will run a lot faster.

Which cache do you mean would be hotter if I reaped more frequently? The
files are already on a RAM disk.

> > The scheduler really needs much more work (like sub-second precision for
> > when to start checks - that gave me roughly 25% additional performance
> > in my Erlang based scheduler),
> 
> That's not possible. With subsecond precision the program has to do
> more work, not less. You're looking at the wrong bottleneck here and
> you most certainly botched the implementation the first time around if
> adding subsecond precision made such a large improvement for you.

We should have a beer and talk about scheduling sometime, since we're
both in Stockholm (?).

My first scheduler ticked once per second and *BAM* started 30+ checks.

Much of the time, a significant number of these checks were exactly
the same check (but against different target hosts), so my theory is that
they all requested the very same resources within the same millisecond.
When I changed the scheduler to start one check every 50 ms instead, I
found that I could start around 25% more checks per second. Other theories
are welcome, but that was my observation.
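A minimal sketch of the difference between the two approaches follows. It does not reproduce the 25% figure; it only shows how many checks start at the same moment. plan_starts() and peak_starts() are hypothetical helpers, not code from either scheduler. With spacing 0, all checks due at a tick start together; with 50 ms spacing, they are paced.

```c
#include <stddef.h>

#define START_SPACING_MS 50L /* one check start every 50 ms, as above */

/* Fill starts_ms[0..n-1] with start times for n checks that all became
 * due at tick_ms. spacing_ms == 0 reproduces "start everything at the
 * tick"; START_SPACING_MS paces the starts instead. */
void plan_starts(long tick_ms, size_t n, long spacing_ms, long *starts_ms)
{
    for (size_t k = 0; k < n; k++)
        starts_ms[k] = tick_ms + (long)k * spacing_ms;
}

/* Largest number of starts inside any window_ms-wide window.
 * starts_ms must be sorted ascending and window_ms must be > 0. */
size_t peak_starts(const long *starts_ms, size_t n, long window_ms)
{
    size_t best = 0, lo = 0;

    for (size_t hi = 0; hi < n; hi++) {
        while (starts_ms[hi] - starts_ms[lo] >= window_ms)
            lo++; /* drop starts that fell out of the window */
        if (hi - lo + 1 > best)
            best = hi - lo + 1;
    }
    return best;
}
```

For a batch of 30 checks, the burst puts all 30 starts in the same millisecond, while pacing puts at most one there. Note that a fixed spacing caps the start rate at 1000 / spacing_ms per second (20 at 50 ms), so a variant might derive the spacing from the batch size instead.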

> Try removing check_interval and retry_interval from your hosts instead,
> and set should_be_scheduled=0 in your retention file before restarting.
> execute_host_checks is about actually running the checks, whereas you
> want to skip even scheduling them.

I'll think about doing that, or just throwing hardware at the problem
now that my Nagios check servers perform reasonably well.

/Fredrik


