scheduling queue and check interval
Rob Brown
dtownrobbrown at gmail.com
Wed Jun 20 01:51:13 CEST 2007
Forgive me if this has been covered previously: I am not very active
on the lists. I found some posts about this subject from a long time
ago, in v1, but am wondering whether this is still an issue or
something that can be resolved.
I run a pretty decent sized Nagios config (version 2.2, 672 hosts and
3612 services) and have been struggling lately to understand why my
latency is so high (0.08 / 2310.79 / 1451.444 sec) and my scheduling
queue is about 30 minutes behind schedule. It's probably because I
have mostly active checks, many of which are nrpe checks and take a
few seconds each. It takes about 5 hours for Nagios to catch up on the
scheduling queue after a restart. I've read over the documentation
dozens of times and think I understand the basic scheduling logic. I
have toyed with all the available options, but one thing became
obvious to me that seemed to be disregarded, and that is the check
interval. When the checks get scheduled, they start alphabetically
based on hostname, and get "interleaved" based on the interleave
factor if that option is turned on.
Now, suppose I have 260 hosts named An thru Zn, and most of the hosts
run a slew of checks that are slow and only scheduled to check once an
hour. However, hosts Xn, Yn and Zn are critical servers that have
checks that are supposed to run every 2 minutes. Also, suppose for now
interleaving is turned off. When Nagios starts up, it schedules the
checks without regard to the normal_check_interval. This means the
checks for hosts Xn, Yn and Zn have to wait until A through W get
processed, and may not get scheduled for (as in my case) a long time,
missing their 2 minute window. Of course, turning on interleaving can alleviate SOME of this,
but that seems hit and miss depending on the alphabetical placement of
your critical hosts, and as you can imagine, if you multiply the
numbers, the problem gets worse.
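
To put rough numbers on the scenario above, here is a toy model in
Python. It is only a sketch and not Nagios's actual scheduler: it
assumes checks run strictly one host group after another in
alphabetical order, and the 60 seconds of check time per group is a
made-up figure. The function name first_start_times is hypothetical.

```python
from string import ascii_uppercase


def first_start_times(hosts, seconds_per_host):
    """Return {host: seconds until its first check starts}.

    Toy assumption: hosts are processed serially in the order given,
    each consuming the same amount of check time.
    """
    start = {}
    elapsed = 0.0
    for host in hosts:
        start[host] = elapsed
        elapsed += seconds_per_host
    return start


# Host groups A..Z in alphabetical order, 60 s of checks each (assumed).
hosts = list(ascii_uppercase)
starts = first_start_times(hosts, seconds_per_host=60.0)

# Group X is 24th in line, so it waits for 23 groups ahead of it.
print(starts["X"])  # 1380.0 seconds, far past a 2 minute window
```

Under these assumptions the X hosts would not see their first check
for 23 minutes, even though they are meant to be checked every 2.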
It seems in this scenario it would make sense to have a configuration
option available that would allow you to initially schedule the
highest priority checks first (those with the lowest
normal_check_interval) so that they can finish and get rescheduled
right away. Another thought would be to use an external script to
parse the config and sort the checks by check interval then manipulate
the scheduling queue.
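
A minimal sketch of that sorting idea, in Python. The Check tuple, the
sample hosts and the 5 second spacing are all hypothetical, and the
script does not parse a real config or touch Nagios; how the resulting
order would be fed back into the scheduling queue is left open.

```python
from typing import NamedTuple


class Check(NamedTuple):
    host: str
    service: str
    interval_min: int  # stands in for normal_check_interval


def order_by_interval(checks):
    """Shortest interval first; ties fall back to host/service name."""
    return sorted(checks, key=lambda c: (c.interval_min, c.host, c.service))


def stagger(checks, spacing_sec):
    """Pair each check with a start offset in seconds, in the given order."""
    return [(i * spacing_sec, c) for i, c in enumerate(checks)]


# Made-up sample data: two hourly checks and two 2 minute checks.
checks = [
    Check("A1", "disk", 60),
    Check("B1", "load", 60),
    Check("X1", "http", 2),
    Check("Z1", "db", 2),
]

plan = stagger(order_by_interval(checks), spacing_sec=5)
for offset, check in plan:
    print(offset, check.host, check.service, check.interval_min)
```

With this ordering the 2 minute checks on X1 and Z1 land at the front
of the initial schedule regardless of where their hostnames sort.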
I would be interested to hear what others are doing to overcome this.
I don't want to bother the group with the details of tuning my config;
I would rather discuss the theory behind this type of scheduling logic.