Slow Nagios reloads with NDOUtils
Ton Voon
ton.voon at altinity.com
Sat Nov 17 09:51:33 CET 2007
On 16 Nov 2007, at 21:23, mark.potter at academy.com wrote:
> My first problem, and I am not sure it is actually a problem, is
> that when I do a reload of nagios (/etc/init.d/nagios reload) it
> takes, what seems to me to be, a long time. It is usually around
> 90-120 seconds for Nagios to start allowing use of the web
> interface once the reload is initiated. A check of the files
> reveals no errors (save one warning for a host with no services)
> and the nagios process shows in a ps awux list. However the web
> interface shows the "Whoops! Error: Could not read host and service
> status information!" during the 90-120 second delay I mentioned
> earlier.
>
Hi Mark,
There seem to be quite a few emails on this list about NDOUtils being
a bit slow. We saw this about 6 months ago and have been optimising
the hell out of it, but it boils down to this:
- NDO updates are synchronously applied to the database
This means that Nagios has to wait for the DB to finish each update
before it continues. I believe Ethan is planning to do something about
this in NDO after Nagios 3 is released.
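
As a rough illustration of why a synchronous update path stalls the
caller, here is a minimal sketch. It is not NDOUtils code: the function
names, the in-memory "store" and the artificial delay are all
assumptions made up for the example. The first function waits for every
write; the second only enqueues events and lets a worker thread do the
slow writes.

```python
import queue
import threading
import time


def slow_db_write(event, store, delay=0.01):
    """Hypothetical stand-in for a database update that takes a while."""
    time.sleep(delay)
    store.append(event)


def send_synchronously(events, store):
    """The caller blocks on every single write before moving on."""
    for event in events:
        slow_db_write(event, store)


def send_via_queue(events, store):
    """The caller only enqueues; a worker thread performs the writes."""
    pending = queue.Queue()

    def worker():
        while True:
            event = pending.get()
            if event is None:  # sentinel: no more events will arrive
                break
            slow_db_write(event, store)

    thread = threading.Thread(target=worker)
    thread.start()
    for event in events:
        pending.put(event)  # returns without waiting for the write
    pending.put(None)
    return thread  # the caller can join() whenever it likes


events = [f"status-update-{i}" for i in range(5)]

sync_store = []
send_synchronously(events, sync_store)

async_store = []
worker_thread = send_via_queue(events, async_store)
worker_thread.join()  # only needed here so we can inspect the result
```

Both stores end up with the same events in the same order; the
difference is only in who has to wait for the slow writes.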
We've used various tricks to reduce the time for a reload, which we
plan to blog about on http://altinity.org once I find the time. The
first few things that come to mind are:
- indexes should be re-arranged so that the time column is first.
Currently, a lot of indexes have instance_id first. However, when you
are doing a delete based on time, such an index is effectively useless,
so mysql has to do a complete table scan to work out which rows need
to be deleted, and that takes a lot of time. Re-ordering the indexes
is the single biggest thing that you can do.
- reduce the number of times ndo2db calls the housekeeping routine.
By default, it runs every 60 seconds. We've reduced that to every 600
seconds. It could probably be even less frequent. One thing I've just
thought of is to have ndo2db NOT do any housekeeping and do it
yourself (mysql is multi-user after all).
- reduce the amount of data sent. We stop the broker module sending
systemcommands, log entries and passive commands.
- we've also patched Nagios to not send status data on a reload. By
default, Nagios will send data to ndo about the status of all
hosts/services on a reload. This is not required because the db
already knows what the status of everything was before the reload!
- we're currently testing a de-coupling of NDOMOD from ndo2db. The
idea is that NDOMOD writes files and then a separate daemon loads
those files into ndo2db. This effectively makes NDO updates
asynchronous, at the cost of a delay before the updates show up.
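
The index-ordering point in the list above can be checked with a small
experiment. The sketch below is only a stand-in: it uses Python's
built-in sqlite3 module rather than MySQL, and the table demo_events
and both index names are invented for the example (they are not the
NDOUtils schema). It asks the query planner how it would run a delete
by time under each column order. The exact plan text varies between
SQLite versions, and MySQL's optimiser is a different piece of software,
so treat this as an illustration of the principle only.

```python
import sqlite3

# In-memory database with a hypothetical table, NOT the real NDO schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo_events ("
    "instance_id INTEGER, entry_time INTEGER, payload TEXT)"
)


def delete_plan(connection):
    """Return the planner's description of a delete-by-time statement."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN DELETE FROM demo_events WHERE entry_time < ?",
        (1000,),
    ).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Case 1: instance_id is the leading column of the index.
conn.execute(
    "CREATE INDEX idx_instance_first ON demo_events (instance_id, entry_time)"
)
plan_instance_first = delete_plan(conn)

# Case 2: the time column is the leading column of the index.
conn.execute("DROP INDEX idx_instance_first")
conn.execute(
    "CREATE INDEX idx_time_first ON demo_events (entry_time, instance_id)"
)
plan_time_first = delete_plan(conn)

print("instance_id first:", plan_instance_first)
print("time first:       ", plan_time_first)
```

With the time column leading, the planner can seek straight to the
matching range; with instance_id leading, it has to walk every row. On
a real NDO database, a comparable check would be to run EXPLAIN on a
SELECT with the same WHERE clause as the housekeeping delete, before
and after changing the index.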
We've also made a patch to Nagios 2.9 (which Ethan has applied to
Nagios 3), where the status file is kept between reloads, so you
don't get the dreaded "Could not read host and service status
information" error. That is available at
http://altinity.blogs.com/dotorg/2007/09/nagios-patch-da.html.
We love NDOutils - a lot of our features in Opsview depend on it,
including our favourite, Hostgroup Hierarchy
(http://opsview.org/hostgrouphierarchy). So we're interested in making
NDOutils work as fast as possible too.
Ton
http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon