Future of Nagios (was Nagios is dead! Long live Icinga!)

Andreas Ericsson exon at op5.com
Wed May 6 22:47:35 CEST 2009


Mathieu Gagné wrote:
> Hi Ethan,
> 
> First, thank you very much for Nagios.
> 
> Our enterprise relies heavily on it and Nagios has been a great 
> monitoring tool for us for so many years. Up to now, nothing has 
> surpassed its simplicity of use and we will continue to use it in the 
> foreseeable future.
> 
> On 5/6/09 11:56 AM, Ethan Galstad wrote:
>> 4. Big things are coming around the bend for Nagios.  Big things take
>> time.  Be patient for a bit longer and you'll see the results.
> 
> As an enterprise looking to scale Nagios to tens of thousands of
> monitored hosts and services, what can we expect in the future
> regarding scalability?
> 

I should think some sort of event-transport module integrating tightly
with the user interface will handle this. Fortunately, we're working on
exactly such a solution. The event-transport module is reasonably stable
and the GUI is well under way. Check out www.op5.org, and particularly
Merlin and Ninja (Merlin will be merged with reports-module, and
reports-gui will be merged into Ninja, in the near future).

> We are using NDOutils to centralize host/service status.
> 
> One of our main challenges will be to optimize the configuration and 
> patch Nagios/NDOutils to make reloads as fast as possible since addition 
> and removal of monitored hosts have a high turnover rate. (I don't 
> know if it's the correct way to say it in English)
> 

Merlin doesn't have this problem, as it works differently with its
database.

> Reloading Nagios so it can pick up the new configuration is viewed as a 
> "flaw" by our development team because there's no monitoring done during 
> that time.
> 

Well, restarting or just reloading the configuration doesn't really make
a difference to what kind of monitoring is happening during the reload.
Even if Nagios were to reload the configuration without requiring a
restart, no network monitoring would happen during the reloading.

> If we reload Nagios too often, it would simply spend the majority of its 
> time exporting configuration/status to NDOutils and scheduling checks 
> without doing any real work at all. Too seldom, and new checks would 
> wait too long before being scheduled.
> 
> Any future plans regarding this aspect?
> 

Well, I've experimented a little bit. It seems to be several orders of
magnitude faster to do the configuration parsing in two passes: the
first pass finds out how many objects there are of each type and sorts
them into a two-dimensional table, and lookups are then done with a
binary search on that table, as opposed to creating fixed-size hash
tables and pre-inserting objects into them. This is especially true for
huge configurations, and appears to be caused by far more beneficial
memory access patterns and the ability to parse most objects only a
single time, since we know that all hosts have been parsed by the time
services are parsed, for example.
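
As an illustration of the two-pass idea described above, here is a
minimal Python sketch. It is not Nagios code and not the experiment
itself: the object tuples and the names RAW_OBJECTS, TYPE_ORDER,
first_pass, second_pass and find are all hypothetical, and real object
definitions carry far more data. It only shows the shape of the
approach: count and sort per type first, then resolve references with
a binary search instead of a hash table.

```python
from bisect import bisect_left

# Hypothetical, simplified object definitions: (type, name, host name).
RAW_OBJECTS = [
    ("service", "http", "web1"),
    ("host", "web1", None),
    ("host", "db1", None),
    ("service", "mysql", "db1"),
    ("service", "ssh", "web1"),
]

TYPE_ORDER = ("host", "service")  # assumed: hosts are complete before services


def first_pass(raw):
    """Count objects per type, then build one sorted name row per type."""
    counts = {t: 0 for t in TYPE_ORDER}
    for obj_type, _name, _host in raw:
        counts[obj_type] += 1
    # One row per object type, sized exactly from the counts.
    table = {t: [None] * counts[t] for t in TYPE_ORDER}
    filled = {t: 0 for t in TYPE_ORDER}
    for obj_type, name, _host in raw:
        table[obj_type][filled[obj_type]] = name
        filled[obj_type] += 1
    for row in table.values():
        row.sort()
    return counts, table


def find(row, name):
    """Binary search: index of name in the sorted row, or -1 if absent."""
    i = bisect_left(row, name)
    return i if i < len(row) and row[i] == name else -1


def second_pass(raw, table):
    """Resolve each service to its host index with a single lookup."""
    links = {}
    for obj_type, name, host in raw:
        if obj_type != "service":
            continue
        idx = find(table["host"], host)
        if idx < 0:
            raise ValueError(f"service {name!r} refers to unknown host {host!r}")
        links[name] = idx
    return links


counts, table = first_pass(RAW_OBJECTS)
links = second_pass(RAW_OBJECTS, table)
```

Because every host is already in its sorted row when services are
handled, each service needs only one lookup, regardless of the order
in which the definitions appeared in the input.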

> Also, have you ever heard of DNX? http://dnx.sourceforge.net/
> Any future plans for a similar feature within Nagios?
> 

DNX is an event-broker module. The Nagios core has been modified to
accommodate modules of that kind, but the actual functionality is of
the kind that the event broker API was designed for, so it's not likely
that such functionality will be brought into the Nagios core.

/Andreas
