[naemon-dev] Dynamic Object Creation / Group Alerts
Andreas Ericsson
ageric79 at gmail.com
Mon May 18 12:56:39 CEST 2015
On 2015-05-10 00:55, Lee Wilson wrote:
> Evening All, I was wondering if any more work is being done on
> dynamic service creation and possibly more advanced alerting?
Yes. The dev-2.0 branch contains the start of it, in splitting out
creation and registration of objects to separate functions.
The main headache is that modules need to be adapted to handle the
object configuration changing after they're loaded. Currently,
livestatus and merlin will both crash when objects are either added
or removed.
> The
> areas of auto-configuration and gathering of status from compound
> services has always been a weakness of Nagios and certainly prevented
> me getting it adopted by less willing employers. I'm very keen to see
> the core of Naemon kept as minimal as possible but to provide some
> of these features, perhaps a cooperating addon would help. I've been
> thinking that having something like an inventory service for solving
> the problem of interfaces or other dynamic services. This starts off
> as a parent service that Naemon is aware of. When it's run it runs as
> a normal service check that collects all the data and reports back
> success/failure. In addition it talks to an additional daemon that
> processes the inventory (maybe in the form of XML/JSON data) and
> creates new services which would then be checked by Naemon as
> normal. I doubt all the info could be contained in the perfdata
> output and it probably shouldn't either, that can be left for more
> summary/basic info (such as time to run check, number of interfaces
> found, etc). These dynamic services wouldn't be deleted
> automatically by default as that is up to the administrator. Biggest
> issue I see with this is having to get the plugins rewritten to
> handle it and also needing to have a server side element (even if
> just some kind of parser script) to the plugin to process the data.
> Being a network engineer, I do tend to focus on the likes of
> interfaces (especially switches, as they are rather long-winded to
> add), but it could equally apply to enumerating Windows services, load
> balancer pools, etc. The service checks could also be configured with
> filtering capabilities (such as exclude 'this', only include starting
> with 'x', etc).
> For dealing with compound services (such as 3 out of
> 5 HTTP services have failed in the last hour), this would probably
> need something that can process the recent service check output and
> notice patterns. In traditional Nagios this could use NDO and have
> the addon read the DB data every x minutes but I guess MKLiveStatus
> could do the same thing.
Better solution: a module adds a pseudo-service for the cluster and
updates the cluster health based on the number of elements in a good
or bad state in the cluster. It could also mark the cluster services
with custom variables and a clever UI could then use that information
to display it in a way that makes sense. A tree structure would make
a lot of sense, but UIs aren't really my strong side.
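
To make the pseudo-service idea concrete, here is a minimal sketch of the
health calculation only, in Python for brevity (a real module would be a
C NEB module). The function name and the warn_at/crit_at thresholds are
assumptions for illustration, not anything that exists in Naemon. The
state numbers follow the usual plugin return codes.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def cluster_state(member_states, warn_at, crit_at):
    """Return (state, summary) for a cluster given its members' states.

    A member counts as "bad" when it is anything other than OK.
    warn_at and crit_at are the numbers of bad members at which the
    cluster pseudo-service turns WARNING and CRITICAL (hypothetical
    parameters, chosen for this sketch).
    """
    if not member_states:
        return UNKNOWN, "cluster has no members"
    bad = sum(1 for state in member_states if state != OK)
    if bad >= crit_at:
        state = CRITICAL
    elif bad >= warn_at:
        state = WARNING
    else:
        state = OK
    return state, "%d of %d members not OK" % (bad, len(member_states))
```

With five HTTP services of which three are CRITICAL, and thresholds of
warn_at=2 and crit_at=3, this reports the cluster as CRITICAL.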
> For each compound alert that is created
> potentially a service check is created allowing it to be run on schedule
> but I'd be concerned this would put unnecessary load on the main
> naemon process, a separate addon could even be run on an entirely
> different server if MkLiveStatus is available over the network. My
> last 2 employers have wanted features like this (especially the last
> one) so unfortunately I've never been able to get them to adopt
> Nagios/Naemon but I keep trying. Am I correct in saying that it's not
> possible to alert based on the status of a host/service group and
> they are mainly just for display purposes? This is just my early
> brain dump on the idea without needing to change any of the core
> Naemon functionality. Would be interested in any feedback. Lee
>
It's entirely possible to write a module that tracks the state of the
elements in a host- or servicegroup and alerts based on that, and it
wouldn't be very difficult. For now, such a module would have to update
the status of an existing host or service that isn't being actively
checked, but automagically adding one seems a lot cleaner to me.
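
As a rough sketch of the tracking logic, again in Python rather than as
an actual module: the class below remembers the last known state of each
group member and, whenever the group's aggregate state changes, builds a
passive result in the PROCESS_SERVICE_CHECK_RESULT external command
format for a service that isn't actively checked. The class name, the
crit_at threshold and the host/service names are made up for the
example.

```python
import time

OK, CRITICAL = 0, 2


class GroupTracker:
    """Track member states of a group and report aggregate changes."""

    def __init__(self, host, service, crit_at):
        self.host = host          # host owning the passive service
        self.service = service    # passive service representing the group
        self.crit_at = crit_at    # bad members needed for CRITICAL (assumed)
        self.members = {}         # member name -> last seen state
        self.last_state = None    # last aggregate state we reported

    def update(self, member, state, now=None):
        """Record one member result.

        Returns an external command line when the group state changed,
        otherwise None.
        """
        self.members[member] = state
        bad = sum(1 for s in self.members.values() if s != OK)
        group_state = CRITICAL if bad >= self.crit_at else OK
        if group_state == self.last_state:
            return None
        self.last_state = group_state
        timestamp = int(time.time() if now is None else now)
        output = "%d of %d members not OK" % (bad, len(self.members))
        return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
            timestamp, self.host, self.service, group_state, output)
```

The returned line could be written to the command file, after which the
normal notification logic for that service applies. A module running
inside the core would not need the command file at all; the sketch only
shows when a result would be submitted.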
/Andreas