Defining services at runtime

Bob Ingraham bobi at netshel.net
Thu May 11 16:08:09 CEST 2006


See below...

> Hi all,
>
> I've been researching this a little more, and I've come up with the
> following thoughts.
>
> I want the discovery process to be triggered by a configured service check
> on a switch - this is because I still want the standard Nagios scheduling
> mechanism to apply. When the check runs it will walk the switch and gather
> the results, which it will then submit to my event broker module via a
> socket. The event broker module will check to see if a service exists for
> each of the switchports and if not it will create a passive service check
> for it. The event module will then submit the results for each of the
> switchports. If a switchport doesn't have any results submitted for 3
> consecutive checks then the service will be removed.
>
> So, the event broker needs to be able to accomplish the following:
>
> * Check for existing services
>
> I notice that Nagios-db gets its configuration information from the
> following callback:


Which callback were you thinking of?


> * Create a new service check
>
> nebstructs.h defines a struct "nebstruct_service_check_struct". However,
> this seems to be the only place this struct is referred to in the header
> files. How do I pass a completed struct to Nagios?
>
> It looks relatively straightforward to work out how to fill out this
> struct, but "char *host_name;" could be a problem. The plugin is only going
> to know the host address, so I'll need a way to get a hostname from an
> address.

You can't add a service to Nagios using the
nebstruct_service_check_struct; in fact you can't add anything to Nagios
using any of the nebstruct_* structures.  They are one-way, informational
only: Nagios passes them down to your module.  When your module returns,
Nagios never examines the structure that it passed to you for changes.
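
To make the "one-way" point concrete, here is a minimal sketch.  The
check_event struct and both function names are hypothetical stand-ins,
not the real nebstruct layout or Nagios internals; the only thing it
assumes is what is described above, namely that the core fills in a
structure, hands it to the module, and never reads it back:

```c
#include <stdio.h>

/* Hypothetical stand-in for a nebstruct_* payload; not the real layout. */
struct check_event {
    char host_name[64];
    int  return_code;
};

/* Module side: the callback may read the event, and may even scribble
 * on it, but the edit goes nowhere. */
static int module_callback(void *data)
{
    struct check_event *ev = data;
    ev->return_code = 0;          /* an edit the core never looks at */
    return 0;
}

/* Core side (hypothetical): builds a throwaway event from its own state,
 * hands it to the module, and copies nothing back afterwards. */
int dispatch_check_event(const char *host, int core_return_code)
{
    struct check_event ev;

    snprintf(ev.host_name, sizeof ev.host_name, "%s", host);
    ev.return_code = core_return_code;

    module_callback(&ev);

    return core_return_code;      /* the core's own value is unchanged */
}
```

Calling dispatch_check_event("switch1", 2) still yields 2, even though
the callback set the field to 0 in the copy it was given.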

The way to add a service from within a NEB module would be to call the
internal "add_service()" function.  It's the same one that Nagios uses to
add services during configuration load.

The API for the add_service() function is:

service *add_service(char *host_name, char *description,
    char *check_period, int max_attempts, int parallelize,
    int accept_passive_checks, int check_interval, int retry_interval,
    int notification_interval, char *notification_period,
    int notify_recovery, int notify_unknown, int notify_warning,
    int notify_critical, int notify_flapping, int notifications_enabled,
    int is_volatile, char *event_handler, int event_handler_enabled,
    char *check_command, int checks_enabled, int flap_detection_enabled,
    double low_flap_threshold, double high_flap_threshold, int stalk_ok,
    int stalk_warning, int stalk_unknown, int stalk_critical,
    int process_perfdata, int failure_prediction_enabled,
    char *failure_prediction_options, int check_freshness,
    int freshness_threshold, int retain_status_information,
    int retain_nonstatus_information, int obsess_over_service);
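
As a rough illustration of what a call for a passive-only switchport
service might look like, here is a sketch.  So that it compiles on its
own, it uses a stand-in service type and a stand-in add_service() with
the parameter list quoted above; a real module would use the Nagios
headers instead.  The helper name add_port_service, the "24x7" and
"check_dummy" names, and every argument value are assumptions chosen for
illustration only, and the sketch says nothing about whether the rest of
Nagios picks up a service added after startup:

```c
#include <stdio.h>

/* Stand-in for the Nagios service type (assumption: the real definition
 * comes from the Nagios headers and has many more fields). */
typedef struct service {
    char host_name[64];
    char description[64];
    int  accept_passive_checks;
    int  checks_enabled;
} service;

/* Stand-in with the parameter list quoted above.  It only records a few
 * arguments so that the call below can be tried outside Nagios. */
static service *add_service(char *host_name, char *description,
    char *check_period, int max_attempts, int parallelize,
    int accept_passive_checks, int check_interval, int retry_interval,
    int notification_interval, char *notification_period,
    int notify_recovery, int notify_unknown, int notify_warning,
    int notify_critical, int notify_flapping, int notifications_enabled,
    int is_volatile, char *event_handler, int event_handler_enabled,
    char *check_command, int checks_enabled, int flap_detection_enabled,
    double low_flap_threshold, double high_flap_threshold, int stalk_ok,
    int stalk_warning, int stalk_unknown, int stalk_critical,
    int process_perfdata, int failure_prediction_enabled,
    char *failure_prediction_options, int check_freshness,
    int freshness_threshold, int retain_status_information,
    int retain_nonstatus_information, int obsess_over_service)
{
    static service s;

    (void)check_period; (void)max_attempts; (void)parallelize;
    (void)check_interval; (void)retry_interval; (void)notification_interval;
    (void)notification_period; (void)notify_recovery; (void)notify_unknown;
    (void)notify_warning; (void)notify_critical; (void)notify_flapping;
    (void)notifications_enabled; (void)is_volatile; (void)event_handler;
    (void)event_handler_enabled; (void)check_command;
    (void)flap_detection_enabled; (void)low_flap_threshold;
    (void)high_flap_threshold; (void)stalk_ok; (void)stalk_warning;
    (void)stalk_unknown; (void)stalk_critical; (void)process_perfdata;
    (void)failure_prediction_enabled; (void)failure_prediction_options;
    (void)check_freshness; (void)freshness_threshold;
    (void)retain_status_information; (void)retain_nonstatus_information;
    (void)obsess_over_service;

    snprintf(s.host_name, sizeof s.host_name, "%s", host_name);
    snprintf(s.description, sizeof s.description, "%s", description);
    s.accept_passive_checks = accept_passive_checks;
    s.checks_enabled = checks_enabled;
    return &s;
}

/* Hypothetical helper: one passive-only service per switch port. */
service *add_port_service(char *host_name, int port)
{
    char description[64];
    static char period[]  = "24x7";         /* assumed timeperiod name */
    static char command[] = "check_dummy";  /* assumed command name */

    snprintf(description, sizeof description, "Port %d", port);

    return add_service(host_name, description,
        period,         /* check_period */
        1,              /* max_attempts */
        1,              /* parallelize */
        1,              /* accept_passive_checks */
        5, 1,           /* check_interval, retry_interval */
        0, period,      /* notification_interval, notification_period */
        1, 1, 1, 1, 0,  /* notify_recovery/unknown/warning/critical/flapping */
        1,              /* notifications_enabled */
        0,              /* is_volatile */
        NULL, 0,        /* event_handler, event_handler_enabled */
        command,        /* check_command */
        0,              /* checks_enabled: no active checks */
        0, 0.0, 0.0,    /* flap_detection_enabled, low/high flap threshold */
        0, 0, 0, 0,     /* stalk_ok/warning/unknown/critical */
        1,              /* process_perfdata */
        0, NULL,        /* failure_prediction_enabled, ..._options */
        0, 0,           /* check_freshness, freshness_threshold */
        1, 1,           /* retain_status/nonstatus_information */
        0);             /* obsess_over_service */
}
```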


However, you have a significant problem with the strategy above:

NEB modules consist (mostly) of callback routines, which means they only
get activated when Nagios is reporting an event for which your callback
routine is registered (for example, a service check, host check, or
external command that is about to occur or has just completed).

If your callback routine gets invoked and then starts sitting on a socket,
listening for events (instead of returning control to Nagios immediately),
then Nagios will grind to a halt, because the scheduler will be waiting for
your callback to return so that *it* can get back to scheduling and
executing other events.
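
If a callback really does have to look at a descriptor, one way to keep
it from stalling the scheduler is to poll with a zero timeout and return
straight away when nothing is waiting.  A minimal sketch, assuming a
POSIX system; read_if_ready is a hypothetical helper name, not part of
Nagios:

```c
#include <poll.h>
#include <unistd.h>

/* Returns the number of bytes read into buf, 0 if nothing was waiting
 * (or the peer has closed), or -1 on error.  It never blocks, because
 * poll() is called with a timeout of zero. */
ssize_t read_if_ready(int fd, char *buf, size_t len)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll(&pfd, 1, 0);
    if (ready < 0)
        return -1;
    if (ready == 0 || !(pfd.revents & POLLIN))
        return 0;

    /* Data is already queued, so this read returns without waiting. */
    return read(fd, buf, len);
}
```

This only deals with blocking; it does nothing about data that is
written when no callback happens to be running.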

So, you have a timing issue:

- The scheduler gets ready to run your service check.

- Nagios invokes your service_check callback routine just *before* it runs
your service check.  Your callback routine "processes" this event and
*must* return control back to Nagios immediately.

- Your service check is executed, checks the switch, and writes the
results to a socket for the NEB module.

- However, your NEB module isn't "alive" yet because it hasn't yet been
invoked by Nagios via the service_check callback mechanism since your
service check hasn't completed and returned its results yet.  So, you
write to a socket on which no-one is listening.

- Your service check exits with its results.

- Nagios *now* invokes your callback routine with the results of your
*completed* service check.  Your callback routine processes these results
and once again returns control to Nagios, so it can continue scheduling
and executing events.
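
The sequence above boils down to a callback that is entered twice per
check and returns at once both times.  A minimal sketch; the enum, the
struct, and the counter are hypothetical stand-ins (the real event types
and data structure come from the Nagios headers):

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two moments at which the callback is
 * invoked for a single service check. */
enum check_phase { CHECK_INITIATED, CHECK_PROCESSED };

struct check_event {
    enum check_phase phase;
    const char *host_name;
    const char *output;   /* plugin output; only set once the check is done */
};

/* Counts completed checks whose results were handed to the callback. */
int results_handled = 0;

/* Does a small, bounded amount of work and returns in both phases. */
int service_check_callback(const struct check_event *ev)
{
    switch (ev->phase) {
    case CHECK_INITIATED:
        /* The plugin has not run yet, so there is nothing to read. */
        return 0;
    case CHECK_PROCESSED:
        /* The plugin has exited; its results are available now. */
        if (ev->output != NULL)
            results_handled++;
        return 0;
    }
    return -1;  /* unknown phase */
}
```

Anything the plugin wants to tell the module therefore has to be
something the second invocation can pick up after the plugin has exited.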

One strategy might be to have your service check either:

- Return the list of services it wants created in its result string
(although you might not have enough space to do so on one line), and then
have your callback routine create services based on the contents of that
string, or

- Write its results to a temp file, IPC message queue, or some such, and
have your callback routine read the file (or the message queue) and create
the services based on that information.
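
Either way, the callback ends up parsing a list of ports.  Here is a
minimal sketch of that parsing step, assuming a made-up format of one
"description=state" entry per line, where state is the usual plugin
return code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).  The format, the
struct, and both function names are assumptions for illustration:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct port_result {
    char description[64];
    int  state;             /* 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN */
};

/* Parses one "description=state" line.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_port_line(const char *line, struct port_result *out)
{
    const char *eq = strrchr(line, '=');
    char *end;
    long state;
    size_t n;

    if (eq == NULL || eq == line)
        return -1;                       /* no '=' or empty description */

    n = (size_t)(eq - line);
    if (n >= sizeof out->description)
        return -1;                       /* description too long */

    state = strtol(eq + 1, &end, 10);
    if (end == eq + 1 || state < 0 || state > 3)
        return -1;                       /* no digits or out of range */
    while (*end == '\n' || *end == '\r')
        end++;
    if (*end != '\0')
        return -1;                       /* trailing junk after the state */

    memcpy(out->description, line, n);
    out->description[n] = '\0';
    out->state = (int)state;
    return 0;
}

/* Reads up to max results from an already-open stream and returns how
 * many were stored.  Malformed lines are skipped. */
size_t read_port_results(FILE *in, struct port_result *out, size_t max)
{
    char line[128];
    size_t count = 0;

    while (count < max && fgets(line, sizeof line, in) != NULL) {
        if (parse_port_line(line, &out[count]) == 0)
            count++;
    }
    return count;
}
```

The same parse_port_line() could be applied to a result string split on
newlines, or to lines read from a temp file that the plugin wrote before
exiting.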

I'm sure that there are many other ways to do this...

Anyway, anyone please correct me if I'm off on any of this, but that's how
I understand the way the callback routine mechanism works.

Regards,
Bob

> From: nagios-devel-admin at lists.sourceforge.net
> [mailto:nagios-devel-admin at lists.sourceforge.net] On Behalf Of Sam
> Stickland
> Sent: 10 May 2006 09:27
> To: nagios-devel at lists.sourceforge.net
> Subject: [Nagios-devel] Defining services at runtime
>
> Hi,
>
> Is there any way in Nagios to define services at runtime? I'll give an
> example:
>
> You have a script that monitors ports on a switch for errors. When it's
> run it walks the interface error counters on the switch. For each port it
> discovers it creates a new service entry (for this host/switch) - if one
> does not already exist - and then sets the state accordingly.
>
> It would be nice if this was possible. I realise that this could be
> achieved by walking the switches and then generating a static
> configuration, but I feel this method is cleaner. It combines automatic,
> runtime discovery of switchports with the efficiency gains of having only
> one service check run (instead of one service check per port). I can
> envisage a single plug-in, which only needs to be given a hostname and
> community string to be able to generate individual reports for each
> switchport for errors, queuedrops and port status.
>
> Is this possible with the new Event Broker API? Is this documented
> anywhere?
> ;)
>
> S
>
>
>
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job
> easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>



