oscp command design and FIFO locking?
Marc Powell
marc at ena.com
Sun Sep 11 17:12:02 CEST 2005
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-
> admin at lists.sourceforge.net] On Behalf Of Fred
> Sent: Sunday, September 11, 2005 9:12 AM
> To: Nagios User
> Subject: [Nagios-users] oscp command design and FIFO locking?
>
>
> Does anyone have an idea why the OCSP command (for distributed monitoring)
> would kick off more than one command at a time? For example, if there are a
> number of checks that are completed, nagios kicks off multiple OCSP scripts
> (submit commands).
Since the OCSP command can be and do anything, it must be run once per
check. Nagios can't predict what you're using the OCSP command for and
whether batching, as you seem to desire, would be applicable.
Distributed monitoring is just one application of OCSP. If you really
want the batching behavior, build it into your OCSP command.
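
One way to build batching into the OCSP command is to have each
invocation do nothing but append its result to a spool file, and let a
separate periodic job ship the accumulated lines in one go. The sketch
below is only an illustration under stated assumptions, not anything
shipped with Nagios: the spool path, the function names `spool_result`
and `drain_spool`, and the tab-separated line layout are all
hypothetical choices.

```python
import os
import time

SPOOL = "/var/spool/nagios/ocsp.spool"  # hypothetical path, adjust to taste


def spool_result(host, service, state, output, spool=SPOOL):
    """Run once per check by the OCSP command: append one result line."""
    # Flatten newlines so that one check result is always exactly one line.
    text = output.replace("\n", " ")
    line = "\t".join([host, service, str(state), text]) + "\n"
    # O_APPEND plus a single write() keeps concurrent appends from
    # landing in the middle of each other on a local filesystem.
    fd = os.open(spool, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, line.encode("utf-8"))
    finally:
        os.close(fd)


def drain_spool(spool=SPOOL):
    """Run periodically (for example from cron): claim and return the batch."""
    claimed = f"{spool}.{os.getpid()}.{int(time.time())}"
    try:
        # rename() is atomic, so new results go to a fresh spool file.
        os.rename(spool, claimed)
    except FileNotFoundError:
        return []  # nothing was spooled since the last drain
    with open(claimed, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    os.remove(claimed)
    return lines
```

The OCSP command would call `spool_result` with the host, service,
state, and output of the finished check, and the periodic job would
hand the lines from `drain_spool` to a single sender. A writer that
still holds the old file open while the drain finishes can lose its
line, so a production version would need to close that window.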
> This causes the design of the submit command to need to throttle the access
> to whatever resources it might need to touch. If using the default send_nsca
> command, there can now be multiple (and many multiple) send_nsca's kicked off
> and each of these on the target server will all be attempting to write to
> the nagios FIFO. The nagios FIFO can get horribly overloaded. If the nagios
> master daemon is not aggressively reading the FIFO (check_command_interval=-1)
> then the daemons can stack up and eventually consume socket resources and
I handle approximately 3300 passive checks every 5 minutes on somewhat
commodity hardware (quad pIII 800) using NSCA with no problems. I
anticipate that I can double and possibly triple that number as the FIFO
is empty approximately 1/3 of the time. Are you doing significantly more
passive checks than that?
> memory etc. As far as I can tell, nsca doesn't lock the FIFO, which also
> means that writes will get intermixed with writes from plug-ins that might
> be running on the master system. (I have seen this over and over)
I don't see how. Local active checks, at least the standard plugins,
don't use nagios.cmd in any way. This would also be contrary to the
blocking behavior you comment on above where your OS is essentially
'locking' the FIFO until it has been cleared. As far as your OS is
concerned, there is no distinction between NSCA trying to write to the
pipe and some other process doing the same. While others are more versed
in this than I am, it is my understanding that if a program tries to
write more data to the pipe than it can currently hold, the OS blocks
the write until there is room, and that each write is atomic, so data
from two writers is not interleaved. This presumes that the plugin
output is smaller than the largest write your OS guarantees to be atomic
on a FIFO (PIPE_BUF on POSIX systems).
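
The atomicity argument can be made concrete with a small sketch. POSIX
guarantees that a single write() of at most PIPE_BUF bytes to a pipe or
FIFO is not interleaved with other writers, so a submitter only has to
build the whole command line first and hand it over in one call. The
function names `format_result` and `submit` below are hypothetical, and
this is an illustration of the idea rather than how NSCA itself is
written.

```python
import os
import time


def format_result(host, service, state, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    text = output.replace("\n", " ")  # a command must fit on one line
    line = f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{text}\n"
    return line.encode("utf-8")


def submit(fifo_path, payload):
    """Write payload to the FIFO in one write() call; return bytes written."""
    # Blocks until some process (normally Nagios) has the FIFO open for reading.
    fd = os.open(fifo_path, os.O_WRONLY)
    try:
        # Ask the OS for the atomic write limit of this particular FIFO.
        limit = os.fpathconf(fd, "PC_PIPE_BUF")
        if len(payload) > limit:
            raise ValueError(
                f"payload is {len(payload)} bytes; atomic limit is {limit}"
            )
        # One write() no larger than PIPE_BUF: the kernel keeps it contiguous.
        return os.write(fd, payload)
    finally:
        os.close(fd)
```

Refusing oversized payloads, rather than splitting them, is a deliberate
choice in this sketch: a split command could be interleaved with another
writer's data, which is exactly the corruption under discussion.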
>
> To avoid this, I have had to implement serious locking in all plug-ins and
> not use nsca as it has no locking mechanism (that I know of).
I'm curious about how you've done this. What exactly are you locking?
How is it helping? NSCA shouldn't need locking as it depends on your OS
to control access to the FIFO.
> Right now I am fighting with the OCSP commands that can launch dozens of
> copies at a time and each of these (in my case) write to a local file that
> will eventually be pushed up to the master and written (while locking) to
> the nagios FIFO.
>
> So ... I guess my questions are:
>
> 1) Should nagios be forking off more than one OCSP command at a time?
Yes, one per check.
> 2) Has anyone else run into FIFO corruption because of the lack of
> advisory locking in all the plug-ins?
Not here in almost 4 years of using Nagios/Netsaint.
--
Marc