Problems with extensive passive monitoring
Andreas Ericsson
ae at op5.se
Mon Oct 9 15:24:59 CEST 2006
Mike Becher wrote:
>
> The whole description can be read on:
> http://www.mountcup.de/tiki/tiki-index.php?page=mibe-nagios-passive-monitoring
>
[ and has thus been cut from this mail ]
> My solution
> -----------
> Instead of calling an external program (ocsp_command or ochp_command) for
> each external command message to forward it from CMNS to SMNS, let the
> nagios process write these messages to a named pipe. The attached patch
> gives you this functionality for nagios version 2.5.
>
> Then let a helper program read from this named pipe on the CMNS side and
> let it forward the messages through a (I call it here) channel to whatever
> you want, in this case to SMNS. I have written a perl program that does
> this for you, which is attached too.
>
> What do you think about the option of using named pipes in addition to
> ocsp_command and/or ochp_command running as external processes?
The problem with using pipes is that they normally have a very limited
chunk of memory to use (usually only 4KiB), which means that when the
combined data from all the slaves exceeds this limit inside one cycle of
Nagios reaping them, you get a buildup of processes that are waiting in
spinlock for the pipe to empty so they can write to it. When the
spinlock ends, the pipe instantly fills up again because at any one time
there will always be more data waiting to be written than there is
waiting to be read.
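
To see how small that buffer is on a given box, here is a minimal Python
sketch. It is not part of Nagios; the function name measure_pipe_capacity
and the 512-byte chunk size are made up for illustration. It fills an
anonymous pipe in non-blocking mode and counts the bytes accepted before
a write would block. The number depends on the OS and kernel version, so
treat whatever it prints as a property of that machine only.

```python
import os


def measure_pipe_capacity(chunk_size=512):
    """Return how many bytes an empty pipe accepts before a write would block.

    chunk_size is an illustrative choice; 512 bytes is the smallest
    PIPE_BUF that POSIX allows, so each write is all-or-nothing.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode turns "the writer would have to wait" into an
        # exception we can catch, instead of hanging this process the way
        # a writer stuck on a full FIFO would hang.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full and nobody is reading from it.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe accepted {measure_pipe_capacity()} bytes before blocking")
```

Every writer that arrives once the pipe is in that state has to wait for
the reader, which is the buildup described above.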
I'm not sure your solution fixes this problem for the master nagios
server, although it will indeed provide a performance boost as it
doesn't fork() as much as the old solution. My guess is that if this
makes the problems go away, the change in system load just allows
Nagios to keep up with the data-flow. So while being a definite
improvement, you're likely to be hit by the problem again if your
network grows, or if you get some network problem that causes Nagios to
suddenly run checks much more frequently than normal.
> The NDO interface can't be used in this case because there aren't any
> connectors inside the code for external commands.
>
Yes there are. Or rather, you don't need them as ndo-modules have direct
access to Nagios' internal APIs. A much better solution would have been
to send check-result data from a module to a socket-listening module on
the master end which then uses the internal APIs to update service/host
status. This would allow the bottleneck (currently the 4KiB FIFO) to be
spread over a more or less indefinite number of channels which all can
be much, much larger than 4KiB.
This is unfortunately also much more complex, as it requires mucking
about with Nagios' internals and you'd have to deal with the somewhat
tricky issue of multiplexing inside a multi-threaded application. The
fact that the module would need to operate in at least three different
modes (sending/relaying/receiving) doesn't make things easier.
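
As a rough illustration of the transport half of that idea only, here is
a Python sketch that pushes check results over a socket and parses them
on the receiving end. It does not touch Nagios' module API at all. The
names CheckResult, format_result, parse_result, send_results and
receive_results are hypothetical, and the host and service names are
invented. The line format is the one used by the
PROCESS_SERVICE_CHECK_RESULT external command. A real receiving module
would hand the parsed results to the internal APIs instead of returning
a list.

```python
import socket
from typing import List, NamedTuple

COMMAND = "PROCESS_SERVICE_CHECK_RESULT"


class CheckResult(NamedTuple):
    """One passive service check result (hypothetical container type)."""
    timestamp: int
    host: str
    service: str
    return_code: int
    output: str


def format_result(result: CheckResult) -> str:
    """Render a result as one external-command style line."""
    return (f"[{result.timestamp}] {COMMAND};{result.host};"
            f"{result.service};{result.return_code};{result.output}\n")


def parse_result(line: str) -> CheckResult:
    """Parse one line produced by format_result()."""
    stamp, _, rest = line.rstrip("\n").partition("] ")
    # Split at most four times so semicolons in the plugin output survive.
    command, host, service, code, output = rest.split(";", 4)
    if command != COMMAND:
        raise ValueError(f"unexpected command: {command!r}")
    return CheckResult(int(stamp.lstrip("[")), host, service, int(code), output)


def send_results(sock: socket.socket, results: List[CheckResult]) -> None:
    """Sending side: write all results to the socket in one batch."""
    payload = "".join(format_result(r) for r in results)
    sock.sendall(payload.encode("utf-8"))


def receive_results(sock: socket.socket) -> List[CheckResult]:
    """Receiving side: read lines until the peer closes the connection."""
    with sock.makefile("r", encoding="utf-8", newline="\n") as stream:
        return [parse_result(line) for line in stream if line.strip()]


if __name__ == "__main__":
    # A connected socket pair stands in for the slave-to-master link.
    sender, receiver = socket.socketpair()
    with sender:
        send_results(sender, [
            CheckResult(1160400299, "web01", "HTTP", 0, "OK - 200 in 0.12s"),
            CheckResult(1160400300, "db01", "Disk", 2, "CRITICAL - 97% used"),
        ])
    with receiver:
        for result in receive_results(receiver):
            print(result)
```

Each slave gets its own connection with its own kernel buffer, so no
single FIFO is shared by everybody. The multiplexing and the three
operating modes mentioned above are left out entirely.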
Good thing winter's soon upon us, so one can get busy with interesting
things again. ;-)
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231