novel idea
Andreas Ericsson
ae at op5.se
Mon May 9 09:57:32 CEST 2005
sean finney wrote:
> hi andreas,
>
> On Sun, May 08, 2005 at 08:12:59PM -0700, nagios-devel-request at lists.sourceforge.net wrote:
>
>>From: Andreas Ericsson <ae at op5.se>
>>Subject: Re: [Nagios-devel] novel idea for performance optimization
>
>
>>Good idea, except that ld linker voodoo (symbol resolution et al)
>>induces the same or more overhead on systems with copy-on-write fork
>>(linux, bsd, solaris) and reasonably quick context-switching (linux,
>>bsd). So the suffering people are those running Nagios on HP and Cygwin.
>>Not a great many, I presume.
>
>
> no, i think you're not understanding what i'm suggesting. let me
> try and be more clear. let's use check_tcp as an example. when
> compiling check_tcp, in addition to the standalone binary, a shared
> object something like libnagios-check_tcp.so would be created.
> this shared object would have the symbol "main" renamed "check_tcp"
>
> when nagios starts up, the first time it goes to execute check_tcp (or
> even earlier, when it first reads about the check_commands), it looks
> for such a library via dlopen(). if successful, it fetches the address
> of the check_tcp function via dlsym(). from that point forward, there
> is ZERO overhead, because there's no fork/exec, nor is there any symbol
> resolution, it's just calling a function. make sense?
>
Zero overhead is just not going to happen. Nagios MUST be able to
execute checks in parallel. It can't do that if it simply calls a
function without forking, threading or multiplexing, as that would
imply serialized execution. (Strictly speaking it needs forking or
threading; since popen() forks, multiplexing the results from it would
be a sort of mix of both worlds.)
> this could further be enhanced by adding multi-threading capabilities
> to such a scheme (you could have a separate thread for each
> check_command, or perhaps some other scheme). but what's best is
> that it would involve minimal changes to the pre-existing plugins,
> and wouldn't require any significant re-designing of the nagios
> architecture.
>
It would require a huge re-design of the current architecture. It would
also require a huge re-design of most plugins, since as it is today they
don't clean up after themselves, and they use very shoddy function
calls. Not to mention that a plugin that crashes would take nagios down
with it. This just isn't good enough.
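
To illustrate the crash point, here is a minimal sketch of why a fork()
is hard to drop even with dlopen()'ed plugin code. The check_fn type,
run_check_isolated() and the two stand-in plugins are hypothetical names
made up for this example; nothing like them exists in nagios today.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical signature for an in-process plugin entry point. */
typedef int (*check_fn)(int argc, char **argv);

/*
 * Run `fn` in a forked child so that a crash cannot take the parent
 * down. Returns the plugin's exit code (0-255), or -1 if fork() failed
 * or the child was killed by a signal (SIGSEGV, SIGABRT, ...).
 */
int run_check_isolated(check_fn fn, int argc, char **argv)
{
    assert(fn != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(fn(argc, argv) & 0xff);   /* child: run the check, then exit */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Two stand-in "plugins": one returns WARNING (1), the other crashes. */
int check_warning(int argc, char **argv) { (void)argc; (void)argv; return 1; }
int check_crash(int argc, char **argv)   { (void)argc; (void)argv; abort(); }
```

Note that this sketch waits for the child right away, which serializes
the checks. A real scheduler would fork several children and reap them
later, which is exactly the parallelism requirement mentioned above.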
>
>>This is moot. All operating systems worth their salt cache frequently
>>accessed programs so the code is already in memory anyway.
>
>
> systems cache frequently accessed pages in memory, but there's still
> unavoidable overhead in creating a new process, as well as the
> context switching between the various processes.
>
This would still be unavoidable, so the point is still moot (see above
on parallelism).
>
>>They would also have to add some code that splits arguments the way they
>>are supposed to, including some other additional stuff.
>
>
> isn't that already done? hmm... looking in the nagios code, it looks like
> all the plugins are called with popen[1]. so that means *two* fork/execs
> (one for /bin/sh, one for the command /bin/sh executes).
>
Three fork()s and two execve()s, as nagios itself forks once prior to
running popen(). execve() replaces the running process, so there's no
context-switching. It would be possible to get rid of one of the
fork()s, but not the other two (see above on parallelism). The popen()
must be there, or nagios would have to fork() explicitly and then run
the dlopen()'ed code.
> anyway, this wouldn't be very hard to do, just split the arguments on
> whitespace and call check_tcp() with what ought to have been passed to
> exec.
>
Arguments can contain whitespace if escaped or enclosed in quotes. Do
you feel like writing a function that does that and that's fast enough
to run as often as is required, while still being rock-solid safe? The
functions that do this in glibc and bash are asm-enhanced and
fine-tuned for each architecture they run on. You'd increase load
drastically, not reduce it.
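
For a sense of what such a splitter has to handle, here is a rough
sketch that splits a command line in place and honours backslash
escapes and single or double quotes. split_args() is a hypothetical
name, and the rules are simplified compared to a real shell (a
backslash escapes the next character everywhere, even inside single
quotes). It makes no claim about speed or about being rock-solid.

```c
#include <assert.h>
#include <string.h>

/*
 * Split `line` in place into argv-style tokens. Whitespace separates
 * arguments unless it is backslash-escaped or inside quotes.
 * Returns the number of tokens, or -1 on an unterminated quote or when
 * more than `max` tokens are found.
 */
int split_args(char *line, char **argv, int max)
{
    assert(line != NULL && argv != NULL);

    int argc = 0;
    char *r = line;     /* read cursor */
    char *w = line;     /* write cursor, never ahead of r */

    for (;;) {
        r += strspn(r, " \t");          /* skip leading separators */
        if (*r == '\0')
            return argc;
        if (argc == max)
            return -1;
        argv[argc++] = w;

        char quote = '\0';
        while (*r != '\0') {
            if (*r == '\\' && r[1] != '\0') {
                r++;                    /* drop the backslash ... */
                *w++ = *r++;            /* ... keep the escaped char */
            } else if (quote != '\0') {
                if (*r == quote) {
                    quote = '\0';       /* closing quote */
                    r++;
                } else {
                    *w++ = *r++;
                }
            } else if (*r == '"' || *r == '\'') {
                quote = *r++;           /* opening quote */
            } else if (*r == ' ' || *r == '\t') {
                break;                  /* unquoted separator ends token */
            } else {
                *w++ = *r++;
            }
        }
        if (quote != '\0')
            return -1;                  /* unterminated quote */

        int at_end = (*r == '\0');
        if (!at_end)
            r++;                        /* step past the separator first */
        *w++ = '\0';                    /* then terminate the token */
        if (at_end)
            return argc;
    }
}
```

Even this simplified version needs a small state machine per
character, and it would have to run for every check that is executed.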
A way around this would be to rewrite the plugins more or less from
scratch, and possibly make them simpler as well, while tagging them for
nagios to KNOW which ones are expected to have modules installed. For
instance, the check_command could look something like
:PING 5 40%,100.0 60%,500.0
Having the identifier (after the :) be 32 bits has some very obvious
performance benefits. Come to think of it, arguments could be separated
with ; instead of whitespace. That leaves only the exception of one
escaped char, which is a good thing.
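
A sketch of how that could look, assuming a layout like
":PING;5;40%,100.0;60%,500.0" where "\;" stands for a literal
semicolon. The exact layout, tag_id() and parse_module_command() are
assumptions made up for this example, not an existing nagios format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Pack a four-character tag such as "PING" into one 32-bit integer, so
 * finding the right module is a single integer comparison.
 */
uint32_t tag_id(const char *tag)
{
    assert(tag != NULL && strlen(tag) >= 4);
    return ((uint32_t)(unsigned char)tag[0] << 24) |
           ((uint32_t)(unsigned char)tag[1] << 16) |
           ((uint32_t)(unsigned char)tag[2] << 8)  |
            (uint32_t)(unsigned char)tag[3];
}

/*
 * Parse ":TAG;arg;arg;..." in place. The only escape is "\;".
 * Stores the packed tag in *id and the arguments in argv.
 * Returns the number of arguments, or -1 if the line is malformed or
 * has more than `max` arguments.
 */
int parse_module_command(char *cmd, uint32_t *id, char **argv, int max)
{
    assert(cmd != NULL && id != NULL && argv != NULL);

    if (cmd[0] != ':' || strlen(cmd) < 5)
        return -1;
    *id = tag_id(cmd + 1);

    char *r = cmd + 5;              /* first char after the tag */
    if (*r == '\0')
        return 0;                   /* tag only, no arguments */
    if (*r != ';')
        return -1;
    r++;

    char *w = r;                    /* write cursor, never ahead of r */
    int argc = 0;
    for (;;) {
        if (argc == max)
            return -1;
        argv[argc++] = w;
        while (*r != '\0' && *r != ';') {
            if (*r == '\\' && r[1] == ';')
                r++;                /* drop the backslash, keep the ';' */
            *w++ = *r++;
        }
        int at_end = (*r == '\0');
        if (!at_end)
            r++;                    /* step past the ';' first */
        *w++ = '\0';                /* then terminate the argument */
        if (at_end)
            return argc;
    }
}
```

With a single separator and a single escape there is no quote state to
track, which is what keeps the inner loop this small.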
>
> sean
>
> [1] any reason it's being done this way and not with fork/exec/dup?
popen() is fork() + dup() + execve(), more or less. Read glibc-2.3.5
libio/iopopen.c, especially the _IO_new_proc_open function (popen, but
glibc internal).
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer