[Nagios-devel] new plugin interface for Nagios
Andreas Ericsson
ae at op5.se
Fri May 7 16:10:02 CEST 2004
Deomid Ryabkov wrote:
> AE> Deomid Ryabkov wrote:
>>>
>>>well, basically I think it's just about time to add a new plugin interaction interface to Nagios.
>>>pretty bold, ha? ;)
>>>
>>>currently we have 248 hosts monitored with 755 active checks at a 60 seconds interval.
>>>(interval_length=10, normal_check_interval 6)
>>>
>>>being in charge of the monitoring, by now i have done all i could to optimize plugins,
>>>and in fact this has helped a lot to keep the system running at a decent pace.
>>>(for example, i have integrated disk checks into one plugin that uses shared snmplib
>>>instead of calling snmpget, effectively eliminating another fork)
I'd like to see some of these plugins, if you don't mind. New plugins
are always interesting.
>
> AE> That problem will still exist, unless you mean to make the code
> AE> thread-safe, which would make nagios a memory-hog on large systems (a
> AE> lot more hash buckets would be required for this to work). Besides, on
> AE> linux-systems, fork() uses copy-on-write, so only the page-table entries need to be created.
>
> well, now it takes fork() + exec() to complete a check. and my aim is that latter exec().
> that doesn't make nagios threaded.
>
Do you mean to remove the fork()? If so, how do you expect nagios
to run several checks at once? Your 755 checks would (at best) take 400
seconds to complete without some sort of parallelization.
Or do you mean to remove the exec()? That can't be done without
removing the fork() as well, and then we're back with the bloated memory
hog nagios isn't today (at least not without database support and other
bling-bling).
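
To make the fork() + exec() cycle under discussion concrete, here is a minimal Python sketch (not Nagios code). It forks one child per check, exec()s a plugin in each child, and reaps the exit codes, so all checks run at once. The function name `run_checks` and the `sh -c` commands are hypothetical stand-ins for real plugins; the exit codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os


def run_checks(checks):
    """Fork one child per check, exec the plugin in it, and collect exit codes.

    `checks` maps a check name to an argv list (hypothetical stand-ins for
    real plugins). Returns {name: exit_code}.
    """
    pids = {}
    for name, argv in checks.items():
        pid = os.fork()
        if pid == 0:
            # Child: replace this process image with the plugin.
            try:
                os.execvp(argv[0], argv)
            finally:
                # Only reached if exec failed; 3 is the UNKNOWN state.
                os._exit(3)
        pids[pid] = name  # Parent: remember which child runs which check.

    results = {}
    # All children are already running in parallel; now reap each one.
    for pid, name in pids.items():
        _, status = os.waitpid(pid, 0)
        results[name] = os.WEXITSTATUS(status) if os.WIFEXITED(status) else 3
    return results


if __name__ == "__main__":
    print(run_checks({
        "ok": ["sh", "-c", "exit 0"],
        "critical": ["sh", "-c", "exit 2"],
    }))
```

Dropping the fork() here would serialize the checks; dropping only the exec() would mean the check logic has to live inside the forked copy of the scheduler itself.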
>
>>>so now i'm thinking of adding some kind of plugin invocation mechanism into Nagios
>>>that wouldn't require starting up another program.
>>>and what i am thinking of as my options are:
>>>
>>>1) shared library mechanism, like Apache modules. should be the fastest of all, but has its shortcomings.
>>>not very flexible.
>
>
> AE> Not a bad idea, but nagios would still have to fork() or
> AE> pthread_create() to actually RUN the different checks (unless you want
> AE> it to serialize checks, which is just plain dumb).
>
> basically, i don't mind nagios to fork (yet), but instead of running an external plugin it should...
> well, that is to be decided ;)
>
External plugins are the foremost strength of nagios. If everybody had
to write C modules (as for apache), only very few people would be
competent enough to manage it, and we would take a big step back in
nagios' evolution while we rewrote all the perl and sh
scripts as nagios modules (not to mention the code in nagios itself).
> as of now, for every check a separate process is launched. arguments are parsed, snmp session
> is created and initialized, host's filesystems are enumerated, their current state is recorded,
> warning threshold value is obtained (for unix hosts).
> then a match of fs data against thresholds is done with most severe condition becoming exitcode.
> summary is printed and there we go, check done.
> and we do this for more than 200 hosts, every minute (we are leaving the check interval out of our discussion for now).
> for me, it seems obvious that this could be optimized. if only we didn't have to start all over every time.
Even if you implemented the code for every check in nagios, all you'd
save would be the exec() call. This at the price of stripping nagios of
its most powerful feature: flexibility beyond belief.
> most of the data is the same all the time, so why not just cache it?
Because you need it fresh to be valuable. This would work very nicely
for stateful tcp connections, but how does one go about checking
web-pages? HTTP is a stateless protocol, and I don't expect the world
to change that because someone wants to write a program that already
exists and works with current standards.
> i could write a check_disk_snmpd, that'd create and initialize an snmp session, cache filesystem data
> and thresholds and only do a couple of get()'s upon a request arriving from nagios to freshen the data.
Parsing arguments and setting threshold values are done in a blink and
require little if any CPU power. Obtaining a socket and requesting a
connection is also very light on both CPU and memory. If you want to
optimize something, work on things that need it.
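
For reference, the caching idea quoted above (set up once, then only refresh the values on each request) could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the proposed check_disk_snmpd: the class name `CachedDiskCheck` and the callables `enumerate_filesystems` and `read_usage` are hypothetical placeholders for whatever SNMP calls a real daemon would make.

```python
class CachedDiskCheck:
    """Do the one-time work at startup, then only refresh usage per request."""

    def __init__(self, enumerate_filesystems, read_usage, thresholds):
        # One-time work: enumerate the host's filesystems and keep thresholds.
        # Both callables are hypothetical stand-ins for SNMP queries.
        self.filesystems = list(enumerate_filesystems())
        self.read_usage = read_usage
        self.thresholds = thresholds  # {filesystem: (warn_pct, crit_pct)}

    def check(self):
        """Fetch fresh usage figures; the most severe state becomes the result."""
        worst = 0
        parts = []
        for fs in self.filesystems:
            used = self.read_usage(fs)  # The only per-request query.
            warn, crit = self.thresholds[fs]
            state = 2 if used >= crit else 1 if used >= warn else 0
            worst = max(worst, state)
            parts.append(f"{fs}={used}%")
        return worst, " ".join(parts)


if __name__ == "__main__":
    usage = {"/": 50, "/var": 95}
    check = CachedDiskCheck(lambda: ["/", "/var"], usage.get,
                            {"/": (80, 90), "/var": (80, 90)})
    print(check.check())
```

What this saves per check is the enumeration and setup, which is the cost the two sides of this thread weigh differently; the usage figures themselves are still fetched fresh every time.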
> seems pretty obvious for me indeed.
>
> so, what is to be done?
> basically, we have to teach nagios to open a socket (or should it be another IPC mechanism? maybe a message queue? I'm still unsure),
> send it a request packet and settle down waiting for a reply.
In a fork()ed process, I'm sure you mean. If it's just waiting, it can't
do anything else, which means 10 seconds of doing nothing while a
socket times out because a webserver is down, and another 10 just to
determine that the server actually IS down, and not just IIS fucking up
again.
> the daemon on the other side could be threaded (i think i'd write mine this way), but it doesn't in fact matter.
It makes a huge difference, actually. You can't make a program do two
things at once without threading one way or another.
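
A rough Python sketch of the socket arrangement being debated: a daemon that serves each request in its own thread, and a client that waits for the reply with a timeout. Everything here is an assumption made for illustration, including the names `CheckDaemon`, `CheckHandler` and `query` and the one-line request/reply protocol; none of it is an existing Nagios interface.

```python
import socket
import socketserver
import threading


class CheckHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Hypothetical protocol: one request line in, "<state> <text>" out.
        request = self.rfile.readline().decode().strip()
        state, text = self.server.run_check(request)
        self.wfile.write(f"{state} {text}\n".encode())


class CheckDaemon(socketserver.ThreadingTCPServer):
    """Each connection gets its own thread, so a slow check blocks no others."""
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr, run_check):
        super().__init__(addr, CheckHandler)
        self.run_check = run_check  # Callable: request -> (state, text).


def query(addr, request, timeout=10.0):
    """Send one request and wait for the reply; raises on timeout."""
    with socket.create_connection(addr, timeout=timeout) as sock:
        sock.sendall(request.encode() + b"\n")
        reply = sock.makefile("rb").readline().decode().strip()
    state, _, text = reply.partition(" ")
    return int(state), text


if __name__ == "__main__":
    daemon = CheckDaemon(("127.0.0.1", 0), lambda req: (0, "OK " + req))
    threading.Thread(target=daemon.serve_forever, daemon=True).start()
    print(query(daemon.server_address, "disk host1"))
    daemon.shutdown()
    daemon.server_close()
```

Note that `query` still blocks its caller for up to `timeout` seconds, so the calling side needs its own fork() or thread per outstanding check to keep several checks in flight.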
> with socket we could even go as far as running this daemon on remote machine,
> but the benefit of this is unclear to me.
>
Three reasons, all highly valued:
load balancing, redundancy, configuration propagation.
> that is it. what do you think?
>
I think you should study C some more.
> --
> Best regards,
> Deomid Ryabkov
> UNIX Systems Administrator
> RosBusinessConsulting | http://www.rbc.ru/
> E-mail: rojer at rbc.ru | ICQ: 8025844
--
Mvh / Best Regards
Sourcerer / Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se