<br>As has been noted, this is how passive checks are done. I've seen installations handling 20,000+ services where remote servers basically run checks on a standard schedule and pass the results back to nagios. But to be honest I like DNX more, precisely because it lets nagios schedule things and then pass them to a remote server: in the config you preset which checks need to be done often and which can wait and be done, say, once an hour. Otherwise you end up with no standard config system for how often results should be checked. What would make sense, though, is for a clustering system like DNX to read the data on how often checks are to be done, "cache" the results in its own database and on the remote hosts executing these checks, and then report to nagios that it need not schedule this specific check any more; the remote server doing the checks would handle scheduling on its own, based on the known nagios settings. Basically a way of turning passive on/off on the fly, which in fact can be done just like that. The only issue is that I don't want nagios to change to passive permanently, only temporarily: from the time it has started until it restarts, rereads its config, or receives notification from DNX that it has stopped doing the checks.<br>
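<br>A minimal sketch of the on/off toggle, assuming it is driven through Nagios external commands (DISABLE_SVC_CHECK, ENABLE_PASSIVE_SVC_CHECKS, ENABLE_SVC_CHECK). The helper names and the host/service values are made up for illustration, and this does not by itself solve the "only until restart" part.<br>

```python
import time


def external_command(name, *args, now=None):
    """Format one line for the Nagios external command file."""
    ts = int(time.time() if now is None else now)
    fields = (name,) + tuple(str(a) for a in args)
    return "[{}] {}\n".format(ts, ";".join(fields))


def hand_over_to_worker(host, service, now=None):
    """Hypothetical helper: stop active scheduling, keep accepting passive results."""
    return [
        external_command("DISABLE_SVC_CHECK", host, service, now=now),
        external_command("ENABLE_PASSIVE_SVC_CHECKS", host, service, now=now),
    ]


def take_back_from_worker(host, service, now=None):
    """Hypothetical helper: let nagios schedule the check actively again."""
    return [external_command("ENABLE_SVC_CHECK", host, service, now=now)]


if __name__ == "__main__":
    # "web01" and "HTTP" are example names, not taken from any real config.
    for line in hand_over_to_worker("web01", "HTTP"):
        print(line, end="")
```

The lines would be written to the file named by command_file in nagios.cfg; the code above only builds the strings.<br>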
<br><div class="gmail_quote">On Fri, Sep 25, 2009 at 11:13 AM, Steven D. Morrey <span dir="ltr"><<a href="mailto:smorrey@ldschurch.org">smorrey@ldschurch.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
It's very similar.<br>
I call it semi-passive (although someone mentioned passive-aggressive might be a better name for it).<br>
You still have an active nagios instance and it's still checking to make sure checks did execute on time (similar to active), it's just not doing the actual execution anymore (similar to passive), and instead of processing the "meaning" of the results of a check it would just process the outcome as directed by the rules for the host/service being monitored.<br>
<br>
Let me give a for instance...<br>
Under my current setup I dispatch a check to a DNX worker node, the check executes and the result is handed wholesale back to Nagios.<br>
Nagios parses the result, tries to divine whether the service is up, down, flapping, etc., and then takes the appropriate action.<br>
Here's a breakdown of where time is spent.<br>
<br>
Nagios event loop: approx. 0.07 s to hand the service check to DNX<br>
DNX: 3 s round trip on average<br>
Nagios: up to 10 s to process the result, depending on how many dependencies are involved, and as much as 30 s if a host check is required<br>
<br>
Now obviously this is because all of my service checks are active rather than passive, and I have 3,000 hosts and 30,000 service checks.<br>
<br>
Under the proposed design it would look more like this.<br>
<br>
Nagios initializes and pushes all schedule pieces to all hosts.<br>
Next, nagios enters a passive mode where it listens for results, and an audit mode where it watches the schedule, looking for results that haven't come in yet.<br>
On the flip side, the execution daemon is running on each host: it executes the checks, determines what is meant by each check (service up, down, flapping, etc.) and passes that meaning back to nagios, which subsequently takes the appropriate action.<br>
All the while the auditor is watching for checks that were scheduled but haven't come in yet, and contacting hosts to find out what's up, etc.<br>
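<br>
A rough sketch of the auditor's bookkeeping, under the assumption that it only needs to know each check's interval and when a result last arrived. The class name, the grace period and the host/service names are all hypothetical. Nagios already has freshness checking for passive results (check_freshness, freshness_threshold); this just illustrates the idea.<br>

```python
class Auditor:
    """Tracks when each scheduled check is next expected to report."""

    def __init__(self, grace=30.0):
        self.grace = grace  # seconds of slack before a result counts as missing
        self.due = {}       # (host, service) -> (interval, next_due)

    def schedule(self, host, service, interval, now):
        """Register a check that was pushed out to a host."""
        self.due[(host, service)] = (interval, now + interval)

    def record_result(self, host, service, now):
        """A result came in: expect the next one a full interval later."""
        interval, _ = self.due[(host, service)]
        self.due[(host, service)] = (interval, now + interval)

    def overdue(self, now):
        """Checks whose result is late; these are the hosts to go and ask."""
        return sorted(
            key
            for key, (_, next_due) in self.due.items()
            if now > next_due + self.grace
        )


if __name__ == "__main__":
    auditor = Auditor(grace=10)
    auditor.schedule("web01", "HTTP", 60, now=0)
    auditor.schedule("db01", "MySQL", 300, now=0)
    auditor.record_result("web01", "HTTP", now=55)
    print(auditor.overdue(now=130))
```

Anything returned by overdue() would be where nagios queries the daemon and, failing that, falls back to a host check.<br>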
<br>
So really, in some ways this is an expansion of the current passive model for checks, but in other ways it is a whole new model (compared to what we do now, anyways).<br>
<br>
Those are my thoughts on the matter, what do you think?<br>
<br>
Sincerely,<br>
Steve<br>
<br>
________________________________________<br>
From: hemebond [<a href="mailto:hemebond@gmail.com">hemebond@gmail.com</a>]<br>
Sent: Friday, September 25, 2009 2:19 AM<br>
To: Nagios Developers List<br>
Subject: Re: [Nagios-devel] A different way?<br>
<div class="im"><br>
Isn't this the same as using passive checks? It sounds like what I've set up. I wrote a simple agent (script) that has its own schedule and runs the checks, sending the result back to a Nagios server.<br>
<br>
</div>2009/9/25 Steven D. Morrey <<a href="mailto:smorrey@ldschurch.org">smorrey@ldschurch.org</a>><br>
<div><div></div><div class="h5">Hello everyone,<br>
<br>
I've decided to take a break for a bit from multi-threading nagios to focus on DNX since that is my day job after all :)<br>
While working on all of this I had a few thoughts that might make some good ideas if Nagios is ever re-designed again, say for a 4.x branch.<br>
<br>
As you know, under nagios, all checks are dispatched by nagios to be executed on the local machine at set intervals.<br>
Under a distributed nagios setup, you have multiple nagios instances running on various machines executing checks and passing the results back to a passive master controller.<br>
<br>
Under DNX, we distribute the load to "worker nodes" which then execute the checks and hand the results back to an active master controller that then processes the result etc.<br>
<br>
Profiling shows that (under DNX at least) 2/3rds of our time is spent in the reaper processing results, so wouldn't it make more sense to flip the process around?<br>
<br>
The checks are already executing on the local machine, so how about a daemon on each machine? The daemon would keep the schedule and execute service checks locally, process the result, and return the result and the required actions (based on a local policy) to nagios, which would then do the actual work of handling notifications and so forth.<br>
This way nagios could be an auditor: if it doesn't receive a result on time as expected, it could query the daemon to see what's gone wrong; if that fails, it could initiate a host check, etc.<br>
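<br>
A minimal sketch of what the per-machine daemon's scheduling loop might look like. It is not DNX code: the class and check names are hypothetical, and each check is assumed to be a callable returning an (exit code, output) pair, where a real daemon would run a plugin. The 0 to 3 exit-code mapping is the usual Nagios plugin convention.<br>

```python
import heapq
import itertools

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


class LocalScheduler:
    """Keeps the schedule on the monitored machine and runs due checks."""

    def __init__(self):
        self._queue = []                # heap of (next_run, seq, name, interval, check)
        self._seq = itertools.count()   # tie-breaker so callables are never compared

    def add(self, name, interval, check, start=0.0):
        """Register a check; interval must be greater than zero."""
        heapq.heappush(self._queue, (start, next(self._seq), name, interval, check))

    def run_due(self, now):
        """Run every check due at 'now'; return (name, state, output) tuples."""
        results = []
        while self._queue and self._queue[0][0] <= now:
            _, _, name, interval, check = heapq.heappop(self._queue)
            code, output = check()
            results.append((name, STATES.get(code, "UNKNOWN"), output))
            # Reschedule relative to 'now' so a late run does not cause a burst.
            heapq.heappush(
                self._queue, (now + interval, next(self._seq), name, interval, check)
            )
        return results


if __name__ == "__main__":
    sched = LocalScheduler()
    sched.add("disk", 60, lambda: (0, "DISK OK"))
    sched.add("load", 300, lambda: (2, "load 12.0"))
    print(sched.run_due(0))
    print(sched.run_due(60))
```

The tuples returned by run_due() are what the daemon would pass back to nagios, along with whatever action the local policy calls for.<br>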
<br>
From a design standpoint this is a bit more work than the current setup, but it seems to me that this could allow for much greater flexibility and scalability in the long run.<br>
<br>
Anyways I hope this sparks a little debate but I don't want to "come in and shake things up", or go around changing everything, stepping on toes all the while, it's just that putting the responsibility of actually executing the check and doing so on time, onto the computer it needs to execute on, just makes more sense to me.<br>
It's not really dramatically different from what we do now. It's just adding a scheduler/timer to the existing execution framework, plus something to push the original schedule and any changes (such as scheduled downtime) to the appropriate machines, and putting everything else into a semi-passive mode, effectively turning each machine to be checked into its own "worker node".<br>
<br>
Thoughts?<br>
<br>
Sincerely,<br>
Steve<br>
<br>
<br>
<br>
<br>
------------------------------------------------------------------------------<br>
Come build with us! The BlackBerry® Developer Conference in SF, CA<br>
is the only developer event you need to attend this year. Jumpstart your<br>
developing skills, take BlackBerry mobile applications to market and stay<br>
ahead of the curve. Join us from November 9-12, 2009. Register now!<br>
<a href="http://p.sf.net/sfu/devconf" target="_blank">http://p.sf.net/sfu/devconf</a><br>
_______________________________________________<br>
Nagios-devel mailing list<br>
</div></div><a href="mailto:Nagios-devel@lists.sourceforge.net">Nagios-devel@lists.sourceforge.net</a><br>
<div><div></div><div class="h5"><a href="https://lists.sourceforge.net/lists/listinfo/nagios-devel" target="_blank">https://lists.sourceforge.net/lists/listinfo/nagios-devel</a><br>
</div></div></blockquote></div><br>