<div><br></div>This may be a good roadmap item for the next version. The list pretty much describes what DNX did (except that DNX is UDP-based rather than TCP-based).<div><br></div><div>I agree that item 1 is good, since you may indeed not want to listen on all interfaces for security reasons. That is minor, though; what matters more is security when workers first register. I'd recommend PKI-based authentication for those who need it, with simple hash/password authentication as an alternative for those who don't want that complexity. Similarly, SSL should be optional rather than required.</div>
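<div>For the simple hash/password option, one common approach is an HMAC challenge-response, so the shared secret itself never crosses the wire. Below is a minimal sketch, assuming a pre-shared secret configured on both core and worker; make_challenge, worker_response and core_verify are hypothetical names, not part of Nagios core.</div>

```python
import hashlib
import hmac
import secrets

# Hypothetical pre-shared secret, configured on both core and worker (assumption).
SHARED_SECRET = b"change-me"


def make_challenge() -> bytes:
    """Core side: generate a fresh random nonce for each registration attempt."""
    return secrets.token_bytes(16)


def worker_response(secret: bytes, challenge: bytes, worker_name: str) -> str:
    """Worker side: prove knowledge of the secret without sending it."""
    msg = challenge + worker_name.encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def core_verify(secret: bytes, challenge: bytes, worker_name: str, response: str) -> bool:
    """Core side: recompute the HMAC and compare in constant time."""
    expected = worker_response(secret, challenge, worker_name)
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    nonce = make_challenge()
    resp = worker_response(SHARED_SECRET, nonce, "worker-01")
    print(core_verify(SHARED_SECRET, nonce, "worker-01", resp))  # True
    print(core_verify(b"wrong", nonce, "worker-01", resp))      # False
```

Because the nonce changes on every registration, a captured response cannot simply be replayed later; this still assumes SSL (or a trusted network) if the rest of the session needs confidentiality.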
<div><br></div><div>Rather crude host selection based on IP addresses is what DNX did; hostgroup-based selection is what mod_gearman did. Both seem worth having. But it appears to me that hostgroup-based selection is somewhat neater and easier to add, unless you want to create a new host key registration like you described. </div>
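<div>Hostgroup-based selection can be sketched as a set intersection between the hostgroups a worker claims when it registers and the hostgroups a host belongs to. This is a hypothetical illustration of the idea (the registration dictionary and workers_for_host are assumptions), not mod_gearman's actual code.</div>

```python
from typing import Dict, List, Set

# Hypothetical registrations: worker name -> hostgroups the worker has claimed.
WorkerGroups = Dict[str, Set[str]]


def workers_for_host(host_groups: Set[str], registrations: WorkerGroups) -> List[str]:
    """Return (sorted) the workers that claimed at least one of the host's hostgroups.

    An empty result means no remote worker claimed this host, so the core
    would fall back to its local workers.
    """
    return sorted(name for name, groups in registrations.items() if groups & host_groups)


if __name__ == "__main__":
    regs = {
        "dc1-worker": {"dc1-servers"},
        "dc2-worker": {"dc2-servers", "dc2-network"},
    }
    print(workers_for_host({"dc2-network"}, regs))  # ['dc2-worker']
```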
<div><br></div><div>In general, once something like this is implemented, the workers can take over check execution from the Nagios server entirely, and you can also run one of the workers on the Nagios host itself. So it's simpler to just have a switch for whether Nagios itself should run checks when any workers are registered.</div>
<div><br></div><div>Don't forget that one of the most important things is a mechanism for getting statistics on the number of workers currently registered, how many jobs they are handling, etc. I'd add it to the list as an additional item 5. <br>
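<div>Below is a sketch of the kind of counters item 5 could expose, assuming the core updates them on worker registration, job dispatch and job completion. The WorkerStats class and its field names are hypothetical.</div>

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class WorkerStats:
    """Hypothetical per-worker counters the core could report on request."""
    registered: Set[str] = field(default_factory=set)
    running: Counter = field(default_factory=Counter)    # jobs currently in flight per worker
    completed: Counter = field(default_factory=Counter)  # jobs finished per worker

    def register(self, worker: str) -> None:
        self.registered.add(worker)

    def unregister(self, worker: str) -> None:
        # Drop in-flight jobs of a departed worker; keep its completed count for history.
        self.registered.discard(worker)
        self.running.pop(worker, None)

    def job_started(self, worker: str) -> None:
        self.running[worker] += 1

    def job_finished(self, worker: str) -> None:
        self.running[worker] -= 1
        self.completed[worker] += 1

    def summary(self) -> Dict[str, int]:
        return {
            "workers_registered": len(self.registered),
            "jobs_running": sum(self.running.values()),
            "jobs_completed": sum(self.completed.values()),
        }
```

Exposing summary() through a query handler command would let external tools poll it the same way they poll other core status.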
<div><br><div class="gmail_quote">On Sat, Feb 2, 2013 at 6:12 AM, Eric Stanley <span dir="ltr">&lt;<a href="mailto:estanley@nagios.com" target="_blank">estanley@nagios.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
All,<br>
<br>
I've been giving some thought to remote workers for core 4 and wanted to<br>
run those thoughts by this list. I see remote workers as a very useful<br>
extension to the worker concept in core 4.<br>
<br>
To implement remote workers, I think there are about 4 basic things that<br>
would need to be done.<br>
1. Implement the ability to listen to multiple query handler interfaces<br>
(precursor to #2)<br>
2. Implement the ability to create and listen on TCP socket query<br>
handler interfaces.<br>
3. Add a host key to the worker registration to allow workers to specify<br>
the host(s) for which it will handle checks.<br>
4. Write a stand-alone remote worker that can connect to the core<br>
instance via TCP.<br>
<br>
I have separate steps 1 and 2, instead of combining them, for two reasons. First,<br>
a generalized solution is more extensible. Second, I think<br>
having multiple TCP listeners is a reasonable use case when you have a<br>
multi-homed system but may not want to listen on all interfaces.<br>
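<div>A minimal Python sketch of item 2 combined with the multi-homed case: one listening socket per configured address instead of a single wildcard bind, all watched by one selector. The (address, port) list format and the function names are assumptions, and only IPv4 is handled.</div>

```python
import selectors
import socket
from typing import List, Tuple


def open_listeners(addresses: List[Tuple[str, int]]) -> List[socket.socket]:
    """Open one non-blocking TCP listener per configured (address, port) pair.

    Binding to specific addresses (rather than 0.0.0.0) keeps the query
    handler off interfaces it should not be reachable from.
    """
    listeners: List[socket.socket] = []
    try:
        for host, port in addresses:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            listeners.append(sock)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((host, port))
            sock.listen()
            sock.setblocking(False)
    except OSError:
        # Do not leak the sockets that were opened before the failure.
        for sock in listeners:
            sock.close()
        raise
    return listeners


def register_all(sel: selectors.BaseSelector, listeners: List[socket.socket]) -> None:
    """Watch every listener for incoming connections with a single selector."""
    for sock in listeners:
        sel.register(sock, selectors.EVENT_READ)


def accept_ready(sel: selectors.BaseSelector, timeout: float = 1.0) -> List[socket.socket]:
    """Accept pending connections on whichever listeners are ready."""
    clients = []
    for key, _events in sel.select(timeout):
        conn, _addr = key.fileobj.accept()
        conn.setblocking(False)
        clients.append(conn)
    return clients
```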
<br>
The host key should be allowed to specify one or more IP addresses, IP<br>
subnets, contiguous IP address ranges, host names and host name<br>
patterns/wildcards (e.g. *.example.com). If multiple workers register<br>
for the same host, some sort of distribution mechanism should be used to<br>
load balance the workers.<br>
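<div>The sketch below shows how host-key entries in the forms listed above might be matched, plus naive round-robin across the matching workers. The entry syntax (notably start-end for ranges) and the HostRouter class are assumptions for illustration; a real implementation would likely balance per host or by load rather than with one global counter.</div>

```python
import fnmatch
import ipaddress
from typing import Dict, List, Optional


def _parse_ip(text: str):
    """Return an IPv4/IPv6 address object, or None if text is not an address."""
    try:
        return ipaddress.ip_address(text.strip())
    except ValueError:
        return None


def host_matches(key: str, host_name: str, host_addr: str) -> bool:
    """Check one host-key entry (assumed syntax) against a host.

    10.0.0.5 (address), 10.0.0.0/24 (subnet), 10.0.0.10-10.0.0.20 (range),
    db01.example.com (name), *.example.com (name wildcard).
    """
    key = key.strip()
    addr = _parse_ip(host_addr)
    if addr is not None:
        if "/" in key:
            try:
                return addr in ipaddress.ip_network(key, strict=False)
            except ValueError:
                return False
        if "-" in key:
            start, _, end = key.partition("-")
            lo, hi = _parse_ip(start), _parse_ip(end)
            if lo is not None and hi is not None:  # otherwise a hyphenated host name
                return lo.version == hi.version == addr.version and lo <= addr <= hi
        single = _parse_ip(key)
        if single is not None:
            return addr == single
    # Host names and wildcards, compared case-insensitively.
    return fnmatch.fnmatchcase(host_name.lower(), key.lower())


class HostRouter:
    """Hypothetical router: round-robin among workers whose host key matches."""

    def __init__(self) -> None:
        self.keys: Dict[str, List[str]] = {}
        self._next = 0

    def register(self, worker: str, host_key: List[str]) -> None:
        self.keys[worker] = host_key

    def pick(self, host_name: str, host_addr: str) -> Optional[str]:
        candidates = sorted(
            w for w, entries in self.keys.items()
            if any(host_matches(e, host_name, host_addr) for e in entries)
        )
        if not candidates:
            return None  # no remote worker claims this host
        worker = candidates[self._next % len(candidates)]
        self._next += 1
        return worker
```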
<br>
Using the host as a second criterion to determine which worker gets the<br>
check raises the question of the order of precedence for the criteria.<br>
Initially, I think the host should take precedence over the plugin, but I<br>
can see implementing an order-of-precedence option in the core<br>
configuration file. This would become more important if additional worker<br>
selection criteria were added.<br>
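<div>One way to make the precedence configurable: evaluate the selection criteria in the order given by a config value and take the first criterion that yields a worker. The option format (e.g. host,plugin) and all names here are hypothetical.</div>

```python
from typing import Callable, Dict, List, Optional, Sequence

# A criterion inspects a check (here a plain dict) and returns a worker name or None.
Criterion = Callable[[dict], Optional[str]]


def parse_precedence(value: str) -> List[str]:
    """Parse a hypothetical config value such as 'host,plugin' into an ordered list."""
    return [part.strip() for part in value.split(",") if part.strip()]


def choose_worker(check: dict, precedence: Sequence[str],
                  criteria: Dict[str, Criterion]) -> Optional[str]:
    """Try each criterion in configured order; the first one that yields a worker wins."""
    for name in precedence:
        worker = criteria[name](check)
        if worker is not None:
            return worker
    return None  # no match: fall back to the default (local) worker pool
```

With host,plugin, a check for a host claimed by a remote worker goes to that worker even if a plugin-specific worker is also registered; plugin,host reverses that. Adding a new criterion later only means adding one entry to the criteria mapping.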
<br>
The communication between the remote worker and the core process should<br>
support protection with SSL. The remote worker will need a mechanism<br>
to retry the connection in the event the network drops the connection.<br>
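<div>The retry requirement could be met with capped exponential backoff plus random jitter, so that many workers reconnecting after an outage don't hit core in lockstep. This is a sketch with an injectable connect callable; the parameter defaults are assumptions, not tuned values.</div>

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, cap: float, attempts: int) -> Iterator[float]:
    """Yield capped exponential delays: base, 2*base, 4*base, ... never above cap."""
    for n in range(attempts):
        yield min(cap, base * (2 ** n))


def connect_with_retry(connect: Callable[[], T], max_attempts: int = 10,
                       base: float = 1.0, cap: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect() until it succeeds, sleeping a jittered delay between failures."""
    last_error = None
    for attempt, delay in enumerate(backoff_delays(base, cap, max_attempts), start=1):
        try:
            return connect()
        except OSError as exc:  # socket errors, refused or reset connections, etc.
            last_error = exc
            if attempt < max_attempts:
                # "Full jitter": sleep a random amount between 0 and the current delay.
                sleep(random.uniform(0, delay))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

In a real worker, connect might be something like lambda: socket.create_connection((CORE_HOST, CORE_PORT), timeout=10), where CORE_HOST and CORE_PORT are placeholders; after a dropped connection the worker would call connect_with_retry again and then re-register.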
<br>
I realize this is a sizable change and we may not want it to happen<br>
before the release of 4.0. Thoughts on this are welcome.<br>
<br>
Further down the road, I can see developing a remote worker proxy, whose<br>
sole job is to broker the communication between core and even more<br>
remote workers. This would enable a tree-shaped worker hierarchy for<br>
monitoring environments that are both large and dispersed geographically<br>
and/or topologically. This would require a re-registration process so<br>
the proxy workers could keep core updated with their abilities as<br>
leaf-node workers connected and disconnected.<br>
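<div>The re-registration idea could look roughly like this: the proxy advertises the union of its leaf workers' host keys and re-registers with core only when that union actually changes. The WorkerProxy class is hypothetical, and the reregister callback stands in for whatever message the proxy would send upstream.</div>

```python
from typing import Callable, Dict, FrozenSet, Set


class WorkerProxy:
    """Hypothetical proxy that advertises the union of its leaf workers' host keys."""

    def __init__(self, reregister: Callable[[FrozenSet[str]], None]) -> None:
        self._leaves: Dict[str, Set[str]] = {}
        self._advertised: FrozenSet[str] = frozenset()
        self._reregister = reregister  # sends the new capability set upstream to core

    def leaf_connected(self, leaf: str, host_key: Set[str]) -> None:
        self._leaves[leaf] = set(host_key)
        self._refresh()

    def leaf_disconnected(self, leaf: str) -> None:
        self._leaves.pop(leaf, None)
        self._refresh()

    def _refresh(self) -> None:
        combined = frozenset().union(*self._leaves.values())
        if combined != self._advertised:  # avoid needless re-registrations
            self._advertised = combined
            self._reregister(combined)
```

Since a proxy presents itself to its parent as an ordinary worker, the same class could sit at any level of the tree.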
<br>
Thoughts?<br>
<br>
--<br>
Eric Stanley<br>
___<br>
Developer<br>
Nagios Enterprises, LLC<br>
Email: <a href="mailto:estanley@nagios.com">estanley@nagios.com</a><br>
Web: <a href="http://www.nagios.com" target="_blank">www.nagios.com</a><br>
<br>
<br>
_______________________________________________<br>
Nagios-devel mailing list<br>
<a href="mailto:Nagios-devel@lists.sourceforge.net">Nagios-devel@lists.sourceforge.net</a><br>
<a href="https://lists.sourceforge.net/lists/listinfo/nagios-devel" target="_blank">https://lists.sourceforge.net/lists/listinfo/nagios-devel</a><br>
</blockquote></div><br></div></div>