Improving the host <parents> logic
Shane Stixrud
shane at geeklords.org
Wed Dec 14 22:35:42 CET 2005
Nagios's host parent logic is good but it could be a whole lot better for
today's switched networks. There have been a couple of recommendations in
the past on how to improve this.
1) Allow nagios admins to change parent logic failure detection in cases
where one parent is up but others are down. By default nagios treats
multiple parents as redundant paths and thus does not suppress
notification in situations where at least one parent is OK.
The main disadvantage of this proposal is that nagios rightly treats
parents as directly connected hops on the path back to nagios. This
workaround would treat switches and routers as peers when they are not,
removing the possibility of redundancy detection and of easily determining
which device is at fault.
2) Allow the nagios admins to assign a weighted priority to each host and
have a system that allows the admin to tune these values to suppress
notification where appropriate.
This type of solution is, IMO, far more complex than is required. The
best part of the current solution is that it is simple to manage
and obvious to deploy.
The main problem with the existing solution is that modern switched networks
often have A LOT of managed nodes connected to one or more layer2
switches in the same layer3 network. Ideally nagios would allow admins to
suppress notification for devices behind both layer2 devices and
layer3 interfaces. With that in mind I believe there is a relatively easy
solution that stays true to nagios's current parent model while still
meeting this challenge.
The existing parent logic should be able to remain pretty much as is,
merely renaming the directive to "l3parents" to make clear that it
should only be used for layer 3 parents.
The existing parents logic would then be duplicated under a new directive
called "l2parents". Nagios would need to be modified to first check
l2parents before proceeding to the l3parents when a device goes into
a NON-OK state. If all l2 parents or all l3 parents are down, nagios would
follow the l2 or l3 inherited parents just as it does today.
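The check order proposed above can be sketched roughly as follows. This is
only an illustration in Python, not Nagios code: the Host class, the
classify function, the state strings and the sample topology are all
hypothetical. The sketch assumes that a host whose parents on one layer are
all down is reported as UNREACHABLE, in the spirit of the existing parents
logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    # Hypothetical model of a host; not a real Nagios structure.
    name: str
    up: bool = True
    l2parents: List[str] = field(default_factory=list)
    l3parents: List[str] = field(default_factory=list)


def all_down(parents: List[str], hosts: Dict[str, Host]) -> bool:
    # True only when there is at least one parent and none of them is up.
    return bool(parents) and all(not hosts[p].up for p in parents)


def classify(name: str, hosts: Dict[str, Host]) -> str:
    host = hosts[name]
    if host.up:
        return "UP"
    # Layer 2 parents are consulted first, then layer 3 parents.
    if all_down(host.l2parents, hosts):
        return "UNREACHABLE"
    if all_down(host.l3parents, hosts):
        return "UNREACHABLE"
    # No layer has all of its parents down, so the host itself is at fault.
    return "DOWN"


# Assumed example topology: two switches behind one router.
hosts = {
    "router": Host("router"),
    "sw1": Host("sw1", up=False, l2parents=["router"], l3parents=["router"]),
    "sw2": Host("sw2", l2parents=["router"], l3parents=["router"]),
    "server1": Host("server1", up=False, l2parents=["sw1"], l3parents=["router"]),
    "server2": Host("server2", up=False, l2parents=["sw1", "sw2"], l3parents=["router"]),
}
```

In this topology server1 would be treated as unreachable because its only
switch is down, while server2 would be reported as down because sw2 still
offers a working layer 2 path.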
IMO this change would be the least intrusive: it adds layer2 parent support
and allows for redundancy detection for both layer2 and layer3 devices
with little added complexity.
Side note: The 3d map should show the layer2 parents as being
directly connected to the child device. The l3parents should only be
connected to devices whose layer2 and layer3 parents are the same
NAME/IP. In this way you would see a server connected to a switch that is
in turn connected to another switch which then connects to the layer3
device, which as it happens is how the physical connectivity IS set up in
reality.
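One possible reading of the map rule is sketched below, again in Python and
purely as an illustration. The map_edges function, the dictionary layout and
the edge labels are hypothetical. The sketch assumes that every drawn link
follows an l2parents entry, and that a link is additionally marked as layer 3
only when the same name also appears in that host's l3parents.

```python
from typing import Dict, List, Tuple


def map_edges(hosts: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str, str]]:
    # Returns (child, parent, label) tuples for the links the map would draw.
    edges = []
    for name, cfg in hosts.items():
        l3 = set(cfg.get("l3parents", []))
        for parent in cfg.get("l2parents", []):
            # Mark the link as layer 3 only when the l2 and l3 parent match.
            label = "l2+l3" if parent in l3 else "l2"
            edges.append((name, parent, label))
    return edges


# Assumed example: server -> sw1 -> sw2 -> router.
topology = {
    "server": {"l2parents": ["sw1"], "l3parents": ["router"]},
    "sw1": {"l2parents": ["sw2"], "l3parents": ["router"]},
    "sw2": {"l2parents": ["router"], "l3parents": ["router"]},
    "router": {},
}
edges = map_edges(topology)
```

With this topology the server is drawn attached to sw1 only, and the single
link carrying a layer 3 mark is the one between sw2 and the router.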
Cheers,
Shane