[naemon-users] Some thoughts on dynamic service creation for SNMP traps
Patrick Ogenstad
patrick at ogenstad.com
Mon Jun 2 20:56:02 CEST 2014
To start off I don't know if this is the correct forum for this, but I'm
hoping the right people read this.
As I was watching Andreas' presentation The Future of Nagios (
http://www.youtube.com/watch?v=YgbbyyNIiHc) I started thinking about what
could be done when it comes to Nagios/Naemon and SNMP traps.
I like a lot of things with Nagios and related products. However when it
comes to SNMP traps I'm far from impressed. Most of the guides I've seen
use a trap-service template where check-host-alive is used to reset the
service after a trap has arrived. Each host which can send traps then gets a
generic "trap service".
If all we want is to receive alerts, this works: it notifies us when something
has gone wrong. However there are some shortcomings.
1. Currently (as far as I know) there isn't a good way to reset the service
back to a normal state. It can be done manually or you can wait until the
check-host-alive check is run.
2. If a trap has been received from a host and that same host sends another
trap the second trap will replace the first trap. The old information will
still be available through old notifications or logs, but it is no longer
viewable in any status view, such as Livestatus.
What I would like to see is the ability to dynamically create services as
traps are received. Preferably you can create a few base service templates
connected to different servicegroups and contactgroups and all that.
As an example I was looking at some traps from Huawei switches today.
Specifically I wanted to see traps from a switch stack if one of the switch
members dies. For this Huawei has a trap called hwStackStackMemberLeave. The
trap will contain a varbind pointing to the switch id
(hwMemberCurrentStackId) for the member which died. Once the broken switch
is replaced a new trap will be sent, hwStackStackMemberAdd. This second trap
will also contain a varbind with the id of the switch which is now
returning to the stack.
What would be really great is if we can have a switch host and then when we
receive the first trap hwStackStackMemberLeave a service is dynamically
created based on a template, such as:
service {
host_name $switch-sending-trap$
use snmp-environment-trap
name stack-member-error
description Stack Member Error
}
This would remain in warning or critical state until it is cleared manually
by an administrator or until we receive a hwStackStackMemberAdd trap.
Also since different members in a stack can fail we would need to handle
this as we create dynamic services. So our template might actually look
like this:
service {
host_name $switch-sending-trap$
use snmp-environment-trap
name stack-member-error-$valueof-hwMemberCurrentStackId$
description Stack Member Error $valueof-hwMemberCurrentStackId$
}
This could only be cleared by a hwStackStackMemberAdd trap which
contained the same switch id in the varbind. When this happens the
dynamic service would be removed (perhaps at the next scheduled run).
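To make the idea a bit more concrete, here is a minimal Python sketch of the
bookkeeping I have in mind. It is not based on any existing Naemon or Nagios
API. The class DynamicTrapServices and its handle() method are made up for
illustration, and the "services" are just entries in a dictionary. Only the
trap names and the varbind name come from the Huawei example above.

```python
# Hypothetical sketch: track dynamic trap services in memory.
# Nothing here talks to Naemon; it only models create/clear behaviour.

RAISE_TRAP = "hwStackStackMemberLeave"   # creates the dynamic service
CLEAR_TRAP = "hwStackStackMemberAdd"     # removes the matching service
ID_VARBIND = "hwMemberCurrentStackId"    # varbind carrying the member id


class DynamicTrapServices:
    def __init__(self):
        # (host name, service name) -> service description
        self.services = {}

    def handle(self, host, trap_name, varbinds):
        """Create or remove a per-member service based on one trap."""
        member = varbinds.get(ID_VARBIND)
        if member is None:
            return  # trap without a member id, nothing to key on
        key = (host, f"stack-member-error-{member}")
        if trap_name == RAISE_TRAP:
            self.services[key] = f"Stack Member Error {member}"
        elif trap_name == CLEAR_TRAP:
            # Only the service with the same member id is cleared.
            self.services.pop(key, None)


tracker = DynamicTrapServices()
tracker.handle("switch1", RAISE_TRAP, {ID_VARBIND: "2"})
tracker.handle("switch1", RAISE_TRAP, {ID_VARBIND: "3"})
tracker.handle("switch1", CLEAR_TRAP, {ID_VARBIND: "2"})
print(tracker.services)
```

Keying on both the host and the member id is what keeps a second trap from
overwriting the first, and what lets the Add trap clear only its own member.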
Another reason to set it up like this is that we can have additional traps
being sent to the same host and view the status of all of them in real
time. For example if a fan breaks in the switch, or the CPU usage gets too
high, we can create additional dynamic services for those.
Would anything like this be possible with the dynamic object creation
planned for Naemon?
http://networklore.com/