Adding more advanced correlation to nagios with sec (any interest?)

John P. Rouillard rouilj at cs.umb.edu
Sun Jun 29 16:07:29 CEST 2003


In message <20030629195722.A229 at IPAustralia.Gov.AU>,
Stanley Hopcroft writes:
>For a long time now I have wanted a means of handling snmp traps 
>
>. without having to write trap handlers - difficult to test and
>                                          difficult to ensure that 
>                                          the output of the handler
>                                          matches a Nag service.
>
>. to allow multiple trap services per host
>
>. to allow basic interpretation of the trap based on either the
>  value of the trap or the var-binds
>
>It seems to me that sec, reading the log file of snmptrapd will do this
>for me.

It probably can. Sec also allows matching patterns over multiple
lines, which makes it easier to digest the varbindings etc. Note that
you are still writing a trap handler, but it is written in sec rather
than another language. The regular expressions to match traps and all
the variable bindings may be lengthy. I use a simple shell script to
turn traps into nagios external commands, and it works for me.

Also note that sec 2.1.7 does not allow embedding ;'s in the action
string. You will need to wait for a later sec release, hack the sec
code to allow them, or use the shellcmd action to call an external
shell command that just pastes together the components of a
PROCESS_SERVICE_CHECK_RESULT command with ;'s.
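
The "paste the components together" helper could look roughly like the
sketch below. It is in Python rather than shell, and it is only an
illustration: the function names are made up for this example, and the
command file path is an assumption that depends on how nagios was
installed. The line format is the usual external command layout of
[timestamp] COMMAND;host;service;return code;plugin output.

```python
import sys
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one nagios external command line for a passive service check."""
    timestamp = int(time.time() if now is None else now)
    # A newline would end the command early, so flatten the plugin output.
    output = " ".join(str(output).split())
    fields = ["PROCESS_SERVICE_CHECK_RESULT", host, service,
              str(return_code), output]
    return "[%d] %s\n" % (timestamp, ";".join(fields))


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    # ASSUMPTION: location of the nagios command file (a named pipe).
    # Adjust it to match the command_file setting of your install.
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__" and len(sys.argv) == 5:
    # Hypothetical usage: helper.py <host> <service> <return code> <text>
    host, service, code, text = sys.argv[1:5]
    submit(format_passive_result(host, service, int(code), text))
```

Since the semicolons are added inside the helper, the sec action string
itself only has to pass four plain arguments.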

>Another contender - to trying to hack it myself - was snort but snort is
>big, and quite simply, doesn't seem to allow processing outside of yet
>another handle as sec does.

Are you sure you mean snort (it's an IDS IIRC)? I use sec for a
superset of the tasks that I handled with logwatch, swatch and
logsurfer. Actually, I pushed for features from logsurfer to be added
to sec (event stores).

Now that said, there are rules on the sec site that correlate snort
output into a more manageable form.

Also I use cascading sec's with the child secs started by spawn
actions to turn multiline events into single line events, or to create
new composite events that are correlated by the master sec.

>Unfortunately, I cannot comment about its use for event correlation
>other than say it sounds a good thing because (in case you didn't
>mention it), there is the intriguing possibility of modelling complex
>services like business systems whose state is dependent on a number of
>processes.

Dynamic filtering based on the state of other operations is what I was
trying to demonstrate in some of my examples like suppressing bandwidth
alerts on an interface while a jumpstart (a system install booting and
loading software over the network) was running. I detect the jumpstart
by looking for a bootparamd request followed by a tftp of a particular
file. This sets a flag (context) that activates a
SingleWith2Thresholds rule that will destroy the flag once the rate
of alerts drops below 2 in 30 minutes.

So the rule automatically resets itself once the load on the interface
drops back into the normal range, but the start time depends on two
other events occurring within a certain time frame.
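
The logic above, sketched in Python rather than as sec rules, might
look something like the following. This is a simplified model and not
the actual rule set: the class and method names are invented for the
example, the 300 second gap allowed between the bootparamd request and
the tftp is an assumed value, and the flag is only re-evaluated when a
new alert arrives, whereas sec uses its own timers.

```python
class JumpstartSuppressor:
    BOOT_WINDOW = 300        # ASSUMPTION: max seconds from bootparamd to tftp
    QUIET_WINDOW = 30 * 60   # from the text: 30 minutes
    QUIET_THRESHOLD = 2      # from the text: fewer than 2 alerts

    def __init__(self):
        self.bootparamd_seen = None   # time of the last bootparamd request
        self.suppressing = False      # the flag (context)
        self.flag_set_at = None
        self.alert_times = []         # alerts swallowed while the flag is set

    def on_bootparamd(self, now):
        self.bootparamd_seen = now

    def on_tftp(self, now):
        # The flag is set only if the tftp follows a recent bootparamd request.
        seen = self.bootparamd_seen
        if seen is not None and now - seen <= self.BOOT_WINDOW:
            self.suppressing = True
            self.flag_set_at = now
            self.alert_times = []
        self.bootparamd_seen = None

    def _expire(self, now):
        # Destroy the flag once a full quiet window has passed since it was
        # set and fewer than QUIET_THRESHOLD alerts fall inside that window.
        if not self.suppressing:
            return
        self.alert_times = [t for t in self.alert_times
                            if now - t < self.QUIET_WINDOW]
        if (now - self.flag_set_at >= self.QUIET_WINDOW
                and len(self.alert_times) < self.QUIET_THRESHOLD):
            self.suppressing = False

    def on_bandwidth_alert(self, now):
        """Return True if the alert should be passed on, False if suppressed."""
        self._expire(now)
        if self.suppressing:
            self.alert_times.append(now)
            return False
        return True
```

Timestamps are plain seconds, so the same object can be driven from
log timestamps when replaying a file or from the clock when running
live.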

>This can be done with dependent services - maybe - but sec sounds like
>it could do this better.

I would say sec is more flexible: timing relationships between
service alerts can also be accounted for, which makes sec very powerful.

Also, the ability to define flapping the way you want to, and to
suppress the initial flaps, is useful. (Sadly there is no way to
indicate this to nagios. A "SET_FLAPPING" external command would be
useful.)

I support a lot of development systems that have only one or two
people working on them. It is common for them to reboot the machines a
couple of times for patch installs, etc. I have a sec rule set up that
will not alert me when a specific range of interfaces sends a link
down trap, unless it exceeds the threshold of three link down/link up
cycles in 5 minutes. If that threshold is crossed, I expect that they
have screwed up the ability of the machine to complete a successful
boot, and I need to look at what is happening. Then it will reset
after the interface has not dropped in 30 minutes. Otherwise I
consider it to still be unstable, and it needs my attention.

Things like that are not currently possible with nagios because of the
explicit time dependence on a passive check.
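
For illustration, the link down thresholding described above could be
modelled along these lines. This is a Python sketch and not the sec
rule itself: the class name is invented, each link down trap is
counted as one down/up cycle, and "exceeds the threshold of three" is
read here as "three or more within 5 minutes", which is an assumption.

```python
from collections import deque


class LinkFlapMonitor:
    def __init__(self, threshold=3, window=5 * 60, quiet=30 * 60):
        self.threshold = threshold   # link downs needed to raise an alert
        self.window = window         # seconds in which they must occur
        self.quiet = quiet           # seconds without a drop before reset
        self.downs = deque()         # recent link down times
        self.unstable = False
        self.last_down = None

    def _maybe_reset(self, now):
        # Back to normal once the interface has not dropped for `quiet` secs.
        if self.unstable and now - self.last_down >= self.quiet:
            self.unstable = False
            self.downs.clear()

    def on_link_down(self, now):
        """Record a link down trap; return True if an alert should be sent."""
        self._maybe_reset(now)
        self.last_down = now
        self.downs.append(now)
        # Forget drops that fall outside the sliding window.
        while now - self.downs[0] > self.window:
            self.downs.popleft()
        if not self.unstable and len(self.downs) >= self.threshold:
            self.unstable = True
            return True
        return False

    def is_unstable(self, now):
        self._maybe_reset(now)
        return self.unstable
```

Only the trap that crosses the threshold returns True, so one or two
reboots stay silent and a machine stuck in a boot loop produces a
single alert until it has been quiet for 30 minutes.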

				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.

