Useful handler script potentially for users
Mueller, Karl
KMueller at netsuite.com
Thu Mar 25 02:50:37 CET 2004
Hi All,
I wrote a useful Perl script two years ago to execute Nagios event
handlers, which I forgot to share. I perused the Nagios source and
various emails on the lists, but didn't see anything with a similar
function. If there is something out there already, please let me know!
I'm always interested in going more "standard".
The script performs two functions:
1) Abstracts the Nagios "states" (service state, state type, attempt
number) into a command-line configuration option. This generally removes
the need to write logic into each handler to decide whether it should
run during an "OK" state or only "CRITICAL", hard vs. soft, etc. I got
tired of writing this into our many handler scripts.
2) Allows you to chain multiple scripts together under one handler. This
is very important to us at our company. We want to, for example, run two
debugging scripts when a service goes into a CRITICAL SOFT state, but
then restart something when it goes CRITICAL HARD. Can you do this with
a custom handler script? Of course, but that's a lot of work (relatively
speaking), especially if you have many handlers and options, like we do.
Let me give you two examples here using the script:
define command {
    command_name db_snapshot_handler
    command_line $USER10$/handler.pl $SERVICESTATE$ $STATETYPE$ $SERVICEATTEMPT$
        ,C,S,4 $USER10$/db_snapshot.pl $HOSTNAME$
        ,C,S,4 $USER12$/logevent.pl source=Nagios severity=NOTIFICATION type=DBCRISISLOAD targethost=$HOSTNAME$
        ,C,S,4 $USER10$/db_poke.pl -h $HOSTNAME$ -s DB_Crisis_Load
        ,C,H,+ $USER10$/db_snapshot.pl $HOSTNAME$
}
(** I've taken the liberty of breaking out the lines to make them
clearer. In the actual configuration, they are all on one line. **)
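The first part calls the "handler.pl" script itself and gives it the
current state, state type, and attempt number. Past that come the
trigger entries. Three of them fire on the same event (the fourth,
,C,H,+, takes another snapshot once the state goes HARD):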
**** ,C,S,4 $USER10$/db_snapshot.pl $HOSTNAME$
The first ',' is the delimiter for a new command. This says: on a
CRITICAL, SOFT state on the 4th attempt, run $USER10$/db_snapshot.pl
$HOSTNAME$.
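To make the entry format concrete, here is a minimal sketch of how such a command line could be split into trigger entries. It is written in Python purely for illustration and is not taken from handler.pl; the function name parse_chain, the dictionary layout, and the /opt/handlers paths are all hypothetical, and it assumes the arguments arrive already split on whitespace.

```python
# Hypothetical sketch; the real handler.pl is Perl and may parse differently.
def parse_chain(argv):
    """Split handler arguments into the current state and the trigger entries.

    argv is everything after the script name, already split on whitespace.
    """
    state, state_type, attempt = argv[0], argv[1], int(argv[2])
    entries = []
    for token in argv[3:]:
        if token.startswith(","):
            # A leading comma opens a new entry: ",<states>,<types>,<attempt>"
            states, types, when = token[1:].split(",")
            entries.append({"states": states, "types": types,
                            "attempt": when, "command": []})
        elif entries:
            # Any other token belongs to the command of the current entry
            entries[-1]["command"].append(token)
    return (state, state_type, attempt), entries


# Assumed example values standing in for the expanded Nagios macros
args = ("CRITICAL SOFT 4 "
        ",C,S,4 /opt/handlers/db_snapshot.pl db01 "
        ",C,H,+ /opt/handlers/db_snapshot.pl db01").split()
current, entries = parse_chain(args)
print(current)
for entry in entries:
    print(entry["states"], entry["types"], entry["attempt"],
          " ".join(entry["command"]))
```

One consequence of this particular approach: a command argument that itself begins with a comma would be mistaken for a new entry.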
**** ,C,S,4 $USER12$/logevent.pl source=Nagios severity .....
On CRITICAL, SOFT on the 4th attempt, run this command. When two or
more commands match, they run serially, one after another. The script
does not currently check exit status or that kind of thing. (It was just
a simple thing, really.)
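The serial, fire-and-forget behaviour described above can be sketched roughly as follows. This is a Python illustration under assumptions, not the script's actual code: run_in_series is a hypothetical name, and the three stand-in commands simply invoke the Python interpreter so the example runs anywhere.

```python
import subprocess
import sys


def run_in_series(commands):
    """Run each command to completion before starting the next.

    Exit codes are collected only so the example can show them; nothing
    acts on them, mirroring the "no exit status check" behaviour.
    """
    codes = []
    for argv in commands:
        codes.append(subprocess.call(argv))  # blocks until the command exits
    return codes


# Stand-in commands: the second one fails, yet the third still runs.
py = sys.executable
codes = run_in_series([
    [py, "-c", "print('snapshot')"],
    [py, "-c", "import sys; sys.exit(2)"],
    [py, "-c", "print('poke')"],
])
print(codes)
```

Because each call blocks, the total run time is the sum of all matching commands, which is what matters for the Nagios timeout.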
**** ,C,S,4 $USER10$/db_poke.pl -h $HOSTNAME$ -s DB_Crisis .....
Again, same states as before, run a "db_poke.pl" script with options.
So on this state, it will run three things in succession. Of course,
all three have to complete before Nagios whacks them based on its own
timeouts.
Here is another example:
define command {
    command_name oc4j_pokerestart
    command_line $USER10$/handler.pl $SERVICESTATE$ $STATETYPE$ $SERVICEATTEMPT$
        ,C,H,+ $USER10$/oc4j_restart.pl -s Unknown $HOSTNAME$
        ,CW,S,+ $USER10$/oc4j_poke.pl -s Unknown -h $HOSTNAME$
}
Let's examine the criteria:
**** ,C,H,+ $USER10$/oc4j_restart.pl -s Unknown $HOSTNAME$
This says: on CRITICAL, HARD, and ANY attempt number, run the restart
script. Nagios only calls the event handler ONCE for the 'hard' state,
so this is safe.
**** ,CW,S,+ $USER10$/oc4j_poke.pl -s Unknown -h $HOSTNAME$
On CRITICAL or WARNING, SOFT, and ANY attempt number, run this script.
Obviously, it will run "N - 1" times, where N is the number of failed
checks before the service enters a hard state. (In our configuration,
N = 3, so oc4j_poke.pl will execute twice.)
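The matching rules and the "N - 1" count can be checked with a small simulation. This is a hypothetical Python sketch, not the script's real logic: it assumes that state and type are compared by their first letter (C, W, S, H) and that '+' means any attempt number, as the examples above suggest.

```python
def matches(spec, state, state_type, attempt):
    """spec is 'STATES,TYPES,ATTEMPT', e.g. 'CW,S,+' or 'C,S,4'."""
    states, types, when = spec.split(",")
    return (state[0] in states             # 'CRITICAL' -> 'C'
            and state_type[0] in types     # 'SOFT' -> 'S', 'HARD' -> 'H'
            and (when == "+" or int(when) == attempt))


MAX_ATTEMPTS = 3  # N in the text: failed checks before the state goes HARD
chain = [("C,H,+", "oc4j_restart.pl"), ("CW,S,+", "oc4j_poke.pl")]

ran = []
for attempt in range(1, MAX_ATTEMPTS + 1):
    # The last failed check turns the state HARD; earlier ones are SOFT
    state_type = "HARD" if attempt == MAX_ATTEMPTS else "SOFT"
    for spec, script in chain:
        if matches(spec, "CRITICAL", state_type, attempt):
            ran.append((attempt, script))
print(ran)
```

With N = 3, the poke script is selected on attempts 1 and 2 and the restart script on attempt 3.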
The script is not meant to be complicated or have a lot of features.
There are some shortcomings:
1) Uses "system()" in Perl - This may cause sub-shell spawns depending
on what you call.
2) Is written in Perl - If you have lots of service state changes, or
have performance issues, this may not be speedy enough for you. We've
had no performance issues, though, in our configuration.
3) Long command lines/complicated chains/etc. - These sorts of things
are probably still handled by custom scripts.
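On the first shortcoming: in Perl, system() called with a single string that contains shell metacharacters goes through the shell, while the list form executes the program directly. The sketch below shows the same idea with Python's subprocess module, as an analogy only; the argument value is a made-up example.

```python
import subprocess
import sys

py = sys.executable
echo_arg = "import sys; print(sys.argv[1])"

# List form: the program is executed directly, with no shell in between,
# so shell metacharacters in an argument reach the program untouched.
result = subprocess.run([py, "-c", echo_arg, "$HOME;id"],
                        capture_output=True, text=True)
out = result.stdout.strip()
print(out)
```

Passing arguments as a list avoids the extra process and keeps values such as host names from being reinterpreted by a shell.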
Is there a better or standard way to do this? I could not find any.
You can find a copy of this script here:
http://www.xney.com/handler.txt
-Karl