Event Handlers Problem
Thomas Beecher
tbeecher at localnet.com
Thu May 12 23:05:22 CEST 2005
Well, to answer some questions that have been posed:
1. My testing was taking place on a 2.0b3 installation of Nagios. I was,
however, able to replicate the behavior on a 1.1 install, which is our
production instance on a separate box.
2. I tested the perl script as the nagios user to make sure it would actually
run as expected, and it did.
3. I have tried calling the script from inside a shell wrapper, to no avail.
The shell script works fine if I call it directly (again, tested as myself and
as the nagios user). I tested it bare with no params, with constants, and with
macro variables; none of them worked. In all cases, if I tee the output to a
dummy file to see what the output is, it's ONLY the arguments that come out,
never anything from the script.
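For anyone chasing the same symptom, here is a minimal sketch of a debugging wrapper. It assumes the handler is failing silently because its stderr and exit status are being discarded, which is only a guess on my part. The function name run_logged, the log path, and the /nonexistent handler path are all hypothetical; the point is just to capture stderr, the exit status, and the environment the handler actually sees, since those may differ from an interactive login.

```shell
#!/bin/sh
# Hypothetical debugging wrapper (names and paths are illustrative only).
# It logs the arguments, the effective user and PATH, then runs the real
# handler with stdout AND stderr captured, and records the exit status.

run_logged() {
    log=$1; shift
    {
        echo "args: $*"
        echo "user: $(id -un) PATH=$PATH"
        status=0
        "$@" 2>&1 || status=$?
        echo "exit status: $status"
    } >> "$log"
}

# Example: the handler path below does not exist on purpose, so the
# failure (status 127, "not found") ends up in the log instead of vanishing.
LOG=${TMPDIR:-/tmp}/handler_debug.$$.log
run_logged "$LOG" /nonexistent/restart_pm3.pl buftest DOWN
cat "$LOG"
```

If the log shows a non-zero exit status or an error message when run from Nagios but not from a login shell, the difference is likely in the environment rather than in the macros.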
So, in essence, I'm still stuck. I'm cheating right now: taking the only output
I can get (the macros), dumping that to a file, and running another script as a
cron job that parses the file holding the macro variables and then calls the
Perl script. I know, it's a roundabout way of doing it, but it accomplishes the
same task. I don't plan to try that on our production copy of Nagios, only
because there are already enough cron jobs running every 5 minutes, but I
thought I'd toss it out there in case someone else wants to try it themselves.
Thanks to everyone for all your help. If I ever get it working the correct way,
I'll be sure to post again and let you know.
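The workaround described above can be sketched roughly as follows. This is not the actual script, only an illustration under assumptions: the queue path, the function names enqueue and drain, and the run_handler stand-in (which just echoes instead of calling the real Perl script) are all hypothetical.

```shell
#!/bin/sh
# Sketch of the queue-file workaround. All paths and names are hypothetical.
QUEUE=${QUEUE:-${TMPDIR:-/tmp}/pm3_queue.$$}

# Stand-in for the real Perl script; it only echoes what it would run.
run_handler() {
    echo restart_pm3.pl "$@"
}

# Event handler side: append one "host state" line per event.
enqueue() {
    printf '%s %s\n' "$1" "$2" >> "$QUEUE"
}

# Cron side: move the queue aside first so events arriving during
# processing go into a fresh file, then handle each queued line.
drain() {
    [ -s "$QUEUE" ] || return 0
    work="$QUEUE.work"
    mv "$QUEUE" "$work" || return 1
    while read -r host state; do
        run_handler "$host" "$state"
    done < "$work"
    rm -f "$work"
}

enqueue buftest DOWN
enqueue buftest UP
drain
```

A crontab entry along the lines of `*/5 * * * * /path/to/drain.sh` (path hypothetical) would run the drain step every five minutes, so a restart can lag the event by up to that long.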
Thomas Beecher II
Network Administrator
LocalNet, Inc
tbeecher at localnet.com
Lewis Getschel wrote:
> I'll chip in my 3 cents on this topic...
>
> I don't see which version of Nagios you are running. My comments here
> are based on my Nagios 1.2 server.
>
> I found 2 different issues that prevented my event_handlers from running:
> 1)
> When I tackled event_handlers I had written in tcsh script, I found a
> similar result to you, it LOOKED like it was being called (from event
> log), but even the simple "echo $1 >/tmp/nagios_debug.txt" didn't do
> anything. I finally tracked it down to (on MY system at least) Nagios
> wouldn't execute /bin/tcsh scripts! I rewrote to /bin/sh instead, and it
> started to work.
>
> 2)
> This one had me baffled for 3 months (!).
> My event handler command called for sending 6 variables (command_line
> $USER1$/event_handler_diskmail.original $SERVICESTATE$ $STATETYPE$
> $SERVICEATTEMPT$ $HOSTNAME$ $SERVICEDESC$ $OUTPUT$). I had similar
> result, it didn't run.
> I "solved" it when:
> I changed it to 1 parameter, the script ran.
> I added all 6, it stopped again.
> I changed it to just 2, it ran again
> I changed it to all 6, it stopped (see the pattern here <smirk>)
> I ended up building it up 1 parameter at a time until I got to 5. It ran.
> When I tried adding the output, it stopped again. I gave up at that
> point and left it at 5 (originally 6 months ago it did work with all 6,
> then it mysteriously stopped), my event handler doesn't handle 1
> particular case now (I can live with that for now).
>
> Try changing your parameters to constants (that was my first hint when
> troubleshooting) 1 2 3 and see if the script gets the constants at least.
>
> My 3 cents, FWIW
> Lewis
>
>
> Thomas Beecher wrote:
>
>> I am restarting the service after every change, so that's not an
>> issue. Learned that the hard way about 6 months ago...;-p
>>
>> When I change to this:
>>
>> command_line /bin/echo "$HOSTNAME$ $HOSTSTATE$" > /tmp/testing
>>
>> I do get the appropriate variables passed to the temp file. So it
>> would appear that the event handler is at least being called, that's
>> one less thing to look at!!
>>
>> This is where I'm at now. If I do this:
>>
>> /usr/local/nagios/libexec/restart_pm3.pl $HOSTNAME$ $HOSTSTATE$ >> /tmp/testing
>>
>> from a command line, I get the proper output for the script. (normally
>> won't be any, but I inserted some for troubleshooting.) If I call this
>> from the event_handler, I only get the macro variables, nothing from
>> the script.
>>
>>
>> Thomas Beecher II
>> Network Administrator
>> LocalNet, Inc
>> tbeecher at localnet.com
>>
>> Marc Powell wrote:
>>
>>>
>>>> -----Original Message-----
>>>> From: nagios-users-admin at lists.sourceforge.net [mailto:nagios-users-
>>>> admin at lists.sourceforge.net] On Behalf Of Thomas Beecher
>>>> Sent: Wednesday, May 04, 2005 10:57 AM
>>>> To: nagios-users at lists.sourceforge.net
>>>> Subject: Re: [Nagios-users] Event Handlers Problem
>>>>
>>>> Well, that was a serious brain fart on my part!!
>>>>
>>>> I moved the script to /usr/local/nagios/libexec/, and changed
>>>> checkcommands.cfg to show:
>>>>
>>>> define command{
>>>>         command_name    restart_pm3
>>>>         command_line    $USER1$/restart_pm3.pl $HOSTNAME$ $HOSTSTATE$
>>>>         }
>>>>
>>>> $USER1$ is defined in resource.cfg as
>>>>
>>>> $USER1$=/usr/local/nagios/libexec
>>>>
>>>> Permissions on the file are:
>>>>
>>>> -rwxr-xr-x 1 nagios nagios 1701 2005-05-04 10:47 restart_pm3.pl
>>>>
>>>> I've changed the ownership to nagios/nagios to prevent any other
>>>> potential permission issues.
>>>
>>> All Good.
>>>
>>>> This returns the following:
>>>>
>>>> [1115221615] HOST ALERT: buftest;DOWN;SOFT;1;Telnet: CRITICAL - Socket
>>>> timeout after 1 seconds<br>SNMP: CRITICAL: snmpget returned errors: 1 (
>>>> )<br>PING: CRITICAL - Host Unreachable (10.0.2.152)
>>>> [1115221615] HOST EVENT HANDLER: buftest;DOWN;SOFT;1;restart_pm3
>>>> [1115221625] HOST ALERT: buftest;DOWN;SOFT;2;Telnet: CRITICAL - Socket
>>>> timeout after 1 seconds<br>SNMP: CRITICAL: snmpget returned errors: 1 (
>>>> )<br>PING: CRITICAL - Host Unreachable (10.0.2.152)
>>>> [1115221625] HOST EVENT HANDLER: buftest;DOWN;SOFT;2;restart_pm3
>>>> [1115221634] HOST ALERT: buftest;DOWN;SOFT;3;Telnet: CRITICAL - Socket
>>>> timeout after 1 seconds<br>SNMP: CRITICAL: snmpget returned errors: 1 (
>>>> )<br>PING: CRITICAL - Host Unreachable (10.0.2.152)
>>>> [1115221634] HOST EVENT HANDLER: buftest;DOWN;SOFT;3;restart_pm3
>>>>
>>>> It doesn't error out, and seems to call the script, but it still
>>>> doesn't run.
>>>
>>> Did you remember to restart Nagios after making the config changes?
>>>
>>>> I have not tested the script as the Nagios user, however the front of
>>>> the script is set to dump whatever args get passed to it out to a file
>>>> before doing anything else, so if it was choking somewhere in the
>>>> script it would still be logged that it ran.
>>>
>>> Testing as the user the script is running as should be done anyway.
>>> Often times it shows other, less obvious issues like required libraries
>>> not being readable or executable by that user. I'd also suggest
>>> simplifying things greatly by making your event handler --
>>>
>>> command_line /bin/echo "$HOSTNAME$ $HOSTSTATE$" > /tmp/testing
>>>
>>> Just to make sure that it's getting run (I'm 99% sure that'll work as I
>>> expect ;) ).
>>>
>>> --
>>> Marc
>>
>>
>
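On Lewis's second point quoted above (the handler stopping once $OUTPUT$ was added): one possible explanation, not confirmed anywhere in this thread, is that the plugin output contains shell metacharacters. The redirect in the /bin/echo test working suggests the command line goes through a shell, so an unquoted "(" or "<" in the substituted text could make the whole command fail to parse. A small sketch, using a shortened copy of the plugin output from the log above and echo as a stand-in for the handler:

```shell
#!/bin/sh
# Plugin output shortened from the log in this thread; note the '(' and '<'.
output='SNMP: CRITICAL: snmpget returned errors: 1 ( )<br>PING: CRITICAL'

# Unquoted: roughly what a shell sees if the macro is substituted bare.
# The '(' is a syntax error, so nothing runs at all.
if sh -c "echo DOWN $output" >/dev/null 2>&1; then
    unquoted=ok
else
    unquoted=failed
fi

# Quoted: the same text passed as one double-quoted argument parses fine.
if sh -c "echo DOWN \"$output\"" >/dev/null 2>&1; then
    quoted=ok
else
    quoted=failed
fi

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

If that is what is happening, wrapping the macro in double quotes in command_line may be worth a try, though output that itself contains quote characters would need more care.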