Event handlers are not running or logging (on WARNING or CRITICAL)
Cook, Garry
GWCOOK at mactec.com
Thu Sep 2 15:41:47 CEST 2004
nagios-users-admin at lists.sourceforge.net wrote:
> Hi,
>
> I think my email is not working correctly, because I'm not getting
> responses to my questions until I post a follow-up (very weird).
>
> Has anyone had any thoughts on my findings below?
>
> Just to refresh the issue,
> Originally I thought event handlers were not running; however, I have
> since found that they do run, but only when a service check returns
> OK after having been in another state. This is not very useful, since
> an event handler should fix problems as they occur, not try to fix
> them after they have already been fixed manually. I've included a log
> excerpt for one host/service that shows the problem (quoted below) so
> that people can see what I mean.
>
> Any thoughts would be appreciated,
IIRC, the event handler is run after each state change, whether it be
hard or soft. Whether or not the handler does anything at these various
stages is a function of the event handler itself. My guess is that this
was not apparent to you before, or you would have posted the event
handler script and requested help debugging that. Therefore, you should
probably drop back ten yards and punt. Go back and read the docs again
(http://nagios.sourceforge.net/docs/1_0/eventhandlers.html), and pay
special attention to the example 'restart-httpd' script.
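To make that concrete, here is a minimal sketch of a handler that makes the decision itself, in the spirit of the restart-httpd example but not copied from it. Assumptions, all hypothetical: the handler receives the state, the state type and the attempt number as its three arguments (in that order); run_fix is a placeholder for the real remedy; acting on WARNING as well as CRITICAL, and on the 3rd soft attempt, are choices made for this Defuncts case, not anything Nagios requires.

```shell
#!/bin/sh
# Hypothetical event handler sketch. Assumed argument order:
#   $1 = service state (OK, WARNING, UNKNOWN, CRITICAL)
#   $2 = state type (SOFT or HARD)
#   $3 = current check attempt number
# Nagios runs the handler on every state change; this script decides
# which of those changes are worth acting on.

# Placeholder for the real remedy (hypothetical); it only reports here.
run_fix() {
    echo "would run fix: $1"
}

handle_event() {
    state="$1"
    statetype="$2"
    attempt="$3"
    case "$state" in
    WARNING|CRITICAL)
        case "$statetype" in
        HARD)
            # Hard state: the problem is confirmed, so act now.
            run_fix "$state/HARD"
            ;;
        SOFT)
            # Soft state: act only on the 3rd attempt (assumed threshold).
            if [ "$attempt" = "3" ]; then
                run_fix "$state/SOFT/$attempt"
            else
                echo "no action"
            fi
            ;;
        *)
            echo "no action"
            ;;
        esac
        ;;
    *)
        # OK (a recovery), UNKNOWN or anything unexpected: leave it alone.
        echo "no action"
        ;;
    esac
}

# Only dispatch when invoked with arguments, so the file can also be sourced.
if [ $# -gt 0 ]; then
    handle_event "$@"
fi
```

Called as `handler.sh CRITICAL HARD 1` (the file name is illustrative) it prints `would run fix: CRITICAL/HARD`, while `handler.sh OK HARD 1` prints `no action`. In the log quoted below every alert is HARD on attempt 1, so only the HARD branch would matter there.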
Garry W. Cook, CCNA
Network Infrastructure Manager
MACTEC, Inc. - http://www.mactec.com/
303.308.6228 (Office) - 720.220.1862 (Mobile)
>
>> Bruce at WebFarm.co.nz  +64 06 7572881
>> Systems Technician
>> WebFarm       http://www.webfarm.co.nz
>> FreeParking   http://www.freeparking.co.nz
>
> bruce wrote:
>
>> Hi,
>>
>> I've done a little more testing, and it appears the event handlers ARE
>> running, but only when the state changes to OK, which of course is no
>> use for fixing the problem.
>>
>> Below is an excerpt from nagios.log on one of the live systems (the
>> result of: egrep 'creeper.*Defun' var/nagios.log). freshclam seems
>> to be running on all the servers, but the Defunct processes check does
>> get some results. The Nagios configs are exactly the same for both
>> checks (the command sends fixdefuncts.sh instead of
>> restartFreshClam.sh, and that's the only difference).
>>
>> -- 8<-- nagios.log
>> [1093669850] SERVICE ALERT: creeper;Defuncts;OK;HARD;1;OK - 5 processes running with STATE = Z
>> [1093670146] SERVICE ALERT: creeper;Defuncts;WARNING;HARD;1;WARNING - 6 processes running with STATE = Z
>> [1093673451] SERVICE ALERT: creeper;Defuncts;WARNING;HARD;1;WARNING - 7 processes running with STATE = Z
>> [1093677052] SERVICE ALERT: creeper;Defuncts;WARNING;HARD;1;WARNING - 8 processes running with STATE = Z
>> [1093680652] SERVICE ALERT: creeper;Defuncts;WARNING;HARD;1;WARNING - 10 processes running with STATE = Z
>> [1093684251] SERVICE ALERT: creeper;Defuncts;WARNING;HARD;1;WARNING - 10 processes running with STATE = Z
>> [1093685900] SERVICE ALERT: creeper;Defuncts;CRITICAL;HARD;1;CRITICAL - 11 processes running with STATE = Z
>> [1093687852] SERVICE ALERT: creeper;Defuncts;CRITICAL;HARD;1;CRITICAL - 11 processes running with STATE = Z
>> [1093691451] SERVICE ALERT: creeper;Defuncts;CRITICAL;HARD;1;CRITICAL - 13 processes running with STATE = Z
>> [1093695059] SERVICE ALERT: creeper;Defuncts;CRITICAL;HARD;1;CRITICAL - 15 processes running with STATE = Z
>> [1093696438] SERVICE ALERT: creeper;Defuncts;OK;HARD;1;OK - 0 processes running with STATE = Z
>> [1093696438] SERVICE EVENT HANDLER: creeper;Defuncts;OK;HARD;1;allserver_defunct_fix
>> [1093696516] SERVICE ALERT: creeper;Defuncts;OK;HARD;1;OK - 0 processes running with STATE = Z
>> [1093696624] SERVICE ALERT: creeper;Defuncts;OK;HARD;1;OK - 0 processes running with STATE = Z
>> [1093696673] SERVICE ALERT: creeper;Defuncts;OK;HARD;1;OK - 0 processes running with STATE = Z
>> [1093697080] SERVICE ALERT: creeper;Defuncts;OK;HARD;1;OK - 1 processes running with STATE = Z
>> -- 8<-- End nagios.log
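One way to check the "only on OK" observation across a longer log is to list every state change next to any SERVICE EVENT HANDLER entry. A minimal sketch, assuming the line layout seen in the excerpt ([epoch] TYPE: host;service;state;statetype;attempt;text) and input already filtered to a single host/service; the function name summarize_transitions is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads nagios.log lines on stdin and prints
#   <epoch> CHANGE <old> -> <new>   for each SERVICE ALERT whose state differs
#                                   from the previous alert
#   <epoch> HANDLER <state> <cmd>   for each SERVICE EVENT HANDLER entry
# A CHANGE with no HANDLER at the same timestamp is a state change for
# which no handler run was logged.
summarize_transitions() {
    awk '
    /SERVICE ALERT: / {
        split($0, f, ";")                    # f[3] = state, f[4] = state type
        ts = substr($1, 2, length($1) - 2)   # "[1093670146]" becomes "1093670146"
        if (f[3] != last) {
            print ts, "CHANGE", (last == "" ? "-" : last), "->", f[3]
            last = f[3]
        }
        next
    }
    /SERVICE EVENT HANDLER: / {
        split($0, f, ";")                    # f[6] = handler command name
        ts = substr($1, 2, length($1) - 2)
        print ts, "HANDLER", f[3], f[6]
    }
    '
}
```

After sourcing it, run `egrep 'creeper.*Defun' var/nagios.log | summarize_transitions`. On the excerpt above this lists four changes (start, OK to WARNING, WARNING to CRITICAL, CRITICAL to OK) and a single HANDLER line, at 1093696438, for the return to OK.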
>>
>> As you can see, it goes through the motions: OK => WARNING =>
>> CRITICAL => OK (the return to OK is when we manually restart the
>> offending process on the server; yes, the better fix would be to fix
>> the process itself, but we are still investigating why it happens.
>> Very weird, but a different issue.)
>>
>> When changing from OK => WARNING it doesn't run the event handler;
>> it only runs when the state goes back to OK.
>>
>> If I change the event handler's arguments to a static CRITICAL, the
>> handler logs in and does the restart, so everything is fine there.
>>
>> Here are the related config sections just for reference of this
>> command and service:
>>
>> define service {
>>     use                    hosted
>>     service_description    Defuncts
>>     check_command          serv_check_zombie_procs
>>     event_handler          allserver_defunct_fix
>>     event_handler_enabled  1
>>     hostgroup_name         shared
>> }
>> define command {
>>     command_name  allserver_defunct_fix
>>     command_line  $USER1$/fix-w-allserver.sh $HOSTADDRESS$ $SERVICESTATE$ $SERVICEATTEMPT$ defunctFix.sh
>> }
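The wrapper script itself wasn't posted, so the following is only a hypothetical sketch of a script with the same four-argument interface as the command_line above ($HOSTADDRESS$, $SERVICESTATE$, $SERVICEATTEMPT$, then the remote fix script). It assumes the fix is run over ssh, which the post doesn't confirm ("logs in" could mean something else), and it defaults to a dry run that only prints the command. The point it illustrates is that the wrapper has to look at the state argument itself: Nagios passes OK, WARNING and CRITICAL alike.

```shell
#!/bin/sh
# Hypothetical wrapper sketch with the same argument order as the
# command_line above:
#   $1 = host address, $2 = service state, $3 = attempt, $4 = remote fix script
# Assumption: the fix runs over ssh. With DRY_RUN=1 (the default here)
# the command is only printed; set DRY_RUN=0 to actually run it.
fix_w_allserver() {
    addr="$1"
    state="$2"
    attempt="$3"
    script="$4"
    if [ -z "$addr" ] || [ -z "$script" ]; then
        echo "usage: fix_w_allserver ADDRESS STATE ATTEMPT SCRIPT" >&2
        return 2
    fi
    case "$state" in
    WARNING|CRITICAL)
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh $addr $script"
        else
            ssh "$addr" "$script"
        fi
        ;;
    *)
        # OK, UNKNOWN, etc.: nothing to fix.
        echo "skip: state $state (attempt $attempt)"
        ;;
    esac
}

# Only dispatch when invoked with arguments, so the file can also be sourced.
if [ $# -gt 0 ]; then
    fix_w_allserver "$@"
fi
```

For example, `fix_w_allserver 192.0.2.10 WARNING 1 defunctFix.sh` prints `ssh 192.0.2.10 defunctFix.sh`, while the same call with OK prints `skip: state OK (attempt 1)` (the address is a placeholder).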
>>
>>
>> Any thoughts or suggestions?
>>
>> Cheers,
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null