Passive service-checks
Johan Henes
johan at henes.no
Thu Sep 16 22:36:35 CEST 2004
Thanks a lot for your help, Harper!
I am sorry for the late feedback, but you know how it is with customers :-)
I found the error thanks to your enlightening explanation.
The problem was that SNMPTT passed 'WARNING' as the status to the
submit_check_result script instead of "1" as documented. Nagios logged
the text 'WARNING', but interpreted the status as "OK" and therefore
did not send a notification. After changing it to "1", everything
works fine (phew, several days of work over).
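
One way to guard against this kind of mix-up is to translate state names into the numeric codes before calling submit_check_result. The following is only a minimal sketch: state_to_code is a hypothetical helper name, not part of Nagios or SNMPTT, and the numeric values are the ones listed in the script's own header (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).

```shell
# Hypothetical helper (not part of Nagios or SNMPTT): map a state name,
# or an already numeric code, to the integer return code that
# submit_check_result documents for its third argument.
state_to_code() {
    case "$1" in
        0|OK)       echo 0 ;;
        1|WARNING)  echo 1 ;;
        2|CRITICAL) echo 2 ;;
        3|UNKNOWN)  echo 3 ;;
        *)
            # Refuse anything else instead of passing it on unchanged.
            echo "unrecognised state: $1" >&2
            return 1
            ;;
    esac
}
```

A wrapper could then call submit_check_result with "$(state_to_code WARNING)" as the third argument, so that a text status never reaches the command file.
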
What surprises me a bit is that the service ran active checks even
though the service definition tells it not to. Should it be like that?
By the way, when I disabled active checks, the status for max check attempts
changed to 1/1, and I don't know why. (Maybe it had not been updated when I
cut and pasted it into the mail.)
Again, thanks a lot for your help!
Johan
----- Original Message -----
From: "Harper Mann" <hmann at itgroundwork.com>
To: "'Johan Henes'" <johan at henes.no>; "'Jose Dragone'"
<jdragone at pictage.com.ar>; <nagios-users at lists.sourceforge.net>
Sent: Wednesday, September 15, 2004 9:09 AM
Subject: RE: [Nagios-users] Passive service-checks
> Hi Johan,
>
> Yes, I did mean you!
>
> The check is set active. Try setting it to passive by selecting the service
> and clicking "Disable checks of this service". You can test the
> notification by clicking "Submit passive check result for this service".
>
> The check is set is_volatile on, with max_check_attempts set to 3. It looks
> like Nagios was not restarted / reloaded after the config was changed, since
> the config shows max_check_attempts 1 but the GUI shows 1/3, which means
> max_check_attempts 3. You should restart it in case reload missed the
> change; however, I've not had reload fail to load a config change.
>
> What I think is happening is that the SNMP trap is delivered, but it's only
> 1 of 3 so it's only a soft state change. In 5 minutes, the active check
> that's defined, check-host-alive, runs and clears the soft state so you are
> never getting a hard non-ok state to trigger the Notification. If you
> restart Nagios, it should put the check in Passive and max_check_attempts to
> 1 which should Notify on the first trap post.
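
The soft/hard reasoning in the paragraph above can be sketched roughly as follows. This is a simplified model for non-OK results only, written under the assumption that a problem state becomes HARD once the attempt counter reaches max_check_attempts; it is not taken from the Nagios source, and problem_state_type is a hypothetical helper name.

```shell
# problem_state_type ATTEMPT MAX_CHECK_ATTEMPTS
# Simplified model (an assumption, not Nagios source code): a non-OK
# result stays SOFT until the attempt counter reaches max_check_attempts.
# Only a HARD problem state would trigger a notification.
problem_state_type() {
    attempt=$1
    max_attempts=$2
    if [ "$attempt" -ge "$max_attempts" ]; then
        echo HARD
    else
        echo SOFT
    fi
}
```

With the values from this thread, problem_state_type 1 3 prints SOFT (the trap is attempt 1 of 3, so no notification), while problem_state_type 1 1 prints HARD (max_check_attempts 1, so the first trap should notify).
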
>
> Can you post the check alert history with "View Alert History For This
> Service" for a time period where you were testing the Passive check? It
> should show what was happening.
>
> Regards,
>
> - Harper
>
> Harper Mann
> Groundwork Open Source Solutions
> 510-599-2075 (cell)
>
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Johan Henes
> Sent: Tuesday, September 14, 2004 11:09 PM
> To: hmann at itgroundwork.com; 'Jose Dragone';
> nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Passive service-checks
>
>
> ----- Original Message -----
> From: "Harper Mann" <hmann at itgroundwork.com>
> To: "'Jose Dragone'" <jdragone at pictage.com.ar>;
> <nagios-users at lists.sourceforge.net>
> Sent: Wednesday, September 15, 2004 5:35 AM
> Subject: RE: [Nagios-users] Passive service-checks
>
>
> > Hi Jose,
> >
> > The Email message below looks like an Active check. Does the check have a
> > big red "P" for Passive Check by it in "Service Detail"? Is the trap shown
> > in the Nagios event log? It likely is if the trap message is showing up in
> > the service detail. The Service Detail should not show a plugin timeout as
> > it's not supposed to be calling a plugin.
> >
> > Can you post the service definition with any template parents and the
> > trap_handler?
>
> I guess you wanted it from me, as it was my service check in the previous
> mail :-)
>
> I see no big P. The Service Detail says:
> ----
> Current Status: OK
> Status Information: PING OK - Packet loss = 0%, RTA = 1.20 ms
> Current Attempt: 1/3
> State Type: HARD
> Last Check Type: ACTIVE
> Last Check Time: 15-09-2004 07:52:07
> Status Data Age: 0d 0h 2m 43s
> Next Scheduled Active Check: 15-09-2004 07:57:07
> Latency: < 1 second
> Check Duration: 4 seconds
> Last State Change: 15-09-2004 07:42:13
> Current State Duration: 0d 0h 12m 37s
> Last Service Notification: N/A
> Current Notification Number: 0
> Is This Service Flapping? N/A
> Percent State Change: N/A
> In Scheduled Downtime? NO
> Last Update: 15-09-2004 07:54:48
>
>
> Service Checks: ENABLED
> Passive Checks: ENABLED
> Service Notifications: ENABLED
> Event Handler: ENABLED
> Flap Detection: ENABLED
>
> -----
> The trap is shown in the log :
> ---
> [1095197698] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;10.0.48.40;TRAP;WARNING;10.0.48.5 login/logout with 0
> ---
> Here is the service definition :
> - Template :
> ---
> define service{
> name generic-service ; The 'name' of this service template, referenced in other service definitions
> active_checks_enabled 1 ; Active service checks are enabled
> passive_checks_enabled 1 ; Passive service checks are enabled/accepted
> parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
> obsess_over_service 1 ; We should obsess over this service (if necessary)
> check_freshness 0 ; Default is to NOT check service 'freshness'
> notifications_enabled 1 ; Service notifications are enabled
> event_handler_enabled 1 ; Service event handler is enabled
> flap_detection_enabled 1 ; Flap detection is enabled
> process_perf_data 1 ; Process performance data
> retain_status_information 1 ; Retain status information across program restarts
> retain_nonstatus_information 1 ; Retain non-status information across program restarts
>
> register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
> }
> ---
> The service definition:
> ---
> define service{
> use generic-service
> host_name *
> service_description TRAP
> is_volatile 1
> check_period 24x7
> contact_groups linux-admins
> passive_checks_enabled 1
> active_checks_enabled 0
> max_check_attempts 1
> normal_check_interval 5
> retry_check_interval 1
> notification_interval 5
> notification_period 24x7
> notification_options w,u,c,r
> notifications_enabled 1
> check_command check-host-alive
> }
> ---
>
> My trap-handler :
>
> [root at bigb nagios]# cat /etc/snmp/snmptrapd.conf
> traphandle default /usr/sbin/snmptthandler
>
> ..
> ..
> snmptt.conf :
> EXEC /usr/lib/nagios/plugins/eventhandlers/submit_check_result $r TRAP 'WARNING' "$2 login/logout with $3"
> ...
> ...
> and last ...
> ...
>
> #!/bin/sh
>
> # SUBMIT_CHECK_RESULT
> # Written by Ethan Galstad (nagios at nagios.org)
> # Last Modified: 02-18-2002
> #
> # This script will write a command to the Nagios command
> # file to cause Nagios to process a passive service check
> # result. Note: This script is intended to be run on the
> # same host that is running Nagios. If you want to
> # submit passive check results from a remote machine, look
> # at using the nsca addon.
> #
> # Arguments:
> # $1 = host_name (Short name of host that the service is
> # associated with)
> # $2 = svc_description (Description of the service)
> # $3 = return_code (An integer that determines the state
> # of the service check, 0=OK, 1=WARNING, 2=CRITICAL,
> # 3=UNKNOWN).
> # $4 = plugin_output (A text string that should be used
> # as the plugin output for the service check)
> #
>
> echocmd="/bin/echo"
>
> CommandFile="/var/log/nagios/rw/nagios.cmd"
>
> # get the current date/time in seconds since UNIX epoch
> datetime=`date +%s`
>
> # create the command line to add to the command file
> cmdline="[$datetime] PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4"
>
> # append the command to the end of the command file
> # (quoted so that spaces and wildcards in the plugin output are kept as-is)
> $echocmd "$cmdline" >> "$CommandFile"
>
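
Since the script above forwards $3 unchanged, a text status such as 'WARNING' ends up verbatim in the command file. A possible hardening, shown here only as a sketch, is to build the command line in a function that rejects anything other than the documented codes 0 to 3. The name build_check_result is hypothetical and not part of the original script; the output format is the one used by the script's cmdline variable.

```shell
# Hypothetical helper (not in the original submit_check_result script):
# build_check_result TIMESTAMP HOST SERVICE RETURN_CODE PLUGIN_OUTPUT
# Prints one PROCESS_SERVICE_CHECK_RESULT line, or fails if the return
# code is not one of the documented integers 0, 1, 2 or 3.
build_check_result() {
    timestamp=$1
    host=$2
    service=$3
    code=$4
    output=$5
    case "$code" in
        0|1|2|3) ;;
        *)
            echo "return code must be 0, 1, 2 or 3, got: $code" >&2
            return 1
            ;;
    esac
    # printf keeps the plugin output intact, including embedded spaces.
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$timestamp" "$host" "$service" "$code" "$output"
}
```

Inside the script this could be used as: build_check_result "$(date +%s)" "$1" "$2" "$3" "$4" >> "$CommandFile". With 'WARNING' as the third argument the call fails with a message on stderr instead of writing a line that Nagios would read as OK.
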
>
> ... That's it :-)
>
> Johan
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: thawte's Crypto Challenge Vl
> Crack the code and win a Sony DCRHC40 MiniDV Digital Handycam
> Camcorder. More prizes in the weekly Lunch Hour Challenge.
> Sign up NOW http://ad.doubleclick.net/clk;10740251;10262165;m
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null