Distributed Monitoring
Burnson, Richard
rburnson at cps.k12.il.us
Tue Jan 7 17:16:24 CET 2003
I haven't seen any responses yet; perhaps more information is required.
Here is some of the debugging I have done:
Verified services have "obsess over service" enabled:
define service{
        name                            check-service   ; The 'name' of this service template, referenced in other service definitions
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           15
        retry_check_interval            2
        notification_interval           120
        notification_period             24x7
        notification_options            w,u,c,r
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
================================================
        obsess_over_service             1       ; We should obsess over this service (if necessary)
================================================
        check_freshness                 1       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
The "submit_check_result" command is defined as follows:
define command{
        command_name    submit_check_result
        command_line    /usr/local/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$OUTPUT$'
        }
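For reference, here is a minimal sketch of what a wrapper like this typically has to do with those four arguments: turn them into the tab-separated line (host, service, return code, output) that the NSCA client reads on standard input. Note that $SERVICESTATE$ expands to a state string such as OK or WARNING, not a number. The function names state_to_code and format_check_result are hypothetical, as is the choice to map unrecognized states to 3; this is not the script shipped in the tarball.

```shell
#!/bin/sh
# Sketch only: state_to_code and format_check_result are hypothetical names,
# not part of Nagios or NSCA.

# Map a Nagios service state string to the numeric plugin return code.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;  # UNKNOWN, and anything unrecognized (assumption of this sketch)
    esac
}

# Print one passive check result: host<TAB>service<TAB>code<TAB>output
# $1 = host name, $2 = service description, $3 = state string, $4 = plugin output
format_check_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}
```

The output of format_check_result would then be piped into the NSCA client; that invocation is left out here because the client path, central server address and config file depend on the local install.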
The "ocsp_command" is configured in nagios.cfg as follows with obsess over
services enabled:
obsess_over_services=1
ocsp_command=submit_check_result
I have removed the original commands that I had taken from the documentation
and replaced them with the two scripts that came with the Nagios tarball in
the contrib distributed monitoring directory. I am able to send a service
check manually, as the nagios user, to the central server.
[nagios@ilnetmon03 nagios]$ /usr/local/nagios/libexec/eventhandlers/submit_check_result 3390-RTR PING-RI 1 test
1 data packet(s) sent to host successfully.
However the ocsp command never seems to be executed after any service
checks. Here is the output with nagios compiled with debug set to 3:
*** Event Details ***
Event type: 0 (service check)
Service Description: PING
Associated Host: 3390-KRO
Event time: Tue Jan 7 08:41:36 2003
Checking service 'PING' on host '3390-KRO'...
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:36 2003
Next High Priority Event Time: Tue Jan 7 08:41:39 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:37 2003
Next High Priority Event Time: Tue Jan 7 08:41:39 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:38 2003
Next High Priority Event Time: Tue Jan 7 08:41:39 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:39 2003
Next High Priority Event Time: Tue Jan 7 08:41:39 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Details ***
Event type: 10 (status save)
Event time: Tue Jan 7 08:41:39 2003
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:39 2003
Next High Priority Event Time: Tue Jan 7 08:41:44 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:40 2003
Next High Priority Event Time: Tue Jan 7 08:41:44 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:41 2003
Next High Priority Event Time: Tue Jan 7 08:41:44 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:42 2003
Next High Priority Event Time: Tue Jan 7 08:41:44 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:43 2003
Next High Priority Event Time: Tue Jan 7 08:41:44 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Check Loop ***
Current time: Tue Jan 7 08:41:44 2003
Next High Priority Event Time: Tue Jan 7 08:41:44 2003
Next Low Priority Event Time: Tue Jan 7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Details ***
Event type: 7 (service check reaper)
Event time: Tue Jan 7 08:41:44 2003
Starting to reap service check results...
Found check result for service 'PING' on host '3390-KRO'
Check Type: ACTIVE
Parallelized?: Yes
Exited OK?: Yes
Return Status: 0
Plugin Output: 'FPING OK - 10.1.1.1 (loss=0.000000%, rta=27.800000 ms)'
Finished reaping service check results.
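One way to check whether Nagios invokes the ocsp command at all, independent of the debug output, would be to have the script record every call before it does anything else. A minimal sketch, assuming a hypothetical helper named log_ocsp_call and a hypothetical log path in OCSP_DEBUG_LOG that the nagios user can write to:

```shell
#!/bin/sh
# Sketch only: log_ocsp_call and OCSP_DEBUG_LOG are hypothetical names.
# The default log path is an assumption; pick any file the nagios user can write.
OCSP_DEBUG_LOG="${OCSP_DEBUG_LOG:-/tmp/ocsp_debug.log}"

# Append a timestamped record of one invocation and its four arguments.
# $1 = host name, $2 = service description, $3 = state, $4 = plugin output
log_ocsp_call() {
    printf '%s host=%s service=%s state=%s output=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3" "$4" >> "$OCSP_DEBUG_LOG"
}
```

Calling log_ocsp_call "$@" as the first step of submit_check_result would leave one line per invocation. If the file stays empty while service checks keep being reaped, the command is not being run; if lines appear, the problem is further along, in the script or in the transfer to the central server.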
Any ideas on why it's not working?
TIA,
Richard
-----Original Message-----
From: Burnson, Richard
Sent: Friday, January 03, 2003 3:53 PM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring
I am trying to set up distributed monitoring with Nagios 1.0 (Stable) on
RedHat 7.2. I have the nsca daemon running on the central server, and I
have been able to successfully send a service check result via send_nsca
from the distributed server. The issue appears to be that the distributed
server is not executing the ocsp_command. Here are the settings on the
distributed server:
1. Obsess over services is enabled both globally and per service.
2. The ocsp_command is defined in nagios.cfg as
ocsp_command=submit_check_result
3. "submit_check_result" is defined in the command definitions section
exactly as shown in the documentation.
4. The submit_check_result script was created in the libexec directory
and the command definition points directly to this file.
I can log in as nagios and run the submit_check_result shell script
successfully, and the service check is received by the central server. It
simply seems that Nagios is not executing the ocsp command after each
service check. I've tried to watch the service checks as they happen via the
nagios.log file and even compiled Nagios with debug set to 3. Is there a
better way to debug this? Does anyone have any ideas on what I may be missing?
Thanks,
Richard