Distributed monitoring quirks
mark
mark at woodstream.net
Thu Mar 6 19:15:19 CET 2003
Hi all,
I'm setting up distributed monitoring and can't quite get the behavior I'm
after. I'm guessing I just haven't hit on the right configuration, but
after a day of working on it I thought I'd ask the list. My environment
uses a central server that receives passive checks from a distributed
server. The two servers are connected by a site-to-site VPN. This is
important because it means the central monitor cannot reach the remote
hosts being monitored, so host checks from the central monitor won't work
if they rely on ping. Now on with the problem description...
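For readers unfamiliar with the passive path: the distributed server hands each result to the central server, where it ends up as an external command line of the form `[time] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. The sketch below only formats such a line; the transport (for example NSCA) is not shown, and the function name `format_passive_service_result` is hypothetical.

```python
import time

# Nagios plugin return codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_service_result(host, service, state, output, timestamp=None):
    """Build one external-command line reporting a passive service result.

    Hypothetical helper: it only formats the line; delivering it to the
    central server (e.g. via NSCA) is out of scope here.
    """
    if state not in RETURN_CODES:
        raise ValueError(f"unknown state: {state}")
    if timestamp is None:
        timestamp = int(time.time())
    # Fields are separated by semicolons, so host and service may not contain one.
    for field in (host, service):
        if ";" in field:
            raise ValueError(f"field contains ';': {field!r}")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{RETURN_CODES[state]};{output}")


# prints: [1046974519] PROCESS_SERVICE_CHECK_RESULT;mx1;SMTP;2;Connection refused
print(format_passive_service_result("mx1", "SMTP", "CRITICAL",
                                    "Connection refused", timestamp=1046974519))
```

The state travels as the numeric return code, not as the word "CRITICAL" in the output text.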
I have the distributed monitoring server up and running. It works as
expected and sends updates to the central server.
The problem I'm having is on the central server. If a remote service goes
to a hard critical state, the distributed monitor picks it up fine. The central
monitor receives the event, but the critical service shows up as
"disabled". Further, if a remote host goes to hard critical (i.e. down),
the distributed monitor also sees that fine. On the central server, however, I
never see anything about the host being down, not even a hint.
Now for the configuration. On the central monitor, a host looks like this:
# Generic host definition template
define host{
name generic-host ; The name of this host template
notifications_enabled 1 ;Host notifications are enabled
event_handler_enabled 1 ;Host event handler is enabled
flap_detection_enabled 1 ;Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information
retain_nonstatus_information 1 ; Retain non-status info
register 0 ; DONT REGISTER
}
# 'deathstar' host definition
define host{
use generic-host ; Name of host template
host_name deathstar
alias deathstar.company.com
address 10.1.1.1
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
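To make the template relationship explicit: a definition with `use` inherits every attribute it does not set itself from the named template, and the template's own `name`, `use` and `register` settings are not passed on. The sketch below models that merge with plain dictionaries; `resolve` is a hypothetical helper and assumes the `use` chain has no cycles.

```python
# Attributes that describe the template itself and are not inherited.
NON_INHERITED = {"name", "use", "register"}


def resolve(obj, templates):
    """Return obj with unset attributes filled in from its 'use' chain."""
    parent_name = obj.get("use")
    if parent_name is None:
        return dict(obj)
    parent = resolve(templates[parent_name], templates)
    merged = {k: v for k, v in parent.items() if k not in NON_INHERITED}
    merged.update(obj)  # local settings override inherited ones
    return merged


templates = {
    "generic-host": {
        "name": "generic-host",
        "notifications_enabled": "1",
        "event_handler_enabled": "1",
        "flap_detection_enabled": "1",
        "register": "0",
    },
}
deathstar = {
    "use": "generic-host",
    "host_name": "deathstar",
    "address": "10.1.1.1",
    "max_check_attempts": "10",
}
host = resolve(deathstar, templates)
# prints: 1 False
print(host["notifications_enabled"], "check_command" in host)
```

Neither the template nor the host supplies a `check_command`, so the resolved host has none, which is the situation described next.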
Note there is NO check_command definition. Because the central monitor cannot
reach the remote hosts, I had to remove the check_command entry.
Otherwise, each time a service had a problem, the central monitor would
try to ping the remote host, fail, and incorrectly mark the host as
down. I have a feeling the lack of a check_command is why I never
see remote hosts go down on the central server, even when the distributed
monitor sees them go down.
A service entry on the central monitor looks like this:
define service{
name passive-service ; Template name
active_checks_enabled 0 ; Disable Active checks
passive_checks_enabled 1 ; Enable Passive checks
parallelize_check 1 ; parallelize checks
obsess_over_service 1 ; obsess over this svc
check_freshness 1 ; check service fresh
freshness_threshold 900 ; Stale if over 15 min.
notifications_enabled 1 ; enable notification
event_handler_enabled 1 ; enable event handler
flap_detection_enabled 1 ; enable flap detection
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status info
retain_nonstatus_information 1 ; Retain non-status info
check_command no-passive-update ; if stale run this cmd
register 0 ; DONT REGISTER THIS
}
# Service definition
define service{
use passive-service ;template
host_name mx1,mx2,mximc1
service_description SMTP
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups unix-admins
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
}
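The single SMTP definition above lists three hosts, so it yields three separate services (one per host), each with its own state on the central server. The sketch below shows that expansion; `expand_service` is a hypothetical name, and template inheritance from `passive-service` is left out for brevity.

```python
def expand_service(definition):
    """Split a comma-separated host_name into one service per host."""
    hosts = [h.strip() for h in definition["host_name"].split(",") if h.strip()]
    services = []
    for host in hosts:
        svc = dict(definition)  # copy so each service is independent
        svc["host_name"] = host
        services.append(svc)
    return services


smtp = {
    "use": "passive-service",
    "host_name": "mx1,mx2,mximc1",
    "service_description": "SMTP",
    "max_check_attempts": "3",
}
for svc in expand_service(smtp):
    print(svc["host_name"], svc["service_description"])
```

Each resulting service is identified by its (host_name, service_description) pair, which is also what a passive result must name.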
Note that passive checks are enabled and active checks are disabled. I'm
guessing a hard critical service shows up as "disabled" on the central
server because the service definition has active checks disabled. The
no-passive-update command simply echoes a CRITICAL warning that the passive
check result is stale (as defined by freshness_threshold).
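A rough sketch of the freshness logic, assuming a result counts as stale once it is older than `freshness_threshold` (900 seconds here). Nagios takes the service state from a command's exit code rather than from the text it prints, so a stand-in for `no-passive-update` has to return 2 (CRITICAL) as well as print its warning. The function names and message text are hypothetical.

```python
FRESHNESS_THRESHOLD = 900  # seconds, matching freshness_threshold above
CRITICAL = 2  # Nagios plugin return code for CRITICAL


def is_stale(last_result_time, now, threshold=FRESHNESS_THRESHOLD):
    """A passive result is stale once it is older than the threshold."""
    return now - last_result_time > threshold


def no_passive_update():
    """Hypothetical stand-in for 'no-passive-update': (exit code, output)."""
    message = ("CRITICAL: no passive check result received "
               "within the freshness threshold")
    return CRITICAL, message


# A result 1000 seconds old exceeds the 900-second threshold.
if is_stale(last_result_time=0, now=1000):
    code, message = no_passive_update()
    print(code, message)
```

A real command wrapper would print the message and exit with `code`; the sketch returns it instead so it can be exercised without terminating the interpreter.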
So... does anyone have ideas on how I can do distributed monitoring in my
situation, where the central monitor cannot see the remote hosts, so that
hard critical service events don't show up as "disabled" and critical host
events show up at all on the central monitor?
Any and all input is greatly appreciated!
Thanks,
Mark
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null