Failover configuration - "no output"
Lehman, John
LEHMANJ at us.panasonic.com
Thu Oct 5 22:15:31 CEST 2006
Let me start off by saying that I am sorry this is so long, but I wanted to
include all of the necessary information.
OK. I am going absolutely crazy at this point. I am getting "no output"
as the result of the following configuration for Nagios "failover", as
described in the manual.
The following is a file I have created called failover.cfg:
#################### Host Group Listing ################
define hostgroup{
hostgroup_name Nagios_Master
alias Nagios-Master
contact_groups Emails_to_GNPC_Staff
members Nagios-Master
}
################### Host Definition Listing #################
# Generic host definition template
define host{
name generic-host10 ; The name of this host template - referenced in other host definitions, used for template recursion/resolution
check_command check-nagios
max_check_attempts 10
notification_interval 10
notification_period 24x7
notification_options d,r
notifications_enabled 1 ; Host notifications are enabled
event_handler_enabled 1 ; Host event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
define host{
use generic-host10 ; Name of host template to use
host_name Nagios-Master
alias Nagios Master
address 10.130.4.80
}
################### Service Definition Listing #################
# Generic service definition template
define service{
name generic-service10 ; The 'name' of this service template, referenced in other service definitions
service_description NAGIOS
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
contact_groups Emails_to_GNPC_Staff
notification_interval 15
notification_period 24x7
notification_options c,r
check_command check_nagios
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
define service{
use generic-service10
host_name Nagios-Master
service_description HOST
check_command handle-master-host-event
normal_check_interval 5
notification_interval 5
}
define service{
use generic-service10
host_name Nagios-Master
service_description PROCESS
check_command handle-master-proc-event
normal_check_interval 5
notification_interval 5
}
Here are the command definitions from checkcommands.cfg:
define command{
command_name handle-master-host-event
command_line /usr/lib/nagios/plugins/eventhandlers/redundancy-scenario1/handle-master-host-event $HOSTSTATE$ $STATETYPE$
}
define command{
command_name handle-master-proc-event
command_line /usr/lib/nagios/plugins/eventhandlers/redundancy-scenario1/handle-master-proc-event $SERVICESTATE$ $STATETYPE$
}
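One way to see what Nagios receives from these two commands is to run them by hand and look at both the text they print and their exit status. The following is only a minimal sketch: `run_and_report` is a hypothetical helper name (not part of Nagios), and the sample arguments stand in for the macro values Nagios would substitute. It relies on the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and prints "(No output!)" when the command writes nothing.

```shell
#!/bin/sh
# Hypothetical helper: run a command as a check would be run, then report
# its exit status and whatever text it printed.
run_and_report() {
    # Capture stdout and stderr together; remember the exit code.
    if output=$("$@" 2>&1); then code=0; else code=$?; fi

    # Map the exit code to the usual plugin state names.
    # Anything other than 0, 1 or 2 is reported here as UNKNOWN.
    case "$code" in
        0) state="OK" ;;
        1) state="WARNING" ;;
        2) state="CRITICAL" ;;
        *) state="UNKNOWN" ;;
    esac

    # A command that prints nothing is shown with a placeholder.
    [ -n "$output" ] || output="(No output!)"
    echo "$state (exit $code): $output"
}

# Stand-in commands for demonstration:
run_and_report true
run_and_report sh -c 'echo "PROCS OK"; exit 0'

# Against the real handler (path from the definitions above, sample arguments):
# run_and_report /usr/lib/nagios/plugins/eventhandlers/redundancy-scenario1/handle-master-host-event DOWN HARD
```

If the real handler prints nothing and exits 0, this would report OK with "(No output!)", regardless of the arguments passed.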
In the nagios.log I have the following:
[1159979791] HOST NOTIFICATION: emailuser;Nagios-Master;DOWN;host-notify-by-email;check_nagios: Unknown argument - (null)
Here is what I get from the GUI for the service identified above:
Host: Nagios-Master
Service   Status    Last Check           Duration       Attempt  Status Information
HOST      OK        10-04-2006 12:39:54  1d 22h 5m 17s  1/3      (No output!)
PROCESS   CRITICAL  10-04-2006 12:41:26  1d 22h 8m 41s  1/3      (No output!)
I hope that this helps someone point me in the right direction.
This is the path and file contents for handle-master-host-event:
/usr/lib/nagios/plugins/eventhandlers/redundancy-scenario1/handle-master-host-event
#!/bin/sh
# REDUNDANCY EVENT HANDLER SCRIPT
# Written By: Ethan Galstad (nagios at nagios.org)
# Last Modified: 02-18-2002
#
# This is an example script for implementing redundancy.
# Read the HTML documentation on redundant monitoring for more
# information on what this does.
# Location of the echo and mail commands
echocmd="/bin/echo"
mailcmd="/bin/mail"
# Location of the event handlers
eventhandlerdir="/usr/lib/nagios/plugins/eventhandlers"
# Only take action on hard host states...
case "$2" in
HARD)
case "$1" in
DOWN)
# The master host has gone down!
# We should now become the master host and take
# over the responsibilities of monitoring the
# network, so enable notifications...
`$eventhandlerdir/enable_notifications`
# Notify someone of what has happened with the original
# master server and our taking over the monitoring
# responsibilities. No one was notified of the master
# host going down, since the notification would have
# occurred while we were in standby mode, so this is a good idea...
#`$echocmd "Master Nagios host is down!" | /bin/mail -s "Master Nagios Host Is Down" admin@somedomain.com`
#`$echocmd "Slave Nagios host has entered ACTIVE mode and taken over network monitoring responsibilities!" | $mailcmd -s "Slave Nagios Host Has Entered ACTIVE Mode" admin@somedomain.com`
;;
UP)
# The master host has recovered!
# We should go back to being the slave host and
# let the master host do the monitoring, so
# disable notifications...
`$eventhandlerdir/disable_notifications`
# Notify someone of what has happened. Users were
# already notified of the master host recovery because we
# were in active mode at the time the recovery happened.
# However, we should let someone know that we're switching
# back to standby mode...
#`$echocmd "The master Nagios host has recovered, so the slave Nagios host has returned to standby mode..." | $mailcmd -s "Slave Nagios Host Has Returned To STANDBY Mode" admin@somedomain.com`
;;
esac
;;
esac
exit 0
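To make the branching in the script above easier to follow, here is a dry-run sketch of the same case logic that prints which action would be taken instead of running it. `handler_action` is a hypothetical name used only for illustration; the two arguments are in the same order as in the command definition (state first, then state type).

```shell
#!/bin/sh
# Dry-run sketch of the handler's decision logic (hypothetical helper).
# Prints the name of the script the handler would call, or "none".
handler_action() {
    state="$1"       # value of $HOSTSTATE$: UP or DOWN
    state_type="$2"  # value of $STATETYPE$: HARD or SOFT

    case "$state_type" in
    HARD)
        case "$state" in
        DOWN) echo "enable_notifications" ;;   # slave takes over
        UP)   echo "disable_notifications" ;;  # slave returns to standby
        *)    echo "none" ;;
        esac
        ;;
    *)
        # Soft states are ignored by the handler.
        echo "none"
        ;;
    esac
}

handler_action DOWN HARD
handler_action UP HARD
handler_action DOWN SOFT
```

Note that the original script, unlike this sketch, writes nothing to standard output in any branch (the mail lines are commented out) and always ends with exit 0.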
I am fairly new so "any" advice is appreciated.