Way to replicate external commands to failover server?

Mark Wagner markwag at u.washington.edu
Tue May 13 21:23:49 CEST 2008


On Mon, May 12, 2008 at 12:11:44PM -0700, nagios-users-request at lists.sourceforge.net wrote:

> What has me a little concerned is that if someone went into the web 
> interface on the main server and say scheduled downtime or disabled 
> notifications, the backup server would never know about it.  In the event
> of failure people could find themselves getting alerts for a host that
> should have been in scheduled downtime (or it was on the main server).

> While I realize I would not want to capture and retransmit *all* 
> external commands to the backup host, if I could somehow get at them I 
> could filter them over to the backup host (i.e. "ignore most commands, 
> but pass a few like downtime or host notifications", etc).

> Is there any mechanism that allows me to do this?  As I understand it 
> the global host and service events really only capture check results -- 
> they're not going to fire if someone schedules downtime.

I have the same dilemma. I don't think Nagios was designed for multiple
web interface servers and anything you do to try to fix it may be too
hackish (as you're about to see). I wonder how the Nagios-based commercial
apps handle this.

I have written a set of scripts that will scour the Nagios log files and
relay selected commands to the backup. The setup is complicated and
I'm starting to think it is the wrong way to go, but I'll present it for
your amusement.

On the main server there is a cron job that runs every two minutes:

*/2 * * * * nagios /usr/lib64/nagios/plugins/relay_to_secondary

The script basically does this:

[ -f "/var/log/nagios/rw/stop_relay" ] && exit 0
/usr/lib64/nagios/plugins/relay_commands \
	--start=/var/log/nagios/rw/last_relayed \
	--update-start <backup>

The file /var/log/nagios/rw/last_relayed records the last log line
checked for relaying (the text of the line itself, not a line number).
The "--update-start" option rewrites /var/log/nagios/rw/last_relayed
with the last line checked in this run.

The "relay_commands" script is attached at the end of this message. It
parses the log files for (a configurable set of) commands to relay to
the backup and then ssh's (with passwordless keys) to the backup and
cat's these commands to the nagios external command pipe.
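
For anyone who wants to see the idea without reading the Perl, here is a
minimal shell sketch of the filtering step only. The function name
filter_commands is made up for illustration, and it assumes the log lines
have the form "[timestamp] EXTERNAL COMMAND: TYPE;args", which is what the
attached script matches. It does not track a start line.

```shell
# Hypothetical helper (not part of the attached script): print the
# relayable external commands found in a Nagios log file.
# Usage: filter_commands <log-file> <command-type>...
filter_commands() {
    log_file=$1
    shift
    [ $# -gt 0 ] || return 0

    # Build an alternation such as "SCHEDULE_HOST_DOWNTIME|DISABLE_NOTIFICATIONS".
    types=$(printf '%s|' "$@")
    types=${types%|}

    # Log line:      [1210620704] EXTERNAL COMMAND: TYPE;arg;arg
    # Command pipe:  [1210620704] TYPE;arg;arg
    sed -n -E "s/^(\[[0-9]+\]) EXTERNAL COMMAND: (($types);.*)\$/\1 \2/p" "$log_file"
}
```

The output could then be piped to the backup the same way the Perl script
does it, for example:

filter_commands /var/log/nagios/nagios.log SCHEDULE_HOST_DOWNTIME | \
	ssh -o BatchMode=yes -x <backup> 'cat > /var/log/nagios/rw/nagios.cmd'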

Under normal operation everything is OK. However, if the main or backup
servers go down there is additional work besides enabling/disabling
notifications that needs to be done in the event handler.

When the backup goes down you don't want to relay commands so the event
handler on the main will create the /var/log/nagios/rw/stop_relay file.

When the backup comes back you want to start relaying commands so the event
handler on the main will delete /var/log/nagios/rw/stop_relay.
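
A rough sketch of that part of the event handler on the main might look
like the following. The function name toggle_relay and its arguments are
my own illustration; it assumes the handler is passed the state and state
type (e.g. via the $HOSTSTATE$ and $HOSTSTATETYPE$ macros, or the service
equivalents) and that you only want to act on HARD states.

```shell
# Hypothetical event-handler logic for the main server.
# Usage: toggle_relay <state> <state-type> [flag-file]
toggle_relay() {
    state=$1
    state_type=$2
    flag=${3:-/var/log/nagios/rw/stop_relay}

    # Act only on confirmed (HARD) state changes (an assumption).
    [ "$state_type" = HARD ] || return 0

    case $state in
        DOWN|UNREACHABLE|CRITICAL)
            : > "$flag"      # backup is gone: pause relaying
            ;;
        UP|OK)
            rm -f "$flag"    # backup is back: resume relaying
            ;;
    esac
}
```

Since the cron wrapper exits before it touches last_relayed while the flag
file exists, the first run after the flag is removed should pick up from
where relaying stopped.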

When the main goes down the event handler on the backup
gets the last line in the log file and writes it to a file <foo>.

When the main comes back the event handler on the backup relays
the commands back to the main using the file <foo> as the starting
point.
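
The backup side could be sketched along these lines. Everything here is
illustrative: track_main, the marker-file argument (standing in for <foo>)
and the overridable relay command are assumptions, and the log file is
assumed to be the backup's own nagios.log.

```shell
# Hypothetical event-handler logic for the backup server.
# Usage: track_main <state> <state-type> <marker-file> <main-host> \
#                   [log-file] [relay-command]
track_main() {
    state=$1 state_type=$2 marker=$3 main_host=$4
    log_file=${5:-/var/log/nagios/nagios.log}
    relay=${6:-/usr/lib64/nagios/plugins/relay_commands}

    # Act only on confirmed (HARD) state changes (an assumption).
    [ "$state_type" = HARD ] || return 0

    case $state in
        DOWN|UNREACHABLE|CRITICAL)
            # Remember where the log stood when the main went down.
            tail -n 1 "$log_file" > "$marker"
            ;;
        UP|OK)
            # Relay everything logged after the marker back to the main.
            [ -f "$marker" ] || return 0
            "$relay" --start="$marker" "$main_host"
            ;;
    esac
}
```

Note that --update-start is deliberately left out here, so the marker file
is only ever written when the main goes down.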

But wait, there's more! I would like to relay acks/comments and their
deletion as well. However, the "delete comment" command takes an ID
number. If your comments are not exactly synchronized then the wrong
one will be deleted.
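
One cautious option, sketched below, is to drop the ID-based comment
deletions from the relayed stream and keep only commands that address
things by name (DEL_ALL_HOST_COMMENTS takes a host name, for instance).
The function name drop_id_commands is hypothetical, and it assumes its
input is already in the "[timestamp] COMMAND;args" form that gets written
to the command pipe.

```shell
# Hypothetical filter: read relayable commands on stdin and drop the ones
# that refer to a comment by its numeric ID, since IDs may differ between
# the main and the backup.
drop_id_commands() {
    grep -v -E '^\[[0-9]+\] DEL_(HOST|SVC)_COMMENT;'
}
```

This avoids deleting the wrong comment, at the cost of leaving stale
comments on the other server until they are cleaned up by hand.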

-- 
Mark Wagner <markwag at u.washington.edu>
System Administrator, UW Medicine IT Services
206-616-6119
-------------- next part --------------
#!/usr/bin/perl -w

# Relay commands to another nagios host

use strict;

use POSIX qw(strftime);
use Getopt::Long;

my $DEBUG = 0;

my $COMMAND_FILE = '/var/log/nagios/rw/nagios.cmd';

my @NAGIOS_LOG_FILES = (
	strftime('/var/log/nagios/archives/nagios-%m-%d-%Y-00.log', localtime),
	'/var/log/nagios/nagios.log',
);

my %CMD_TYPES_TO_RELAY = (
	'ACKNOWLEDGE_HOST_PROBLEM' => 1,
	'ACKNOWLEDGE_SVC_PROBLEM' => 1,
	'ADD_HOST_COMMENT' => 1,
	'ADD_SVC_COMMENT' => 1,
	'DELAY_HOST_NOTIFICATION' => 1,
	'DELAY_SVC_NOTIFICATION' => 1,
	'DEL_ALL_HOST_COMMENTS' => 1,
	'DEL_ALL_SVC_COMMENTS' => 1,
	'DEL_HOST_COMMENT' => 1,
	'DEL_HOST_DOWNTIME' => 1,
	'DEL_SVC_COMMENT' => 1,
	'DEL_SVC_DOWNTIME' => 1,
	'DISABLE_ALL_NOTIFICATIONS_BEYOND_HOST' => 1,
	'DISABLE_CONTACTGROUP_HOST_NOTIFICATIONS' => 1,
	'DISABLE_CONTACTGROUP_SVC_NOTIFICATIONS' => 1,
	'DISABLE_CONTACT_HOST_NOTIFICATIONS' => 1,
	'DISABLE_CONTACT_SVC_NOTIFICATIONS' => 1,
	'DISABLE_HOSTGROUP_HOST_CHECKS' => 1,
	'DISABLE_HOSTGROUP_HOST_NOTIFICATIONS' => 1,
	'DISABLE_HOSTGROUP_PASSIVE_HOST_CHECKS' => 1,
	'DISABLE_HOSTGROUP_PASSIVE_SVC_CHECKS' => 1,
	'DISABLE_HOSTGROUP_SVC_CHECKS' => 1,
	'DISABLE_HOSTGROUP_SVC_NOTIFICATIONS' => 1,
	'DISABLE_HOST_AND_CHILD_NOTIFICATIONS' => 1,
	'DISABLE_HOST_CHECK' => 1,
	'DISABLE_HOST_NOTIFICATIONS' => 1,
	'DISABLE_HOST_SVC_CHECKS' => 1,
	'DISABLE_HOST_SVC_NOTIFICATIONS' => 1,
	'DISABLE_NOTIFICATIONS' => 1,
	'DISABLE_PASSIVE_HOST_CHECKS' => 1,
	'DISABLE_PASSIVE_SVC_CHECKS' => 1,
	'DISABLE_SERVICEGROUP_HOST_CHECKS' => 1,
	'DISABLE_SERVICEGROUP_HOST_NOTIFICATIONS' => 1,
	'DISABLE_SERVICEGROUP_PASSIVE_HOST_CHECKS' => 1,
	'DISABLE_SERVICEGROUP_PASSIVE_SVC_CHECKS' => 1,
	'DISABLE_SERVICEGROUP_SVC_CHECKS' => 1,
	'DISABLE_SERVICEGROUP_SVC_NOTIFICATIONS' => 1,
	'DISABLE_SVC_CHECK' => 1,
	'DISABLE_SVC_NOTIFICATIONS' => 1,
	'ENABLE_ALL_NOTIFICATIONS_BEYOND_HOST' => 1,
	'ENABLE_CONTACTGROUP_HOST_NOTIFICATIONS' => 1,
	'ENABLE_CONTACTGROUP_SVC_NOTIFICATIONS' => 1,
	'ENABLE_CONTACT_HOST_NOTIFICATIONS' => 1,
	'ENABLE_CONTACT_SVC_NOTIFICATIONS' => 1,
	'ENABLE_HOSTGROUP_HOST_CHECKS' => 1,
	'ENABLE_HOSTGROUP_HOST_NOTIFICATIONS' => 1,
	'ENABLE_HOSTGROUP_PASSIVE_HOST_CHECKS' => 1,
	'ENABLE_HOSTGROUP_PASSIVE_SVC_CHECKS' => 1,
	'ENABLE_HOSTGROUP_SVC_CHECKS' => 1,
	'ENABLE_HOSTGROUP_SVC_NOTIFICATIONS' => 1,
	'ENABLE_HOST_AND_CHILD_NOTIFICATIONS' => 1,
	'ENABLE_HOST_CHECK' => 1,
	'ENABLE_HOST_NOTIFICATIONS' => 1,
	'ENABLE_HOST_SVC_CHECKS' => 1,
	'ENABLE_HOST_SVC_NOTIFICATIONS' => 1,
	'ENABLE_NOTIFICATIONS' => 1,
	'ENABLE_PASSIVE_HOST_CHECKS' => 1,
	'ENABLE_PASSIVE_SVC_CHECKS' => 1,
	'ENABLE_SERVICEGROUP_HOST_CHECKS' => 1,
	'ENABLE_SERVICEGROUP_HOST_NOTIFICATIONS' => 1,
	'ENABLE_SERVICEGROUP_PASSIVE_HOST_CHECKS' => 1,
	'ENABLE_SERVICEGROUP_PASSIVE_SVC_CHECKS' => 1,
	'ENABLE_SERVICEGROUP_SVC_CHECKS' => 1,
	'ENABLE_SERVICEGROUP_SVC_NOTIFICATIONS' => 1,
	'ENABLE_SVC_CHECK' => 1,
	'ENABLE_SVC_NOTIFICATIONS' => 1,
	'REMOVE_HOST_ACKNOWLEDGEMENT' => 1,
	'REMOVE_SVC_ACKNOWLEDGEMENT' => 1,
	'SCHEDULE_AND_PROPAGATE_HOST_DOWNTIME' => 1,
	'SCHEDULE_AND_PROPAGATE_TRIGGERED_HOST_DOWNTIME' => 1,
	'SCHEDULE_FORCED_HOST_CHECK' => 1,
	'SCHEDULE_FORCED_HOST_SVC_CHECKS' => 1,
	'SCHEDULE_FORCED_SVC_CHECK' => 1,
	'SCHEDULE_HOSTGROUP_HOST_DOWNTIME' => 1,
	'SCHEDULE_HOSTGROUP_SVC_DOWNTIME' => 1,
	'SCHEDULE_HOST_CHECK' => 1,
	'SCHEDULE_HOST_DOWNTIME' => 1,
	'SCHEDULE_HOST_SVC_CHECKS' => 1,
	'SCHEDULE_HOST_SVC_DOWNTIME' => 1,
	'SCHEDULE_SERVICEGROUP_HOST_DOWNTIME' => 1,
	'SCHEDULE_SERVICEGROUP_SVC_DOWNTIME' => 1,
	'SCHEDULE_SVC_CHECK' => 1,
	'SCHEDULE_SVC_DOWNTIME' => 1,
	'SEND_CUSTOM_HOST_NOTIFICATION' => 1,
	'SEND_CUSTOM_SVC_NOTIFICATION' => 1,
	'SET_HOST_NOTIFICATION_NUMBER' => 1,
	'SET_SVC_NOTIFICATION_NUMBER' => 1,
	'START_EXECUTING_HOST_CHECKS' => 1,
	'START_EXECUTING_SVC_CHECKS' => 1,
	'STOP_ACCEPTING_PASSIVE_HOST_CHECKS' => 1,
	'STOP_ACCEPTING_PASSIVE_SVC_CHECKS' => 1,
	'STOP_EXECUTING_HOST_CHECKS' => 1,
	'STOP_EXECUTING_SVC_CHECKS' => 1,
);

sub usage;
sub main;
sub get_range;
sub get_cmds;
sub send_cmds;
sub update_start_file;

###
### Main
###
exit main();

sub usage
{
	print <<USAGE;

Send nagios commands to another host.

Usage: $0 --start=<file> [ --end=<file> ] [ --update-start ] <host>
	--start=<file> = start getting commands from line listed in <file>
	--end=<file> = stop getting commands from line listed in <file>
	               (otherwise go to end of log)
	--update-start = update the start file with the last line read
	<host> = the nagios host to which commands are sent
	         (set up ssh keys first)
USAGE
}


sub main
{
	my $start_file;
	my $end_file;
	my $update_start_file = 0;

	if (!GetOptions(
		'start=s' => \$start_file,
		'end=s' => \$end_file,
		'update-start' => \$update_start_file,
	)) {
		usage();
		return 1;
	}

	if (!defined $start_file) {
		print "The option --start=<file> is required\n";
		usage();
		return 1;
	}

	if (@ARGV != 1) {
		print "Exactly one non-option argument required\n";
		usage();
		return 1;
	}

	my $nagios_host = $ARGV[0];

	my $range = get_range($start_file, $end_file);
	defined $range or return 1;

	my ($cmds, $last_line) = get_cmds($range);
	defined $cmds or return 1;

	if (@{$cmds}) {
		send_cmds($cmds, $nagios_host) or return 1;
	}

	if ($update_start_file) {
		update_start_file($start_file, $last_line) or return 1;
	}

	0;
}

##
## Get range lines
##
sub get_range
{
	my @range_files = @_;

	my @range_lines;

	for (my $i = 0; $i < 2; $i++) {

		defined $range_files[$i] or next;

		if (-f $range_files[$i]) {

			if (!open RANGE_FILE, "< $range_files[$i]") {
				warn "## !open $range_files[$i]: $!\n";
				return;
			}

			$range_lines[$i] = <RANGE_FILE>;
			chomp $range_lines[$i];

			close RANGE_FILE;

			$DEBUG and print "## $range_files[$i] = '$range_lines[$i]'\n";
		}
	}

	\@range_lines;
}

##
## Load lines from nagios log files.
##
sub get_cmds
{
	my $range_lines = shift;

	my $line;
	my @cmds;
	my $cmd;
	my $cmd_type;
	my $cmd_time;
	my $last_line;

	for my $nagios_log_file (@NAGIOS_LOG_FILES) {

		$DEBUG and print "## processing $nagios_log_file\n";

		if (!open LOG, "< $nagios_log_file") {
			warn "## !open $nagios_log_file: $!\n";
			return (undef, undef);
		}

		while (defined ($line = <LOG>)) {

			chomp $line;

			$last_line = $line;

			if (defined $range_lines->[0] && $line eq $range_lines->[0]) {
				$DEBUG and print "## found start line: $. $line\n";
				# Everything collected so far was handled by an earlier run.
				@cmds = ();
			}

			if (defined $range_lines->[1] && $line eq $range_lines->[1]) {
				$DEBUG and print "## found end line: $. $line\n";
				last;
			}

			$line =~ /^(\[[0-9]+\]) EXTERNAL COMMAND: (([^;]+);.*)/ or next;

			($cmd_time, $cmd, $cmd_type) = ($1, $2, $3);

			defined $CMD_TYPES_TO_RELAY{$cmd_type} or next;

			$DEBUG and print "## cmd $cmd\n";

			push @cmds, "$cmd_time $cmd";
		}

		close LOG;
	}

	if ($DEBUG) {
		for $cmd (@cmds) {
			print "## send cmd '$cmd'\n";
		}
		print "## last line '$last_line'\n";
	}

	(\@cmds, $last_line);
}

##
## Send to remote nagios host
##
sub send_cmds
{
	my ($cmds, $nagios_host) = @_;

	if (!open REMOTE_NAGIOS_HOST, "| ssh -o BatchMode=yes -x $nagios_host 'cat > $COMMAND_FILE'") {
		warn "## !ssh $nagios_host: $!\n";
		return;
	}

	for my $cmd (@{$cmds}) {
		print REMOTE_NAGIOS_HOST "$cmd\n";
	}

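	# close() on a pipe returns false if ssh exited non-zero.
	if (!close REMOTE_NAGIOS_HOST) {
		warn "## ssh $nagios_host failed: status $?\n";
		return;
	}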

	1;
}

##
## Update start file
##
sub update_start_file
{
	my ($start_file, $last_line) = @_;

	$DEBUG and print "## updating $start_file with '$last_line'\n";

	if (!open START_FILE, "> $start_file") {
		warn "## !open $start_file: $!\n";
		return;
	}

	print START_FILE "$last_line\n";
	
	close START_FILE;

	# Return true so the caller's "or return 1" does not report failure.
	1;
}