First rev of plugin to batch up check_by_ssh calls
Steven Grimm
koreth-nagios at midwinter.com
Fri Jan 17 05:57:58 CET 2003
I started setting up Nagios this week and quickly found that with my
site's mix of servers, checking the statuses of remote services was
getting to be a real headache. I didn't want to scatter duplicate
per-host NRPE configuration files on all our clustered application and
HTTP servers, and I didn't want to suffer the overhead of a separate ssh
connection for each service on each host using check_by_ssh. I saw
discussion on this list of using check_by_ssh to run multiple checks
in one go and report them back to Nagios as passive results, but that
would have meant constructing a separate check_by_ssh command object
for each unique combination of services on our various hosts.
So I wrote a Perl script, which I call "batch_by_ssh". It acts as a
frontend to check_by_ssh and automates the construction of the command
lines that fetch passive results.
At a high level, the approach I took was to add a new "batch" service
mode that's halfway between active and passive. You define these batch
services in the Nagios configuration as if they were going to run on
the monitoring host (with a slightly different syntax for specifying
the check command in the service object) but you set active_checks_enabled
to 0 and check_freshness to 1. You can specify any number of batch
services for a given host.
Then you add one active service for the host, which runs batch_by_ssh.
batch_by_ssh scans the Nagios configuration to find all the batch
services for the host in question and runs check_by_ssh with the
appropriate command line to execute them all on the remote host.
Then it reports the results back to Nagios.
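As a rough illustration of that flow, here is a small Python sketch (not
the Perl code in batch_by_ssh itself) that expands each batch_command
against its command definition and assembles a single check_by_ssh
invocation for the host. The function names and the command file path
are made up for the example, and the use of check_by_ssh's -H, -n, -s,
-C and -O options is an assumption about how the pieces could be wired
together, not a description of what the script really does.

```python
import shlex


def expand_command(batch_command, commands, user1):
    """Turn 'check_local_users!20!25' into a full remote command line."""
    name, *args = batch_command.split("!")
    line = commands[name].replace("$USER1$", user1)
    for i, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{i}$", arg)
    return line


def build_ssh_argv(host, address, services, commands, user1, cmd_file):
    """Build one check_by_ssh call covering every batch service of a host."""
    argv = [
        "check_by_ssh",
        "-H", address,
        "-n", host,       # host name as Nagios knows it
        "-O", cmd_file,   # where passive results get written
        "-s", ":".join(desc for desc, _ in services),
    ]
    for _, batch_command in services:
        argv += ["-C", expand_command(batch_command, commands, user1)]
    return argv


# Command definitions and services taken from the example configuration.
commands = {
    "check_local_users": "$USER1$/check_users -w $ARG1$ -c $ARG2$",
    "check_local_disk": "$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$",
}
services = [
    ("User Count", "check_local_users!20!25"),
    ("/home disk space", "check_local_disk!10%!5%!/home"),
]

# The command file path below is hypothetical; use your own installation's.
argv = build_ssh_argv("myserver", "1.2.3.4", services, commands,
                      "/usr2/nagios", "/var/nagios/rw/nagios.cmd")
print(" ".join(shlex.quote(a) for a in argv))
```

The point is only that one ssh connection carries every check for the
host, whatever mix of batch services that host happens to have.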
I also have an auxiliary script that generates servicedependency objects
based on the same configuration, so each host's batch services can be
marked as dependent on its batch_by_ssh service.
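For a sense of what the generated output might look like, here is a
Python sketch (again, not make_batch_dependencies itself) that emits one
servicedependency object per batch service. The function name is
hypothetical, the master service description "ssh" is borrowed from the
example configuration, and the two failure criteria values are just one
plausible choice, so adjust them to taste.

```python
def make_dependencies(batch_services, master="ssh"):
    """Return servicedependency objects tying batch services to 'master'.

    batch_services is a list of (host_name, service_description) pairs.
    """
    blocks = []
    for host, description in batch_services:
        blocks.append(
            "define servicedependency {\n"
            f"    host_name {host}\n"
            f"    service_description {master}\n"
            f"    dependent_host_name {host}\n"
            f"    dependent_service_description {description}\n"
            # Assumed policy: keep running the checks, but suppress
            # notifications while the master service is not OK.
            "    execution_failure_criteria n\n"
            "    notification_failure_criteria w,u,c\n"
            "}\n"
        )
    return "\n".join(blocks)


print(make_dependencies([
    ("myserver", "User Count"),
    ("myserver", "/home disk space"),
]))
```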
Hopefully I haven't duplicated someone else's work here, but I didn't
see anything like this after searching around the net and I think it
makes setting up remote monitoring a *LOT* easier.
batch_by_ssh can be found at
http://www.midwinter.com/~koreth/nagios/batch_by_ssh
And the dependency-generating script:
http://www.midwinter.com/~koreth/nagios/make_batch_dependencies
See the top of batch_by_ssh for documentation on the new config items.
Comments, bugfixes, etc. appreciated! Once a few people other than me have
had a chance to try this out, I'll submit it to the Nagios plugins project,
naturally.
Here's an example configuration to probe the disk space and user count on
a remote host. This is the example in the documentation at the top of
batch_by_ssh, which has more details about what it all means, but
hopefully it'll give you an idea of what I'm talking about. These all
go in the standard Nagios config files. The new keywords are prefixed
with "#<>" so that Nagios treats them as comments and ignores them.
-Steve
P.S. Obviously you need to have passwordless ssh logins working before
you can use this -- if you can't successfully run a remote command
using the standard check_by_ssh plugin, this script won't be useful.
---
define host {
host_name myserver
address 1.2.3.4
#<> $USER1$ /usr2/nagios
}
define command {
command_name batch_by_ssh
command_line $USER1$/batch_by_ssh $HOSTNAME$
}
define command {
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
define command {
command_name check_local_users
command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}
define service {
use generic_service
service_description ssh
host_name myserver
active_checks_enabled 1
check_command batch_by_ssh
normal_check_interval 5
retry_check_interval 1
}
define service {
use generic_service
service_description User Count
host_name myserver,otherserver
active_checks_enabled 0
check_freshness 1
freshness_threshold 430 ; just over 7 minutes (420s = check interval + 2 retries)
check_command no_report ; see Nagios freshness checking docs
#<> batch_type ssh
#<> batch_command check_local_users!20!25
}
define service {
use generic_service
service_description /home disk space
host_name myserver,otherserver
hostgroup_name group1,group2
active_checks_enabled 0
check_freshness 1
freshness_threshold 430
check_command no_report
#<> batch_type ssh
#<> batch_command check_local_disk!10%!5%!/home
}
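
To show how the "#<>" lines in a configuration like the one above could
be picked out, here is a minimal Python sketch. It is only an
illustration: the real parsing lives in the Perl script, the function
name is invented, and the sketch ignores templates ("use"), hostgroups
and the host-level "#<> $USER1$" setting.

```python
import re

DIRECTIVE = re.compile(r"^#<>\s*(\S+)\s+(.+)$")
SERVICE_START = re.compile(r"^define\s+service\s*\{$")


def parse_batch_services(text):
    """Return one dict per 'define service' block marked batch_type ssh."""
    services = []
    block = None  # dict for the service block being read, else None
    for raw in text.splitlines():
        line = raw.strip()
        if SERVICE_START.match(line):
            block = {}
        elif line == "}":
            if block is not None and block.get("batch_type") == "ssh":
                services.append(block)
            block = None
        elif block is not None and line:
            hit = DIRECTIVE.match(line)
            if hit:
                # A "#<>" keyword: a comment to Nagios, a setting to us.
                block[hit.group(1)] = hit.group(2).strip()
            elif not line.startswith("#"):
                # Ordinary directive; drop any trailing ";" comment.
                parts = line.split(";", 1)[0].split(None, 1)
                if len(parts) == 2:
                    block[parts[0]] = parts[1].strip()
    return services


sample = """
define service {
service_description User Count
host_name myserver,otherserver
freshness_threshold 430 ; trailing comment
check_command no_report
#<> batch_type ssh
#<> batch_command check_local_users!20!25
}
define service {
service_description ssh
host_name myserver
check_command batch_by_ssh
}
"""

for svc in parse_batch_services(sample):
    print(svc["service_description"], "->", svc["batch_command"])
```

Only the first service is reported, since the second has no "#<>"
keywords and is an ordinary active check.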