Nagios and cluster setup...few questions
Tarak Patel
Tarak.Patel at ec.gc.ca
Tue Oct 9 15:32:07 CEST 2007
Hi all,
Here is a quick background of my current setup for monitoring:
I have an in-house tool monitoring clusters. The tool simply uses ssh to
launch perl scripts on remote machines, grabs all of the output and
stores it in a logfile at a central location. This output is parsed for
any pre-defined tags (WARNING/CRITICAL/ERROR). If any of these tags are
noticed, the message is logged using syslog. The scripts residing on the
remote hosts are a collection of perl functions, executed one after
another. Some of these functions use a status file from the previous run
to verify whether the state of items has changed since last time. Some of
these functions can be given a special argument to set the current state
as the default state for the next iteration of checks.
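To make the tag scan concrete, here is a minimal sketch in Python (the
real tool is perl; this is only an illustration). The three tags come
from the description above; the function name, the regex and the sample
output are assumptions, and the syslog call is left out.

```python
import re

# Tags from the description above; the pattern itself is illustrative.
TAG_PATTERN = re.compile(r"\b(WARNING|CRITICAL|ERROR)\b")


def find_tagged_lines(output):
    """Return (tag, line) pairs for every output line containing a tag."""
    hits = []
    for line in output.splitlines():
        match = TAG_PATTERN.search(line)
        if match:
            # Keep the first tag found on the line and the stripped line text.
            hits.append((match.group(1), line.strip()))
    return hits
```

Each pair returned by find_tagged_lines() would then be handed to syslog
by the central script.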
Clusters are monitored from the head nodes since not all nodes are
accessible from the central location. Head node checks contain a special
function that simply uses DSH to launch checks on all nodes.
After looking at nagios and its check_cluster plugins, I realized I would
really like to monitor each of the nodes individually, since I want to be
able to disable a particular check on a particular node. I also want to
be able to use status files for some of the checks. As of now I have yet
to find any plugin that uses a status file to monitor hosts. All
plugins simply use the current output from commands to verify the status.
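What I have in mind for a status-file check could look roughly like the
sketch below (Python for illustration only). It compares the current
state with a saved baseline and returns the standard Nagios plugin exit
codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name, the
JSON state format and the reset flag are assumptions, not an existing
plugin.

```python
import json
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def compare_with_baseline(current, state_path, reset=False):
    """Compare current state with the saved baseline.

    Returns (exit_code, message). `current` is a dict of item -> value.
    """
    if reset or not os.path.exists(state_path):
        # First run, or explicit reset: save current state as the baseline.
        with open(state_path, "w") as handle:
            json.dump(current, handle, sort_keys=True)
        return OK, "OK - baseline saved"
    try:
        with open(state_path) as handle:
            baseline = json.load(handle)
    except (OSError, ValueError) as exc:
        return UNKNOWN, "UNKNOWN - cannot read state file: %s" % exc
    # Report every item that was added, removed or changed.
    changed = sorted(
        key for key in set(current) | set(baseline)
        if current.get(key) != baseline.get(key)
    )
    if changed:
        return CRITICAL, "CRITICAL - changed since baseline: " + ", ".join(changed)
    return OK, "OK - state matches baseline"
```

A real plugin would print the message and exit with the returned code;
reset=True plays the role of the special argument that sets the current
state as the default for the next run.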
I will be using active checks on the clusters, therefore I will configure
nrpe on all nodes. My plan of attack is to simply use the head node as a
gateway, with all nodes and services defined on the head node
(under nrpe). From the central location I can then simply execute a
check_nrpe type script to verify the backend nodes.
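A rough sketch of the relay script the head node would run, again in
Python for illustration. It calls check_nrpe against a backend node with
the -H, -c and -t options and hands back the exit code and output
unchanged. The install path, the function names and the injectable
runner argument are assumptions; adjust the path to your install.

```python
import subprocess

# Assumed install path; it differs between distributions and source builds.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def build_nrpe_command(node, check, timeout=10):
    """Build the check_nrpe command line for one backend node."""
    return [CHECK_NRPE, "-H", node, "-c", check, "-t", str(timeout)]


def relay_check(node, check, runner=subprocess.run):
    """Run a check on a backend node; return (exit_code, output)."""
    try:
        result = runner(
            build_nrpe_command(node, check),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Exit code 3 is the Nagios UNKNOWN state.
        return 3, "UNKNOWN - could not run check_nrpe: %s" % exc
    return result.returncode, result.stdout.strip()
```

The head node's nrpe configuration would define one command per node and
check that invokes this script, and the central server would call that
command through check_nrpe against the head node.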
I still haven't figured out how I can use status files from each
iteration of checks to validate status. I'd appreciate some input on
the best options for monitoring clusters where backend nodes are
hidden from the central monitoring server, as well as some help with the
use of state files.
Thanks all,
TP.
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users