Understanding check_cluster
Lee Azzarello
lee at dropio.com
Wed Feb 25 01:03:16 CET 2009
Here's my config. It's functional:
define command{
command_name check-cluster-health
command_line /usr/lib/nagios/plugins/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}
define service{
service_description check-cluster-health
host app-proxy
check_command check-cluster-health!"App Thread Health"!0!1!$SERVICESTATEID:app-1:mongrel-count$,$SERVICESTATEID:app-2:mongrel-count$,$SERVICESTATEID:app-3:mongrel-count$,$SERVICESTATEID:app-4:mongrel-count$
use serviceClusterTemplate
}
define service{
service_description mongrel-count
hostgroup app-servers,manager-servers
check_command check_nrpe_1arg!check_mongrel_count
notifications_enabled 0
use serviceClusterTemplate
}
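The long -d argument in the check_command above is easy to mistype by hand. As a small illustration (not part of my actual setup), the list of on-demand macros could be generated with a few lines of Python. The function name service_state_macros is made up for this sketch; the host and service names are the ones from the config above.

```python
def service_state_macros(hosts, service):
    """Build the comma-separated $SERVICESTATEID:host:service$ list for -d.

    Hypothetical helper for illustration only; it just formats strings and
    does not talk to Nagios.
    """
    return ",".join(f"$SERVICESTATEID:{host}:{service}$" for host in hosts)


# app-1 .. app-4 and mongrel-count are taken from the service definition above.
hosts = [f"app-{n}" for n in range(1, 5)]
print(service_state_macros(hosts, "mongrel-count"))
```

The printed string can be pasted in as the last !-separated argument of the check_command line.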
-lee
On Tue, Feb 24, 2009 at 5:18 PM, Chris Beattie <cbeattie at geninfo.com> wrote:
> I need some help understanding the check_cluster plugin, please. I’m using
> version 1.4.13 of the plugins on Nagios 3.10, all compiled from source on
> 64-bit CentOS 5.2. We use VMWare ESX clusters, and I’d like the hosts in
> Nagios that happen to be virtual machines to have one parent instead of a
> list of parents comprising every ESX host in the cluster. Recently, an ESX
> host was moved from one cluster to another, so I had to change a lot of
> parents. If there’s a better way to represent VMs and their hosts, I’m open
> to suggestions too.
>
>
>
> I don’t have any problem running it as the Nagios user from the command line
> and feeding it states, like so:
>
> ./check_cluster --host --data=0,0,2,1 --warning=0 --critical=1
>
> CLUSTER CRITICAL: Host cluster: 2 up, 1 down, 1 unreachable
>
> ./check_cluster --host --data=0,0,0,0 --warning=0 --critical=1
>
> CLUSTER OK: Host cluster: 4 up, 0 down, 0 unreachable
>
> ./check_cluster --host --data=0,0,0,1 --warning=0 --critical=1
>
> CLUSTER WARNING: Host cluster: 3 up, 1 down, 0 unreachable
>
>
>
> Adding --verbose just says “check_cluster - Warning: start=0 end=0;
> Critical: start=0 end=1” first.
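Those three results, together with the verbose line, are consistent with the thresholds being upper bounds on the number of non-up hosts: more than 0 problems is a warning, more than 1 is critical. Here is a rough Python model of that reading. It is a sketch inferred from the output quoted above, not the plugin's source, and cluster_status is a name invented for the example.

```python
def cluster_status(states, warning, critical, label="Host cluster"):
    """Model of the --host results quoted above (an assumption, not plugin code).

    states: host state IDs, 0 = up, 1 = down, 2 = unreachable.
    warning/critical: largest problem count still allowed before each alert.
    """
    up = sum(1 for s in states if s == 0)
    down = sum(1 for s in states if s == 1)
    unreachable = sum(1 for s in states if s == 2)
    problems = down + unreachable

    if problems > critical:
        level = "CRITICAL"
    elif problems > warning:
        level = "WARNING"
    else:
        level = "OK"
    return f"CLUSTER {level}: {label}: {up} up, {down} down, {unreachable} unreachable"


# The three data sets from the command lines quoted above.
for data in ([0, 0, 2, 1], [0, 0, 0, 0], [0, 0, 0, 1]):
    print(cluster_status(data, warning=0, critical=1))
```

Under this model, --warning=0 --critical=1 means "warn on the first failed host, go critical on the second".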
>
>
>
> However, if I try anything with the $HOSTSTATEID$ macro, everything is
> always OK, even if I just make up host names:
>
> ./check_cluster --host --data=$HOSTSTATEID:duck$,$HOSTSTATEID:cow$,$HOSTSTATEID:chicken$ --warning=0 --critical=1
>
> CLUSTER OK: Host cluster: 3 up, 0 down, 0 unreachable
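A possible explanation for the command-line case, offered as a guess rather than a confirmed diagnosis: Nagios macros are only substituted by Nagios itself. Typed at a prompt, a typical shell treats $HOSTSTATEID as an unset shell variable, so the plugin probably receives something like :duck$,:cow$,:chicken$. If the plugin converts each field with a C atoi-style parse (my assumption about its internals), every non-numeric field becomes 0, which counts as "up". The sketch below models that; c_atoi and count_host_states are names invented here.

```python
import re


def c_atoi(text):
    """Mimic C atoi: optional whitespace, optional sign, leading digits, else 0."""
    match = re.match(r"\s*([+-]?[0-9]+)", text)
    return int(match.group(1)) if match else 0


def count_host_states(data):
    """Count up/down/unreachable in a comma-separated list, parsing leniently."""
    names = {0: "up", 1: "down", 2: "unreachable"}
    counts = {"up": 0, "down": 0, "unreachable": 0}
    for field in data.split(","):
        state = c_atoi(field)
        if state in names:  # values other than 0, 1, 2 are ignored in this model
            counts[names[state]] += 1
    return counts


# What the shell plausibly passes after dropping the unset $HOSTSTATEID variable.
leftover = ":duck$,:cow$,:chicken$"
print(count_host_states(leftover))
```

If that is what happens, the made-up names never reach the plugin as states at all, and the "3 up" result says nothing about ducks, cows, or chickens.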
>
>
>
> I thought maybe macros work better when executed by Nagios, so I added a
> check_host_cluster command and a host with that as its check_command.
>
> define command {
> command_name check_host_cluster
> command_line $USER1$/check_cluster --host --label=$HOSTNAME$ --warning=$ARG1$ --critical=$ARG2$ --data=$ARG3$
> }
>
>
>
> define host {
> use linux-server
> host_name ProductionCluster1
> alias Production Cluster 1
> address 127.0.0.1
> parents gisesx1,gisesx3,gisesx4
> check_command check_host_cluster!1!2!$HOSTSTATEID:foo1$,$HOSTSTATEID:foo3$,$HOSTSTATEID:foo4$
> hostgroups nogsupport
> }
>
>
>
> The check_interval for the linux-server template is set to 3. I made the
> assumption that it didn’t matter what I set the address to since I’m only
> interested in the state of other hosts, and it’s not being referenced in the
> check_command.
>
>
>
> It shows up in the host information web page as being up, but I don’t have
> any hosts named foo:
>
> Host Status: UP (for 0d 3h 41m 9s+)
>
> Status Information: CLUSTER OK: ProductionCluster1: 3 up, 0 down, 0 unreachable
>
>
>
> I had better luck with check_icmp, but it looks like it goes straight to
> CRITICAL if one host is down.
>
> ------------------------------------------------------------------------------
> Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
> -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
> -Strategies to boost innovation and cut costs with open source participation
> -Receive a $600 discount off the registration fee with the source code: SFAD
> http://p.sf.net/sfu/XcvMzF8H
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>