Monitoring large (ish) numbers of servers with exceptions to the rules...
Wheeler, JF (Jonathan)
J.F.Wheeler at rl.ac.uk
Tue Jun 17 14:22:58 CEST 2008
> -----Original Message-----
> From: nagios-users On Behalf Of Matthew Macdonald-Wallace
> Sent: 17 June 2008 13:14
>
> I currently help maintain and monitor around 50 servers across various
> parts of the UK using Nagios 2. At the moment, we have a configuration
> file for each host (%hostname%.cfg) and in that file we specify all the
> services for the named host.
>
> We are trying to reduce the number of configuration files as we take on
> more and more servers because there are a large number of checks that
> need to be rolled out to all servers and we feel that we are
> duplicating our workload.
>
> I'm open to ideas on how to achieve this however my thoughts were a
> setup along the lines of the following:
>
> - A "master" host template is created in which all services are defined
> for a host.
>
> - If a check does not need to be run for a given host (for example it
> is not a web server), a stanza is added to that particular host's
> config file that effectively tells nagios "don't check for this
> service on this host"
>
> I've tried defining all the services in a master templates file and
> this works perfectly however when I come to exclude certain services, I
> am at a loss on how to do it.
>
> Initially I tried adding a stanza with the same service name and
> "register 0" as one of the options, however this didn't work.
>
> We have used HostGroups in the past to achieve a similar goal, however
> we ran into the issue that whilst we need to check the CPU Usage on all
> of the servers, a few of the servers that we monitor can take a lot
> more of a beating than the majority. This led to us defining the CPU
> checks on a per-host basis, as if we defined it separately from the
> hostgroup for the more powerful servers we were presented with a load of
> errors regarding duplicate service names.
>
> I hope I've made myself clear on what we're after and I look forward to
> receiving your input on this.
One thing that I use in the configuration that I maintain is to have
something like this:
define service{
    use             generic-hung-mounts
    hostgroup_name  experiments
    hosts           !lcg0448
    contact_groups  experiments
}
where "lcg0448" is a host in host group "experiments" and I want to
apply the "generic-hung-mounts" check to all hosts in that group except
for "lcg0448".
This can lead to configuration like this:
define service{
    use             check-pbs-offline
    hostgroup_name  workers
    hosts           !lcg0614,!lcg0617,!lcg0618,!lcg0626
    contact_groups  tier1a
}

define service{
    use             check-pbs-offline
    hosts           lcg0614,lcg0617,lcg0618,lcg0626
    contact_groups  tier1a,grid-team
}
where the only difference is that the hosts in the second definition
have a second contact group.
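Applied to the CPU check from the original question, the same pattern
might look something like the sketch below. The names are invented for
illustration: the host group "all-servers", the hosts "bigbox1" and
"bigbox2", the template "generic-service" and the "check_cpu" command
with its warning/critical arguments are all assumptions, not taken from
either configuration.

# Hypothetical example: default thresholds for every host in the
# group except the more powerful servers.
define service{
    use                  generic-service
    service_description  CPU Usage
    hostgroup_name       all-servers
    hosts                !bigbox1,!bigbox2
    check_command        check_cpu!80!90
}

# Hypothetical example: higher thresholds for the excluded hosts only.
define service{
    use                  generic-service
    service_description  CPU Usage
    hosts                bigbox1,bigbox2
    check_command        check_cpu!95!99
}

Since each host should then match only one of the two definitions, this
ought to avoid the duplicate service errors described in the question;
treat it as an untested sketch rather than a drop-in configuration.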
HTH
Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory