Best way to monitor application clusters

Hari Sekhon hpsekhon at googlemail.com
Mon Sep 24 18:22:13 CEST 2007


In the contrib directory of the standard plugins distribution there 
should be a check_cluster and, better, a check_cluster2.c, which you can 
compile to check a bunch of hosts or services and go to warning or 
critical state only if a certain number of them fail.

I've not used it myself, but it seems like exactly what you are looking for.
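
The "warning or critical only if a certain number fail" idea can be 
sketched roughly as below. This is a hypothetical Python illustration, 
not the check_cluster2 source and not its command-line interface; the 
function name and threshold parameters are made up for the example. 
The only fixed part is the plugin exit code convention (0 OK, 
1 WARNING, 2 CRITICAL).

```python
# Hypothetical sketch: NOT the check_cluster2 code or its options.
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def cluster_state(member_states, warn_failures, crit_failures):
    """Aggregate member states into one cluster state.

    member_states: plugin exit codes of the individual hosts/services.
    warn_failures / crit_failures: number of non-OK members at which the
    cluster as a whole goes WARNING / CRITICAL (assumed parameters).
    """
    failed = sum(1 for state in member_states if state != OK)
    if failed >= crit_failures:
        return CRITICAL
    if failed >= warn_failures:
        return WARNING
    return OK


# Four members, one of them critical; warn at 2 failures, critical at 3.
print(cluster_state([OK, OK, CRITICAL, OK], warn_failures=2, crit_failures=3))
```

A real plugin would print a one-line status and exit with the returned 
code instead of printing the number.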

-h

Hari Sekhon



Paul Weaver wrote:
>
> I've recently started using nagios in our development environment, and 
> have knocked up a few plugins for some of our programs (e.g. monitoring 
> a log on a remote server to make sure it's growing, but not growing too 
> fast or too slow, or jumbo pings between two remote machines). It has 
> all been very impressive.
>
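
For reference, the log-growth check described above might look 
something like this. It is only a sketch under assumptions: the 
function name, the bytes-per-second thresholds, and the choice of 
which condition maps to WARNING versus CRITICAL are all invented here, 
not taken from the plugin mentioned in the message.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def growth_state(prev_bytes, cur_bytes, seconds, min_rate, max_rate):
    """Classify log growth between two size samples taken `seconds` apart.

    min_rate and max_rate are hypothetical thresholds in bytes per second.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate = (cur_bytes - prev_bytes) / seconds
    if rate <= 0:
        # Assumption: a log that stopped growing (or shrank) is critical.
        return CRITICAL
    if rate < min_rate or rate > max_rate:
        # Assumption: growing, but too slowly or too fast, is a warning.
        return WARNING
    return OK
```

Note that a rotated log would show up as shrinking, so a real check 
would need to handle rotation separately.
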
> One thing I would like to monitor is a group of hosts/services, and 
> flag a warning if x% are not available, and a critical if y% are 
> offline. A common example would be checking DNS services. If you have 
> 4 DNS servers, you don't want to be woken up at 3AM if one falls 
> offline, but if 3 are offline you would, and if 4 are offline you want 
> an APB. You still want to see on a webpage that the servers are 
> offline, though, and possibly get a notification in work hours.
>
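
The x% / y% idea for the DNS example can be sketched as follows. The 
25% and 75% thresholds are hypothetical values picked to match the 
"one down is tolerable, three down is not" description; nothing here 
is an existing Nagios option.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def percent_state(total, unavailable, warn_pct, crit_pct):
    """Map the percentage of unavailable members to a plugin state."""
    if total <= 0:
        return UNKNOWN  # nothing to check, so no meaningful percentage
    pct = 100.0 * unavailable / total
    if pct >= crit_pct:
        return CRITICAL
    if pct >= warn_pct:
        return WARNING
    return OK


# Four DNS servers, assumed thresholds: warning at 25%, critical at 75%.
for down in range(5):
    print(down, "down ->", percent_state(4, down, warn_pct=25, crit_pct=75))
```

Plugin states only give two alert levels, so telling "3 offline" apart 
from "all 4 offline" would have to be handled some other way.
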
> I'm aware of host/service groups as one way of doing it; however, 
> I'm unsure whether notifications can be set based on the % of 
> hosts/services available in a group.
>
> Another way would be a "virtual host", with a custom 
> "check_host_alive" which checks all hosts in a collection, and returns 
> an OK/critical/warning based on the number of failures, and likewise 
> with "virtual services". The original hosts could then be monitored 
> separately, or even not at all.
>
> For example, a service I would like to check is whether 3 mysql 
> databases are in sync with each other. I currently have a web page 
> that compares the log positions. It seems to me that logically the 
> service should run on the mysql boxes, however I only want it running on
>
> Another example would be I have a piece of java software (call it "A") 
> that must run on at least one of 4 machines, and preferably on 2 of 
> them. I don't care which machine it's on, but if it's not running I 
> want to be notified in red lights.
>
> I could have a simple "virtual service A", which would go critical on 
> 0, warn on 1 and be OK on 2 or more.
> This would be attached to "virtual host A", which would go critical on 
> 0, warn on 1 and be OK on 2 or more of the servers that the service 
> runs on.
>
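
The "virtual service A" rule above (critical on 0, warn on 1, OK on 2 
or more) is a minimum-count check rather than a failure-count check. 
A minimal sketch, with a made-up function name and default thresholds 
taken from the numbers in the message:

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def running_state(running, warn_below=2, crit_below=1):
    """State of a virtual service from how many instances are running.

    Defaults mirror the example: 0 running is critical, 1 is a warning,
    2 or more is OK. The parameter names are hypothetical.
    """
    if running < crit_below:
        return CRITICAL
    if running < warn_below:
        return WARNING
    return OK
```

How the count of running instances is gathered (which machines to 
ask, and how) is left out, since the message does not say.
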
> I'd also like a "simple" login to the web page which would only 
> display the "clusters" of services/hosts, rather than the total view, 
> which would allow our support engineers to easily see real problems, 
> and allow management to sleep happily with lots of green lights.
>
> I must admit I'm leaning towards the virtual host/service thing, but I 
> was wondering if there's a standard/better way of monitoring this kind 
> of thing?
>
> Thanks
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
