Group monitoring

Marc Powell marc at ena.com
Tue May 27 18:26:25 CEST 2008


On May 27, 2008, at 10:02 AM, Germán Gutiérrez wrote:

> I think I'm not the only one with this issue, but I couldn't find any
> documented solution.
>
> We have a group of servers. Sometimes, for a common reason, a service
> goes down across the group almost simultaneously, and we get around 30
> alerts about the same thing.
>

> Any thoughts? Links? Clues? RTFM?

The simplest approach seems to be to monitor the thing that's breaking
and use service dependencies to make the services above dependent on
that newly monitored service.
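As a sketch of the dependency approach, the Python script below writes one Nagios `servicedependency` object per affected host, so that each host's failing service depends on a single check of the shared component. The host and service names (`backend01`, `Shared Backend`, `web01`..`web30`, `App`) are hypothetical placeholders. The directive names follow Nagios object definitions, but check them against the documentation for your Nagios version.

```python
"""Generate Nagios servicedependency objects (all names are hypothetical)."""

DIRECTIVE_WIDTH = 31  # column where values start; purely cosmetic


def servicedependency(master_host, master_svc, dep_host, dep_svc):
    """Return one servicedependency definition as Nagios config text."""
    directives = [
        ("host_name", master_host),
        ("service_description", master_svc),
        ("dependent_host_name", dep_host),
        ("dependent_service_description", dep_svc),
        # Suppress the dependent service's notifications while the master
        # service is WARNING (w), UNKNOWN (u) or CRITICAL (c).
        ("notification_failure_criteria", "w,u,c"),
    ]
    body = "".join(f"    {k:<{DIRECTIVE_WIDTH}}{v}\n" for k, v in directives)
    return "define servicedependency{\n" + body + "    }\n"


def dependencies_for_group(master_host, master_svc, hosts, svc):
    """Make the same service on every host depend on one shared check."""
    return "\n".join(
        servicedependency(master_host, master_svc, h, svc) for h in hosts
    )


if __name__ == "__main__":
    # Hypothetical group: 30 web servers whose "App" check fails
    # whenever the shared backend does.
    web_hosts = [f"web{n:02d}" for n in range(1, 31)]
    print(dependencies_for_group("backend01", "Shared Backend", web_hosts, "App"))
```

A generator is used here only to keep thirty near-identical objects consistent. Depending on your Nagios version, a single definition using `hostgroup_name`/`dependent_hostgroup_name` may do the same job without one.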

If you can't monitor that thing, it's a bit more complicated. You want
to receive notifications for the service normally, unless a certain
threshold count of them is reached. check_cluster could be useful here
by making all the services above dependent on a cluster service check.
If you set the check_cluster threshold to, say, 5, I'd expect that you'd
receive at most 5(ish) notifications (4 per-service notifications
+ 1 for check_cluster itself).
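To show where the "4 + 1" estimate comes from, here is a toy Python model, not Nagios itself. Services fail one at a time, a cluster check counts the non-OK members, and once that count reaches the threshold the cluster goes CRITICAL and suppresses notifications from the dependent services. It assumes the cluster check is re-evaluated before each failing service would notify, and that the cluster goes critical when the count reaches (rather than exceeds) the threshold. In a real setup, check scheduling and the plugin's threshold semantics decide both, which is why the real number is only "5(ish)".

```python
def count_notifications(failing_services, threshold):
    """Toy model of notifications sent when services fail one by one.

    Assumptions (not Nagios internals): the cluster check is re-evaluated
    before each failed service decides whether to notify, and the cluster
    turns CRITICAL once the number of non-OK members reaches `threshold`.
    Returns (per-service notifications, cluster notifications).
    """
    service_notes = 0
    cluster_notes = 0
    cluster_critical = False
    for non_ok in range(1, failing_services + 1):
        # The cluster check now sees `non_ok` failed members.
        if not cluster_critical and non_ok >= threshold:
            cluster_critical = True
            cluster_notes += 1  # the cluster check itself notifies once
        # Dependent services stay quiet while the cluster is CRITICAL.
        if not cluster_critical:
            service_notes += 1
    return service_notes, cluster_notes


if __name__ == "__main__":
    for n in (3, 30):
        per_service, cluster = count_notifications(n, threshold=5)
        print(f"{n} failures: {per_service} service + {cluster} cluster "
              f"= {per_service + cluster} notifications")
```

Run as a script, it prints `3 failures: 3 service + 0 cluster = 3 notifications` and `30 failures: 4 service + 1 cluster = 5 notifications`, matching the 4 + 1 estimate under these assumptions.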

--
Marc
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users