Group monitoring
Marc Powell
marc at ena.com
Tue May 27 18:26:25 CEST 2008
On May 27, 2008, at 10:02 AM, Germán Gutiérrez wrote:
> I think I'm not the only one with this issue, but I couldn't find any
> documented solution.
>
> We have a group of servers. Sometimes, for a common reason, a service
> goes down almost simultaneously and we get around 30 alerts about the
> same thing.
>
> Any thoughts? Links? Clues? RTFM?
The simplest approach is to monitor the thing that's actually breaking
and use service dependencies to make the services above depend on the
newly monitored service.
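
A minimal sketch of such a dependency, assuming the common cause can be
checked as a service of its own. The host names (backend01, web01 to
web03) and service descriptions are hypothetical placeholders, not taken
from the original question:

```
define servicedependency {
    host_name                       backend01          ; hypothetical host running the shared thing
    service_description             Shared Backend     ; hypothetical newly monitored service
    dependent_host_name             web01,web02,web03  ; hypothetical servers that fail together
    dependent_service_description   HTTP               ; hypothetical service that goes down on each
    execution_failure_criteria      n                  ; keep checking the dependent services
    notification_failure_criteria   w,u,c              ; suppress their notifications while the master is non-OK
}
```

With something like this in place, an outage of the shared service should
produce one notification for "Shared Backend" rather than one per server.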
If you can't monitor that thing, it's a bit more complicated. You want
to receive notifications for each service as usual unless some
threshold count of failures is reached. check_cluster could be useful
here: make all the services above dependent on a cluster service check.
If you set the check_cluster threshold to, say, 5, I'd expect you to
receive at most 5(ish) notifications (4 per-service notifications
+ 1 for check_cluster itself).
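
A rough sketch of the check_cluster variant, under several assumptions:
the plugin lives in /usr/local/nagios/libexec, a generic-service template
exists, and the host and service names (monitor01, web01 to web03, HTTP)
are hypothetical. Only three members are listed to keep it short. Whether
a threshold of 4 trips at the fourth or fifth failure depends on the
plugin version, so check its --help output before relying on the numbers:

```
define command {
    command_name    check_service_cluster
    command_line    /usr/local/nagios/libexec/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

define service {
    use                     generic-service    ; assumed template
    host_name               monitor01          ; hypothetical host to attach the cluster check to
    service_description     HTTP Cluster
    ; on-demand macros feed the current state of each member service to the plugin
    check_command           check_service_cluster!"HTTP Cluster"!3!4!$SERVICESTATEID:web01:HTTP$,$SERVICESTATEID:web02:HTTP$,$SERVICESTATEID:web03:HTTP$
}

define servicedependency {
    host_name                       monitor01
    service_description             HTTP Cluster
    dependent_host_name             web01,web02,web03
    dependent_service_description   HTTP
    execution_failure_criteria      n    ; member checks must keep running to feed the cluster check
    notification_failure_criteria   c    ; mute per-service notifications once the cluster is CRITICAL
}
```

The first few failures would still notify individually. Once the cluster
check goes CRITICAL, further per-service notifications should be
suppressed and only the cluster service alerts.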
--
Marc