Nagios acknowledgement enhancement request... revisited
Andreas Ericsson
ae at op5.se
Thu Nov 20 10:09:21 CET 2008
Jim Winkle wrote:
> I wrote last week suggesting an enhancement to Nagios so it would
> send notifications *even if* a problem had been acknowledged, provided
> the error string returned by the plugin had changed. At the time, I
> was talking about the check_logfiles plugin, but I realize that at
> least one core plugin would benefit from this as well.
>
> The check_disk plugin is one such example.
>
> Right now, even using state stalking and configuring check_disk as
> a volatile service, Nagios will only notify about the *first* full
> disk if a user acknowledges the problem. If a second disk fills later,
> we won't be notified.
>
It's possible to monitor disks separately. For the reasons you mention
above, I always recommend doing so.
> The second full disk does show up in the web interface, but no
> notifications are sent out. It seems like the state of the web
> interface should match the notifications.
>
> What if the first disk was an unimportant disk that occasionally fills
> and is no big deal so the problem was acknowledged, but the second one
> is more important? By default, we'll get no notification for the second
> disk.
>
In that case, you should either not be monitoring the first disk at all,
or monitor it as a separate service with different notification criteria
from the important disks.
> Yes, adaptive monitoring could be designed to handle this (thanks again
> for those suggestions), but why not have it work out-of-the-box?
>
> I'll repeat the implementation suggestion that I think makes the most sense:
>
>> If Nagios would store the string that the plugin returned
>> when a user clicks "Acknowledge", then if the plugin returns a *new*
>> CRITICAL string, Nagios would go through its notification routine, run event
>> handlers, etc. When the user again clicks "Acknowledge", Nagios stores this
>> new string (discarding the old) to be ready for the next problem.
>
This is a good idea, but it needs to be configurable. Think for example of
plugins like check_icmp, which report with sub-millisecond precision, so their
output will nearly always change between invocations. You'd hardly want a
notification every 5 minutes because you still can't ping your Taiwan office,
would you?
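
A rough sketch of what a configurable version of the suggestion might look
like, written in Python rather than Nagios's own C purely to illustrate the
logic. The `Service` class, the `renotify_on_output_change` flag and both
function names are hypothetical; nothing like them exists in Nagios today, and
the plugin output strings used with it are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    name: str
    # Hypothetical per-service switch, off by default so that plugins with
    # constantly changing output (timings, counters) keep today's behaviour.
    renotify_on_output_change: bool = False
    # Plugin output stored when the user acknowledged; None means "not acked".
    acked_output: Optional[str] = None


def acknowledge(svc: Service, current_output: str) -> None:
    """Remember the plugin output that the user acknowledged."""
    svc.acked_output = current_output


def should_notify(svc: Service, state: str, output: str) -> bool:
    """Return True if the acknowledgement does not suppress a notification.

    Simplified model: notification intervals, escalations and recovery
    notifications are deliberately left out.
    """
    if state == "OK":
        svc.acked_output = None  # a recovery clears the acknowledgement
        return False
    if svc.acked_output is None:
        return True  # problem was never acknowledged
    if svc.renotify_on_output_change and output != svc.acked_output:
        # New problem text: drop the old acknowledgement and notify again.
        svc.acked_output = None
        return True
    return False  # acknowledged, and nothing the user asked to hear about
```

With the flag enabled on a disk service, a second disk filling up changes the
output and gets through the acknowledgement; with the flag left off on a ping
service, a changed round-trip time stays quiet.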
> This would make it easier for people to get started with Nagios. Many people
> may not realize that they won't be notified for subsequent failures that a
> plugin reports, and that they have to do something special in order to get
> notified.
>
Well, notification floods are *worse* than failed notifications. Trust me
on this.
> Any possibility of this being implemented at some point?
>
The second you send a patch, I'll queue it for Ethan. Personally, I always
monitor critical stuff separately (and that's what I recommend our customers
do as well), so I don't have this problem and therefore don't have this itch.
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231