Nagios acknowledgement enhancement request... revisited

Andreas Ericsson ae at op5.se
Thu Nov 20 10:09:21 CET 2008


Jim Winkle wrote:
> I wrote last week suggesting an enhancement to Nagios so it would 
> send notifications *even if* a problem had been acknowledged, provided
> the error string returned by the plugin had changed.  At the time, I 
> was talking about the check_logfiles plugin, but I realize that at
> least one core plugin would benefit from this as well.
> 
> The check_disk plugin is one such example.
> 
> Right now, even using state stalking and configuring check_disk as 
> a volatile service, Nagios will only notify about the *first* full 
> disk if a user acknowledges the problem. If a second disk fills later, 
> we won't be notified.
> 

It's possible to monitor disks separately. For the reasons you mention
above, I always recommend doing so.

> The second full disk does show up in the web interface, but no 
> notifications are sent out. It seems like the state of the web 
> interface should match the notifications.
> 
> What if the first disk was an unimportant disk that occasionally fills
> and is no big deal, so the problem was acknowledged, but the second one 
> is more important? By default, we'll get no notification for the second 
> disk.
> 

In that case, you should either not be monitoring the first disk at all,
or monitor it as a separate service with different notification criteria
from those of the important disks.

> Yes, adaptive monitoring could be designed to handle this (thanks again
> for those suggestions), but why not have it work out-of-the-box?
> 
> I'll repeat the implementation suggestion that I think makes the most sense:
> 
>> If Nagios would store the string that the plugin returned
>> when a user clicks "Acknowledge", then if the plugin returns a *new*
>> CRITICAL string, Nagios would go through its notification routine, run event
>> handlers, etc. When the user again clicks "Acknowledge", Nagios stores this
>> new string (discarding the old) to be ready for the next problem.
> 

This is a good idea, but it needs to be configurable. Think, for example, of
plugins like check_icmp, whose output reports times with sub-millisecond
precision and therefore nearly always changes between invocations. You'd hardly
want a notification every 5 minutes because you still can't ping your Taiwan
office, would you?
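A minimal Python sketch of the proposed rule, including the per-service opt-in
argued for above, may make the trade-off concrete. Everything in it is
hypothetical: `ServiceState`, `notify_on_output_change` and `should_renotify`
are invented names, not Nagios internals or configuration directives, and the
plugin output strings are illustrative rather than the real check_disk or
check_icmp formats. It also assumes that a changed CRITICAL output lifts the
acknowledgement, so the new string gets stored when someone acknowledges again.

```python
from dataclasses import dataclass
from typing import Optional

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin return codes


@dataclass
class ServiceState:
    """Hypothetical per-service state; not Nagios' real data structures."""
    # Hypothetical opt-in flag, so noisy checks can keep today's behaviour.
    notify_on_output_change: bool = False
    acknowledged: bool = False
    ack_output: Optional[str] = None  # plugin output captured at acknowledgement time

    def acknowledge(self, current_output: str) -> None:
        """Acknowledge the problem and remember which output was acknowledged."""
        self.acknowledged = True
        self.ack_output = current_output  # discards any previously stored string

    def should_renotify(self, state: int, output: str) -> bool:
        """Return True if an acknowledged CRITICAL problem should notify again."""
        if not self.acknowledged or state != CRITICAL:
            return False  # unacknowledged problems use the normal notification logic
        if not self.notify_on_output_change or output == self.ack_output:
            return False  # the acknowledgement still suppresses notifications
        # New CRITICAL output: lift the acknowledgement so the problem has to be
        # acknowledged again, which stores the new string.
        self.acknowledged = False
        self.ack_output = None
        return True


# Illustrative outputs only; real plugin output formats differ.
disk = ServiceState(notify_on_output_change=True)
disk.acknowledge("CRITICAL - /tmp 98% used")
print(disk.should_renotify(CRITICAL, "CRITICAL - /tmp 98% used"))                 # False
print(disk.should_renotify(CRITICAL, "CRITICAL - /tmp 98% used, /var 99% used"))  # True

# A ping-style check whose output embeds a fresh measurement on every run.
noisy = ServiceState(notify_on_output_change=True)
noisy.acknowledge("CRITICAL - rta 812.314ms, lost 20%")
print(noisy.should_renotify(CRITICAL, "CRITICAL - rta 812.519ms, lost 20%"))  # True

quiet = ServiceState(notify_on_output_change=False)
quiet.acknowledge("CRITICAL - rta 812.314ms, lost 20%")
print(quiet.should_renotify(CRITICAL, "CRITICAL - rta 812.519ms, lost 20%"))  # False
```

The disk calls show the check_disk case working as proposed. The two ping-style
services show why the setting has to be per service: with it enabled, a check
whose output changes on every run would notify after every check.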

> This would make it easier for people to get started with Nagios. Many people 
> may not realize that they won't be notified for subsequent failures that a 
> plugin reports, and that they have to do something special in order to get 
> notified.
> 

Well, notification floods are *worse* than failed notifications. Trust me
on this.

> Any possibility of this being implemented at some point?
> 

The second you send a patch, I'll queue it for Ethan. Personally, I always
monitor critical stuff separately (and that's what I recommend to our customers
as well), so I don't have this problem and therefore don't have this itch.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

More information about the Developers mailing list