Notification blackout ignored (repost)

Marc Powell marc at ena.com
Wed Jan 14 16:32:31 CET 2004



> -----Original Message-----
> From: Martin, Jeremy [mailto:jmartin at gsi-kc.com]
> Sent: Wednesday, January 14, 2004 12:33 AM
> To: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Notification blackout ignored (repost)
> 
> (RedHat 9, Nagios 1.1, Nagios-plugins 1.3.1)
> 
> -----Original Message-----
> From: Martin, Jeremy
> Sent: Tuesday, January 13, 2004 5:09 AM
> To: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Notification blackout ignored
> 
> From: Andreas Koch [mailto:a.koch at eurodata.de]
> > which you do, are a little too exaggerated.
> > The better options are:
> > max_check_attempts              5
> > retry_check_interval            1
> > With these, when the Nagios server notices that the service is down, it
> > checks the service 5 (max_check_attempts) times, once every minute.
> > Thus, if the service is down for 5 minutes, only one notification is
> > sent. If the service is OK again by the 3rd check, no notification is
> > sent.
> 
> Hi,
> 
> I am already trying to use max_check_attempts 5 and retry_check_interval
> 1 for these checks. Unfortunately it is still sending us warnings/pages
> every night despite these settings. In fact on some I am even trying
> max_check_attempts 10 but it's still not working.

These settings have no bearing on the time period during which you get
notified. They only control how long a host or service must be down
before a notification is attempted, which itself depends on a number of
factors (http://nagios.sourceforge.net/docs/1_0/notifications.html). As
a rule of thumb, a maximum of 5 checks at 1-minute intervals means about
5 minutes down before notification, 10 checks at 1-minute intervals
means about 10 minutes, and 3 checks at 5-minute intervals means about
15 minutes.
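
As a rough sketch of that arithmetic (not Nagios's actual scheduling
logic), the rule of thumb can be written out in Python. The function
name is made up for illustration, and the sketch assumes interval_length
is left at its usual 60 seconds so that one interval unit equals one
minute:

```python
def approx_minutes_before_notification(max_check_attempts, retry_check_interval):
    """Rule-of-thumb delay, in minutes, before a notification is attempted.

    Mirrors the attempts-times-interval arithmetic from the message above.
    Hypothetical helper: it ignores check latency and every other factor
    Nagios considers, and assumes one interval unit is one minute.
    """
    return max_check_attempts * retry_check_interval


# The three combinations mentioned in the message.
for attempts, interval in [(5, 1), (10, 1), (3, 5)]:
    minutes = approx_minutes_before_notification(attempts, interval)
    print(f"{attempts} checks at {interval}-minute intervals: about {minutes} minutes down")
```

Raising max_check_attempts only stretches this delay; it does nothing
to keep notifications out of a given window.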
 
> It seems when it gets a "connection refused by host" Nagios ignores
> retry_check_interval and keeps retrying 5 or 10 times in the same
> second. Is there any way to change this behavior? Here is an example
> from the event log:

This is an accurate statement for host checks. Host checks in Nagios are
very aggressive. Notice that you cannot specify a retry_check_interval
in a host definition. This is because Nagios must definitively determine
the status of the host when a service on that host fails
(http://nagios.sourceforge.net/docs/1_0/networkreachability.html).
Everything else stops while this is happening (all other service
checks, etc.), so it's important that it finishes as quickly as possible.
You can help by lowering the number of check attempts that must fail
before a host is determined to be down. There are other modifications as
well, but I leave those as an exercise for you to find in the
docs/archives. Again, this has no bearing on the time period during
which a notification gets sent out.
 
> [01-13-2004 04:46:07] HOST NOTIFICATION: Admin;www.whatever.com;DOWN;host-notify-by-email;Connection refused by host
> [01-13-2004 04:46:06] HOST NOTIFICATION: Admin;www.whatever.com;DOWN;host-notify-by-epager;Connection refused by host

These notifications appear to have come after your 4:00-4:30 blackout.
Is that not correct? If it's still an issue, you might want to verify
that you don't have an old Nagios process still running and notifying on
the old schedule.
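
If it helps to check a batch of log timestamps against the blackout,
here is a minimal Python sketch. The 4:00-4:30 window and the timestamp
format come from this thread; the helper name is hypothetical, and
treating the end of the window as exclusive is an assumption:

```python
from datetime import datetime, time

# Blackout window from the thread (4:00-4:30); end treated as exclusive (assumption).
BLACKOUT_START = time(4, 0)
BLACKOUT_END = time(4, 30)


def in_blackout(log_timestamp):
    """Return True if a timestamp like '01-13-2004 04:46:07' falls inside the window."""
    stamp = datetime.strptime(log_timestamp, "%m-%d-%Y %H:%M:%S")
    return BLACKOUT_START <= stamp.time() < BLACKOUT_END


# The two notification timestamps quoted above.
for ts in ("01-13-2004 04:46:07", "01-13-2004 04:46:06"):
    print(ts, "inside blackout" if in_blackout(ts) else "outside blackout")
```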

--
Marc


