max_check_attempts, retry_check_interval, and notifications: confusion
John P. Rouillard
rouilj at cs.umb.edu
Mon Mar 6 02:09:22 CET 2006
In message <20060305051646.GB3860 at think.alaya.net>,
prosolutions at gmx.net writes:
>I am trying to configure the following behavior from nagios:
>1. check a service every normal_check_interval
>2. if service check fails, up the check rate to retry_check_interval
>3. if 2 successive service checks fail, send notification
>4. continue to check at retry_check_interval until service check
> succeeds and send notification
Yup, 4 is the tough one.
>my understanding is that once max_check_attempts is reached the service
>check interval returns to normal_check_interval even if the service
>is still down.
Correct.
>but this does not make sense to me.
The idea for the shortened retry check interval is to allow faster
checks while the service is in a soft error state. This way you can
run multiple (soft) checks within the time it would take to perform a
single normal check. Once the service reaches a hard state, the "normal"
check interval resumes. It would probably have been better if the
intervals were called "hard_check_interval" and "soft_check_interval"
rather than normal/retry since normal makes it sound like it should be
used for the "normal" state which one hopes is "ok" 8-).
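To make the soft/hard behaviour concrete, here is a rough sketch in Python. It is a simplified model of the scheduling described above, not Nagios code: the function name and the "ok"/"soft"/"hard" labels are made up for illustration, and it ignores details such as soft recoveries.

```python
def simulate_check_intervals(results, max_check_attempts,
                             normal_interval, retry_interval):
    """Return (state, next_check_interval) for each check result.

    results is a sequence of booleans (True = check passed).
    Simplified model: a failure is "soft" until max_check_attempts
    consecutive failures have been seen, then it becomes "hard".
    """
    timeline = []
    failures = 0  # consecutive failed checks so far
    for ok in results:
        failures = 0 if ok else failures + 1
        if ok:
            # Healthy service: checked at the normal interval.
            timeline.append(("ok", normal_interval))
        elif failures < max_check_attempts:
            # Soft error: recheck quickly at the retry interval.
            timeline.append(("soft", retry_interval))
        else:
            # Hard error: notification goes out, and the interval
            # drops back to normal even though the service is down.
            timeline.append(("hard", normal_interval))
    return timeline
```

With max_check_attempts=2, normal_interval=5 and retry_interval=1, the results ok, fail, fail, fail give intervals 5, 1, 5, 5. The last two entries are exactly the behaviour being complained about.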
>if a service is
>down - it seems logical to up the check interval and try a couple more
>checks before sending an alert. but if the service has not recovered i
>don't want the check interval to go back to normal.
You don't say what version of Nagios you are using, but there are a
couple of ways to handle this. I believe I saw a patch for Nagios 1.x
that added another check interval option. I want to say
error_check_interval, but that's not getting any hits on Google. I
think it was on the Nagios developers' list, but it could have been
nagios-users.
For Nagios 2.x you can use the adaptive monitoring external command
(see the manual):
  CHANGE_NORMAL_SVC_CHECK_INTERVAL;<host_name>;<service_description>;<check_interval>
to change the interval from an event handler. I would suggest using the
objects.cache file to determine the configured normal_check_interval
and retry_check_interval. You may have to cache that info for your
event handler, as I am not sure whether that file is rewritten when the
intervals change.
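A minimal sketch of such an event handler in Python follows, under some assumptions. The helper names are hypothetical, the command file path depends on your install (it is the command_file setting in nagios.cfg), and the configured intervals are assumed to be passed in by the caller rather than read from objects.cache. The state and state type would come from the $SERVICESTATE$ and $SERVICESTATETYPE$ macros.

```python
import time


def choose_interval(state, state_type, normal_interval, retry_interval):
    """Pick the interval to install as the 'normal' check interval.

    Returns None when nothing should change (soft problem states).
    """
    if state == "OK":
        # Recovered: restore the configured normal interval.
        return normal_interval
    if state_type == "HARD":
        # Still down after max_check_attempts: keep checking fast.
        return retry_interval
    return None


def build_command(host, service, interval, now=None):
    """Format one external command line for the Nagios command file."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;%s\n" % (
        timestamp, host, service, interval)


def submit(command_file, line):
    """Write the command to the external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


def handle(command_file, host, service, state, state_type,
           normal_interval, retry_interval):
    """Event handler body: adjust the interval if the state calls for it."""
    interval = choose_interval(state, state_type,
                               normal_interval, retry_interval)
    if interval is not None:
        submit(command_file, build_command(host, service, interval))
```

The handler restores the interval on any OK state, so a recovery puts the service back on its configured schedule. External commands must be enabled (check_external_commands=1) for Nagios to read the command file.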
-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.