Problem with time between soft down checks
Josh Van As
JVanas at finncorp.com
Thu Mar 18 15:00:29 CET 2004
We just installed Nagios 1.2 as an upgrade to 1.1. We had the problem
described below in 1.1 and were hoping that 1.2 would fix it. It did
not.
Our desired behavior is that when a service or host soft fails, Nagios
waits 1 minute and then re-checks. This should repeat until there are 5
failed checks in total (the 5th one being HARD) before a notification
is sent.
The problem we are having, as you can see from the sample below, is
that Nagios is only waiting 3 seconds between soft-fail checks. Instead
of a host / service taking 4 minutes to fail 4 additional times (before
notification), it only takes about 12 seconds.
We are getting a lot of false pages because just about any network
glitch can last 12 seconds.
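The arithmetic behind those two figures can be sketched as follows. This is only an illustration of the numbers quoted above, not of how Nagios schedules checks internally; the function name `seconds_until_hard` is hypothetical.

```python
def seconds_until_hard(max_check_attempts, retry_interval_seconds):
    """Time from the first failed check to the final (HARD) one.

    The first failure starts the count, so only the remaining
    (max_check_attempts - 1) re-checks each wait one retry interval.
    """
    return (max_check_attempts - 1) * retry_interval_seconds


# Desired: 5 attempts with 1 minute between re-checks.
expected = seconds_until_hard(5, 60)
# Observed in the sample: 5 attempts about 3 seconds apart.
observed = seconds_until_hard(5, 3)

print(expected, observed)
```

With the values from this post, the desired window is 240 seconds (4 minutes) and the observed one is 12 seconds.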
Has anyone seen this before? Can you please help? We love this
product, but this is driving us crazy with pages! Is this a problem
with our perl installation? Are we missing a module or something? Or
do we have the config files set up wrong?
TIA!
-Josh
Sample problem:
[03-18-2004 08:44:02] HOST NOTIFICATION: rich;fcprt0013;DOWN;host-notify-by-epager;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:44:02] HOST ALERT: fcprt0013;DOWN;HARD;5;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:59] HOST ALERT: fcprt0013;DOWN;SOFT;4;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:56] HOST ALERT: fcprt0013;DOWN;SOFT;3;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:53] HOST ALERT: fcprt0013;DOWN;SOFT;2;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:50] HOST ALERT: fcprt0013;DOWN;SOFT;1;/bin/ping -n -U -c 1 172.16.1.86
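For anyone wanting to measure the gaps rather than read them off by eye, here is a minimal sketch that parses the HOST ALERT lines from the sample above and prints the seconds between consecutive attempts. The log text is copied from the sample with the wrapped lines rejoined; the names `parse_alerts` and `gaps_in_seconds` and the regular expression are assumptions made for this illustration, not part of Nagios.

```python
import re
from datetime import datetime

# HOST ALERT lines from the sample, one entry per line (wrapping removed).
LOG = """\
[03-18-2004 08:44:02] HOST ALERT: fcprt0013;DOWN;HARD;5;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:59] HOST ALERT: fcprt0013;DOWN;SOFT;4;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:56] HOST ALERT: fcprt0013;DOWN;SOFT;3;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:53] HOST ALERT: fcprt0013;DOWN;SOFT;2;/bin/ping -n -U -c 1 172.16.1.86
[03-18-2004 08:43:50] HOST ALERT: fcprt0013;DOWN;SOFT;1;/bin/ping -n -U -c 1 172.16.1.86
"""

# Assumed line layout: [timestamp] HOST ALERT: host;state;type;attempt;output
ALERT = re.compile(
    r"^\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] HOST ALERT: "
    r"([^;]+);([^;]+);([^;]+);(\d+);"
)


def parse_alerts(text):
    """Return (timestamp, host, state_type, attempt) tuples, oldest first."""
    alerts = []
    for line in text.splitlines():
        match = ALERT.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%m-%d-%Y %H:%M:%S")
            alerts.append(
                (when, match.group(2), match.group(4), int(match.group(5)))
            )
    return sorted(alerts)  # the log is newest-first, so sort by timestamp


def gaps_in_seconds(alerts):
    """Seconds elapsed between each pair of consecutive alerts."""
    return [
        int((later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(alerts, alerts[1:])
    ]


alerts = parse_alerts(LOG)
gaps = gaps_in_seconds(alerts)
print(gaps, sum(gaps))
```

Run against the five lines above, the gaps are 3 seconds each, 12 seconds in total from the first SOFT failure to the HARD state.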
Here is the service definition for this service:
define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             0
        check_freshness                 1
        freshness_threshold             0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    0
        max_check_attempts              5
        normal_check_interval           1
        retry_check_interval            1
        check_period                    24x7
        notification_interval           60
        notification_period             wakinghours
        notification_options            w,c,r
        register                        0
        }
define service{
        use                             generic-service
        host_name                       fcprt0013
        service_description             ping
        check_command                   check-host-alive
        contact_groups                  finncontacts
        }
Here is the host definition for this host:
define host{
        name                            generic-host
        checks_enabled                  1
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        low_flap_threshold              0
        high_flap_threshold             0
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    0
        max_check_attempts              5
        notification_interval           60
        notification_period             wakinghours
        notification_options            d,r
        register                        0
        }
define host{
        use                             generic-host
        host_name                       fcprt0013
        alias                           fcprt0013.finncorp.com
        address                         172.16.1.86
        check_command                   check-host-alive
        parents                         fcnet0007
        }