<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN"> <HTML> <HEAD> <META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0"> <TITLE>RE: [Nagios-users] Service checks and retry check interval</TITLE> </HEAD> <BODY> <DIV id=idOWAReplyText94205 dir=ltr>
<DIV dir=ltr>I had changed the 10 retries to 5 after I grabbed the copy of the status. I did reload Nagios so that's just an old capture.</DIV>
<DIV dir=ltr>I think I understand what you mean about performing the host check and bypassing a service check, but it seems a retry_check_interval value is not allowed in the hosts.cfg </DIV>
<DIV dir=ltr> </DIV>
<DIV dir=ltr>---------------services.cfg------------------</DIV>
<DIV dir=ltr>--------------------------------------------------</DIV>
<DIV dir=ltr>define service{</DIV>
<DIV dir=ltr>use generic-service ; Name of service template to use</DIV>
<DIV dir=ltr>host_name Test-Server</DIV>
<DIV dir=ltr>service_description PING</DIV>
<DIV dir=ltr>is_volatile 0</DIV>
<DIV dir=ltr>check_period workhours</DIV>
<DIV dir=ltr>max_check_attempts 5</DIV>
<DIV dir=ltr>normal_check_interval 5</DIV>
<DIV dir=ltr>retry_check_interval 1</DIV>
<DIV dir=ltr>contact_groups test-contact</DIV>
<DIV dir=ltr>notification_interval 960</DIV>
<DIV dir=ltr>notification_period workhours</DIV>
<DIV dir=ltr>notification_options c,r</DIV>
<DIV dir=ltr>check_command check_fping!50%!100%</DIV>
<DIV dir=ltr>}</DIV>
<DIV dir=ltr>--------------------------------------------------</DIV>
<DIV dir=ltr>------------------hosts.cfg-------------------</DIV>
<DIV dir=ltr>define host{</DIV>
<DIV dir=ltr>use generic-host ; Name of host template to use</DIV>
<DIV dir=ltr>parents switch1</DIV>
<DIV dir=ltr>host_name Test-Server</DIV>
<DIV dir=ltr>alias TestServer</DIV>
<DIV dir=ltr>address 10.0.0.21</DIV>
<DIV dir=ltr>check_command check-host-alive</DIV>
<DIV dir=ltr>max_check_attempts 5</DIV>
<DIV dir=ltr>notification_interval 30</DIV>
<DIV dir=ltr>notification_period 24x7</DIV>
<DIV dir=ltr>notification_options d,u,r</DIV>
<DIV dir=ltr>}</DIV>
<DIV dir=ltr>----------------------------------------------------</DIV></DIV>
<DIV dir=ltr> <DIV dir=ltr> </DIV></DIV>
<DIV dir=ltr> <HR tabIndex=-1>
<DIV dir=ltr>From: nagios-users-admin@lists.sourceforge.net on behalf of Marc Powell</DIV>
<DIV dir=ltr>Sent: Wed 6/16/2004 5:42 PM</DIV>
<DIV dir=ltr>To: Tom Valdes; nagios-users@lists.sourceforge.net</DIV>
<DIV dir=ltr>Subject: RE: [Nagios-users] Service checks and retry check interval</DIV>
</DIV> <DIV> 
<DIV>________________________________</DIV>
<DIV>>From: Tom Valdes [<A href="mailto:Tom.Valdes@flamenconetworks.com">mailto:Tom.Valdes@flamenconetworks.com</A>]</DIV>
<DIV>>Sent: Wednesday, June 16, 2004 2:55 PM</DIV>
<DIV>>To: nagios-users@lists.sourceforge.net</DIV>
<DIV>>Subject: [Nagios-users] Service checks and retry check interval</DIV>
<DIV> </DIV>
<DIV>> I currently have my normal_check_interval set to 5 minutes</DIV>
<DIV>> If a service check is missed, I'd like it to retry 5</DIV>
<DIV>> times before sending a notification and I'd like the</DIV>
<DIV>> retry interval to be 1 minute. (can it be less?</DIV>
<DIV>> Like 10 seconds?)</DIV>
<DIV>>I've tried adding the following to services.cfg</DIV>
<DIV>> max_check_attempts 5</DIV>
<DIV>> normal_check_interval 5</DIV>
<DIV>> retry_check_interval 1</DIV>
<DIV> </DIV>
<DIV>I presume this is for the service definition. Can we see the complete definition?</DIV>
<DIV> </DIV>
<DIV>> Shouldn't this retry a failed check every minute</DIV>
<DIV>> for 5 tries before sending a notification?</DIV>
<DIV> </DIV>
<DIV>For the service above under normal circumstances, yes. I use 5,5,3 to delay notifications by ~15 minutes.</DIV>
<DIV> </DIV>
<DIV>> Using a test server, I pull the plug and Nagios</DIV>
<DIV>> catches the 100% ping loss but if I plug it back</DIV>
<DIV>> in as soon as it notices, Nagios emails me right</DIV>
<DIV>> away and doesn't return an Up state for another</DIV>
<DIV>> 5 minutes?</DIV>
<DIV> </DIV>
<DIV>For the service or the host? See below.</DIV>
<DIV> </DIV>
<DIV>> The following is what I receive on the status</DIV>
<DIV>> screen.. It shows a State Type: HARD.. Shouldn't</DIV>
<DIV>> it be in a SOFT state until it completes the</DIV>
<DIV>> max_check_attempts?</DIV>
<DIV>> Current Status: CRITICAL</DIV>
<DIV>> Status Information:FPING CRITICAL - 192.168.100.21 (loss=100.000000% )</DIV>
<DIV>> Current Attempt:1/10</DIV>
<DIV> </DIV>
<DIV>Why is max attempts showing 10 here if it's defined as 5 above? Did you restart Nagios after making the change? Do you have multiple Nagios processes running?</DIV>
<DIV> </DIV>
There is a special situation that results when you just 'pull the plug' on a machine you're monitoring. The service check will of course fail on the first attempt. Nagios will then attempt to check the status of the host using the host check_command. 
It will do this exclusively until max_check_attempts defined for the host is reached and will not attempt to recheck the status of the service if the host is determined to be down or unreachable. At that point nagios will attempt to send a HOST down notification which may be what you are seeing. Because of this special situation, your retry_check_interval for the service has no meaning. AFAIK, nagios just falls back to normal_check_interval until one or more services on the host recovers (and the host by inference). -- Marc _______________________________________________ Nagios-users mailing list Nagios-users@lists.sourceforge.net <A href="https://lists.sourceforge.net/lists/listinfo/nagios-users">https://lists.sourceforge.net/lists/listinfo/nagios-users</A> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null </DIV> </BODY> </HTML>