Nagios retries checks too soon.

Jochen Bern Jochen.Bern at LINworks.de
Fri Jun 10 09:53:59 CEST 2011


On 06/09/2011 08:14 PM, Paul M. Dubuc wrote:
> Andreas Ericsson wrote:
>> I'm not sure. I'm also not sure which behaviour is intended. Arguably, either
>> is correct and Nagios is doing one of two right things.
> I'm not sure.  If a test times out and Nagios tries again 10 seconds later 
> instead of the 60 seconds specified, that could cause problems; load related 
> problems when you have many of these tests running and timing out and problems 
> for the system under test not having sufficient time to recover before the 
> next check is done.

True, but *if* someone has the latter kind of problem, I'd expect them to
keep it in mind while writing the configuration, too. IIRC, the actual
code adds check_interval/retry_interval to the variable that holds the
(previous) scheduled check time - i.e., the time when the previous check
supposedly was *started* (assuming negligible check latency), not the
time when its result came in. A check that runs into its timeout thus
leaves correspondingly less time before the retry. Configuring a
retry_interval of one minute for a service that may not be able to
sustain even one request per minute sounds dubious to me.
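
To put numbers on that, here is a minimal Python sketch of the scheduling
arithmetic as I remember it. It is not taken from the Nagios source; the
function names and the 50-second timeout are assumptions, the latter picked
only because it reproduces the 10-second gap described above with a
60-second retry_interval.

```python
# Sketch only: models "next check = previous *scheduled start* + interval",
# as recalled above. Names and numbers are illustrative assumptions, not
# Nagios internals.

def next_check_time(prev_scheduled_start: float, interval_s: float) -> float:
    """Next scheduled check, counted from the previous check's start."""
    return prev_scheduled_start + interval_s


def gap_after_result(prev_scheduled_start: float,
                     check_duration_s: float,
                     interval_s: float) -> float:
    """Seconds between the previous result arriving and the next check."""
    finished = prev_scheduled_start + check_duration_s
    return next_check_time(prev_scheduled_start, interval_s) - finished


prev_start = 0.0          # previous check scheduled at t = 0 s
timeout_s = 50.0          # assumed: the check ran into a 50 s timeout
retry_interval_s = 60.0   # retry_interval of one minute

print(next_check_time(prev_start, retry_interval_s))              # 60.0
print(gap_after_result(prev_start, timeout_s, retry_interval_s))  # 10.0
```

If the scheduler really works that way, the retry starts 60 seconds after
the previous check *started*, which is only 10 seconds after it timed out.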

(And I'm a firm nonbeliever in Unix-ish "load" figures, as opposed to
actual CPU usage etc., but that's a different rant.)

Kind regards,
								J. Bern
-- 
Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel
