<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7226.0">
<TITLE>RE: [Nagios-users] Service checks and retry check interval</TITLE>
</HEAD>
<BODY>
<DIV id=idOWAReplyText94205 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>I changed max_check_attempts from 10
to 5 after I captured that status output. I did reload Nagios afterwards, so the
10 is just from an old capture.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2>I think I understand what you mean about
Nagios performing the host check and bypassing the service check, but it seems a
retry_check_interval value is not allowed in
hosts.cfg. My current definitions are below.</FONT></DIV>
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial
size=2>---------------services.cfg------------------</FONT></DIV>
<DIV dir=ltr><FONT face=Arial
size=2>--------------------------------------------------</FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>define service{<BR>
use generic-service ; Name of service template to use<BR>
host_name Test-Server<BR>
service_description PING<BR>
is_volatile 0<BR>
check_period workhours<BR>
max_check_attempts 5<BR>
normal_check_interval 5<BR>
retry_check_interval 1<BR>
contact_groups test-contact<BR>
notification_interval 960<BR>
notification_period workhours<BR>
notification_options c,r<BR>
check_command check_fping!50%!100%<BR>
}</FONT></DIV>
<DIV dir=ltr><FONT face=Arial
size=2>--------------------------------------------------</FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000
size=2>------------------hosts.cfg-------------------</FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>define host{<BR>
use generic-host ; Name of host template to use<BR>
parents switch1<BR>
host_name Test-Server<BR>
alias TestServer<BR>
address 10.0.0.21<BR>
check_command check-host-alive<BR>
max_check_attempts 5<BR>
notification_interval 30<BR>
notification_period 24x7<BR>
notification_options d,u,r<BR>
}<BR>----------------------------------------------------</DIV></FONT><FONT
face=Arial color=#000000 size=2></FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>
<DIV dir=ltr><BR></DIV></FONT></DIV>
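<DIV dir=ltr><FONT face=Arial size=2>For reference, the timing implied by the
service definition above can be worked out with a small Python sketch. It
assumes the default interval_length of 60 seconds in nagios.cfg and that a
service reaches a HARD state on its last consecutive failed attempt
(max_check_attempts). The function name is hypothetical; this is arithmetic
only, not Nagios code.</FONT></DIV>
<PRE>
```python
def minutes_until_hard_state(max_check_attempts, retry_check_interval, interval_length=60):
    """Minutes between the first failed check and the check that makes the state HARD.

    Assumption: the first failure counts as attempt 1, and each later
    attempt is scheduled retry_check_interval units after the previous one.
    """
    retries = max_check_attempts - 1
    seconds = retries * retry_check_interval * interval_length
    return seconds / 60


# Values from the PING service definition: 5 attempts, 1-unit retry interval.
print(minutes_until_hard_state(5, 1))   # 4.0
# The "5,5,3" settings quoted in the thread: 5 attempts, 3-unit retry interval.
print(minutes_until_hard_state(5, 3))   # 12.0
```
</PRE>
<DIV dir=ltr><FONT face=Arial size=2>Adding up to one normal_check_interval (5
minutes) before the first failure is noticed gives roughly the 15 minute delay
mentioned in the thread. One way to get sub-minute retries, as far as I know, is
to lower interval_length in nagios.cfg, but that rescales every interval in the
configuration.</FONT></DIV>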
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> nagios-users-admin@lists.sourceforge.net
on behalf of Marc Powell<BR><B>Sent:</B> Wed 6/16/2004 5:42 PM<BR><B>To:</B> Tom
Valdes; nagios-users@lists.sourceforge.net<BR><B>Subject:</B> RE: [Nagios-users]
Service checks and retry check interval<BR></FONT><BR></DIV>
<DIV>
<P><FONT size=2><BR><BR>________________________________<BR><BR>>From: Tom
Valdes [<A
href="mailto:Tom.Valdes@flamenconetworks.com">mailto:Tom.Valdes@flamenconetworks.com</A>]<BR>>Sent:
Wednesday, June 16, 2004 2:55 PM<BR>>To:
nagios-users@lists.sourceforge.net<BR>>Subject: [Nagios-users] Service checks
and retry check interval<BR><BR>> I currently have my normal_check_interval
set to 5 minutes<BR><BR>> If a service check is missed, I'd like it to retry
5<BR>> times before sending a notification and I'd like the<BR>> retry
interval to be 1 minute. (can it be less? <BR>> Like 10
seconds?)<BR><BR>>I've tried adding the following to
services.cfg<BR><BR>>
max_check_attempts
5<BR>>
normal_check_interval
5<BR>>
retry_check_interval
1<BR><BR>I presume this is for the service definition. Can we see the
complete<BR>definition?<BR><BR>> Shouldn't this retry a failed check every
minute<BR>> for 5 tries before sending a notification?<BR><BR>For the service
above under normal circumstances, yes. I use 5,5,3 to<BR>delay notifications by
~15 minutes.<BR><BR>> Using a test server, I pull the plug and Nagios<BR>>
catches the 100% ping loss but if I plug it back<BR>> in as soon as it
notices, Nagios emails me right<BR>> away and doesn't return an Up state for
another<BR>> 5 minutes?<BR><BR>For the service or the host? See
below.<BR><BR>> The following is what I receive on the status<BR>>
screen.. It shows a State Type: HARD.. Shouldn't<BR>> it be in a SOFT state
until it completes the<BR>> max_check_attempts?<BR><BR>> Current
Status: CRITICAL <BR>> Status Information:FPING
CRITICAL - 192.168.100.21 (loss=100.000000% )<BR>> Current
Attempt:1/10<BR><BR>Why is max attempts showing 10 here if it's defined as 5
above? Did you<BR>restart nagios after making the change? Do you have multiple
nagios<BR>processes running?<BR><BR>There is a special situation that results
when you just 'pull the plug'<BR>on a machine you're monitoring. The service
check will of course fail on<BR>the first attempt. Nagios will then attempt to
check the status of the<BR>host using the host check_command. It will do this
exclusively until<BR>max_check_attempts defined for the host is reached and will
not attempt<BR>to recheck the status of the service if the host is determined to
be<BR>down or unreachable. At that point nagios will attempt to send a
HOST<BR>down notification which may be what you are seeing. Because of
this<BR>special situation, your retry_check_interval for the service has
no<BR>meaning. AFAIK, nagios just falls back to normal_check_interval
until<BR>one or more services on the host recover (and the host by
inference).<BR><BR>--<BR>Marc<BR></FONT></P></DIV>
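<DIV>
<P><FONT size=2>The special case Marc describes can be modelled with a short
Python sketch. It is a simplified illustration of the behaviour as described in
this message, not Nagios's actual scheduling code. The function name and the
list-of-booleans input are hypothetical.</FONT></P></DIV>
<PRE>
```python
def handle_failed_service_check(host_check_results, host_max_check_attempts):
    """Model what happens after a service check fails on its first attempt.

    host_check_results: outcomes of successive host checks (True = host replied).
    Returns a (host_state, action) tuple.
    """
    for attempt, host_is_up in enumerate(host_check_results, start=1):
        if host_is_up:
            # Host answers: the service keeps retrying at retry_check_interval.
            return ("UP", "retry service at retry_check_interval")
        if attempt == host_max_check_attempts:
            # Host never answered: no service retries, host notification goes out.
            return ("DOWN", "send host notification, skip service retries")
    return ("UNKNOWN", "not enough host check results supplied")


# Plug pulled: all 5 host checks fail, so the service retry interval never applies.
print(handle_failed_service_check([False] * 5, 5))
# Host replies on the second host check: normal service retry logic continues.
print(handle_failed_service_check([False, True], 5))
```
</PRE>
<DIV>
<P><FONT size=2>In the pulled-plug test, the first case applies, which would
explain a notification arriving without the service ever stepping through its
own retries.</FONT></P></DIV>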
</BODY>
</HTML>