<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-7">
<META content="MSHTML 6.00.2800.1400" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>Try
lowering the host max_check_attempts. When Nagios detects that a service
is bad, it host-checks each parent up the tree and will not do
ANYTHING else for the 30 check attempts you've set while it tries to determine
whether RT1, RT2, and/or RT3 is down. This can adversely affect your other
monitored devices if those links are always flapping. It's better to
monitor faster and make notifications slower than to slow down the
entire monitoring. <SPAN class=806084508-08042004><FONT face=Arial
color=#0000ff size=2>The host will show up in the console as up/down/flapping a
lot, which is its true state. You can artificially slow down
notifications by using escalations.</FONT></SPAN></FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>For
example:</FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>set
notification_interval to 5</FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>set no
contact for the normal notification (use the escalation
instead)</FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>set
the escalation to start notifying at notification #2</FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>In
effect, the device would have to be down for a full 5 minutes
before you get notified.</FONT></SPAN></DIV>
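<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>As a
rough, untested sketch of the steps above for RT3: the contact group name
router-admins and the max_check_attempts value of 3 are placeholders, not
values from your config, and the 5 minutes assumes the default interval_length
of 60 seconds. Removing the contact from the normal notification is not shown,
since how to do that depends on how contacts are attached in your
setup.</FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>#
max_check_attempts lowered from 30 (3 is only an example value);
notification_interval raised from 0 so that a second notification can
happen<BR>define
host{<BR> use generic-host<BR> host_name RT3<BR> alias Wireless
Internet<BR> address 212.34.23.4<BR> parents RT2<BR> check_command check-host-alive<BR> max_check_attempts
3<BR> notification_interval 5<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></SPAN></DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff size=2>#
router-admins is a hypothetical contact group; last_notification 0 keeps the
escalation active for all later notifications<BR>define
hostescalation{<BR> host_name RT3<BR> contact_groups router-admins<BR> first_notification 2<BR> last_notification 0<BR> notification_interval 5<BR> }</FONT></SPAN></DIV>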
<DIV><SPAN class=806084508-08042004><FONT face=Arial color=#0000ff
size=2></FONT></SPAN> </DIV>
<DIV><FONT face=Tahoma><FONT size=2><SPAN
class=806084508-08042004> </SPAN>-----Original Message-----<BR><B>From:</B>
Anastasios Zafeiropoulos [mailto:mls@freemail.gr]<BR><B>Sent:</B> Wednesday,
April 07, 2004 12:59 PM<BR><B>To:</B> nagios-users<BR><B>Subject:</B>
[Nagios-users] Dependency problem<BR><BR></DIV></FONT></FONT>
<BLOCKQUOTE>
<DIV><FONT face=Arial size=2>Hello world,</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial size=2>I'm having trouble with either a host dependency
misconfiguration or, possibly, a bug in Nagios' dependency logic and
notification process.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial size=2>I am using version nagios-1.2-0.rhfc1.dag which
was a prebuilt package from Dag Apt repository
site.<BR>===================================================<BR>My
Topology:<BR>===================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial size=2>Nagios machine --- RT1 -- RT2 -- RT3
</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV><FONT size=2>
<DIV><FONT face=Arial></FONT><BR><FONT
face=Arial>====================================================<BR>The
problem<BR>====================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>When RT1 goes down, or the RT1-RT2 link goes down,
Nagios will notice it at some arbitrary point, while it is running a service
check or a host-alive check against any part of the network that is
down. Let's assume that the first host Nagios finds dead is RT3.
</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>Nagios gets no reply from RT3, so RT3 is
put in a SOFT DOWN state. </FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>Next, the retry process takes place.
max_check_attempts is 30 for each host, because the links are not
reliable at all and we want to be a little lenient with
the notifications.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>When we reach retry #30, Nagios assumes
that RT3 is down, puts it in a HARD DOWN state, and looks for any
dependencies associated with RT3. As you can see below,
RT3 is dependent upon RT2, so Nagios continues by trying to ping
RT2.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>While Nagios is trying to determine whether RT2 is
alive, the RT1-RT2 link suddenly comes up and the whole network
is reachable by Nagios again. Note that at this point
max_check_attempts for RT2 has not been exhausted, so Nagios gets a response
from RT2 and puts it in a HARD OK state.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>As a result, Nagios does NOT check RT3 again to see
whether it is up like RT2, so a notification is sent reporting that RT3 is
down. This is a FALSE alarm. The whole network was
down!</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>My configuration is below. Maybe something is
wrong with my config files.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>Thanks in advance guys</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT
face=Arial>====================================================<BR>My
dependecies.cfg
file<BR>====================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>define
hostdependency{<BR> host_name RT2<BR> dependent_host_name RT3<BR> notification_failure_criteria d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>define
hostdependency{<BR> host_name RT1<BR> dependent_host_name RT2<BR> notification_failure_criteria d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><BR><FONT
face=Arial>===================================================<BR>My
hosts.cfg<BR>===================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>define
host{<BR> use generic-host<BR> host_name RT1<BR> alias Wireless
1<BR> address 213.5.0.34<BR> check_command check-host-alive<BR> max_check_attempts
30<BR> notification_interval 0<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><BR><FONT face=Arial>define
host{<BR> use generic-host<BR> host_name RT2<BR> alias tsapi.twmn<BR> address 10.107.13.1<BR> parents RT1<BR> check_command check-host-alive<BR> max_check_attempts
30<BR> notification_interval 0<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><BR><FONT face=Arial>define
host{<BR> use generic-host<BR> host_name RT3<BR> alias Wireless
Internet<BR> address 212.34.23.4<BR> parents RT2<BR> check_command check-host-alive<BR> max_check_attempts
30<BR> notification_interval 0<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT
face=Arial></FONT></FONT> </DIV></BLOCKQUOTE></BODY></HTML>