<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-7">
<META content="MSHTML 6.00.2800.1400" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT face=Arial size=2>Hello world,</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial size=2>I'm having trouble with either a host dependency
misconfiguration or, possibly, a bug in Nagios' dependency and notification
logic.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial size=2>I am using nagios-1.2-0.rhfc1.dag, a prebuilt
package from the Dag Apt
repository.<BR>===================================================<BR>My
Topology:<BR>===================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial size=2>Nagios machine --- RT1 -- RT2 -- RT3 </FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV><FONT size=2>
<DIV><FONT face=Arial></FONT><BR><FONT
face=Arial>====================================================<BR>The
problem<BR>====================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>When RT1 goes down, or the RT1-RT2 link goes down, Nagios
notices it at some arbitrary point, while it is running a service check or the
host-alive check against any part of the network that is unreachable. Let's
assume that the first host Nagios finds dead is RT3.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>Nagios gets no reply from RT3, so RT3 is put in a SOFT
DOWN state.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>Next, the retry process takes place. max_check_attempts
is 30 for each host, because the links are not reliable at all and we want to
be a little lenient with the notifications.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>When retry #30 is reached, Nagios concludes that RT3 is
down, puts it in a HARD DOWN state and looks for any dependencies associated
with RT3. As you can see below, RT3 is dependent upon RT2, so Nagios goes on to
ping RT2.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>While Nagios is trying to determine whether RT2 is alive,
the RT1-RT2 link suddenly comes back up and the whole network is reachable by
Nagios again. Note that at this point RT2's max_check_attempts have not been
exhausted, so Nagios gets a response from RT2 and puts it in a HARD OK
state.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>As a result, Nagios does NOT check RT3 again to see
whether it is up, as RT2 is. So a notification is sent reporting that RT3 is
down. This is a false alarm. The whole network was down!</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
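<DIV><FONT face=Arial>To make the sequence concrete, here is a rough Python
sketch of how I understand it. It is only my own model of the behaviour
described above, not Nagios code. The names LINK_UP, check_host and simulate
are made up for illustration, and I assume one check per time step and that
the link comes back right after RT3's 30th attempt.</FONT></DIV>
<PRE>
```python
MAX_CHECK_ATTEMPTS = 30  # same value as max_check_attempts in hosts.cfg

# Assumption: one entry per time step. The RT1-RT2 link is down for the
# first 30 steps (all of RT3's attempts) and up afterwards.
LINK_UP = [False] * MAX_CHECK_ATTEMPTS + [True] * MAX_CHECK_ATTEMPTS


def check_host(host, clock):
    """Retry until a reply arrives or max_check_attempts is used up.

    host is only a label: RT2 and RT3 both sit behind the same link, so
    LINK_UP alone decides whether a reply comes back.
    Returns the resulting hard state and the advanced clock.
    """
    for _ in range(MAX_CHECK_ATTEMPTS):
        replied = LINK_UP[clock]
        clock += 1
        if replied:
            return "UP", clock
    return "DOWN", clock


def simulate():
    clock = 0
    states = {}
    # Nagios happens to check RT3 first and uses up all 30 attempts.
    states["RT3"], clock = check_host("RT3", clock)
    # Only then does it check RT2, the host RT3 depends on.
    states["RT2"], clock = check_host("RT2", clock)
    # RT3 is not checked again. With notification_failure_criteria d,u the
    # notification is only held back while RT2 is DOWN or UNREACHABLE.
    notify_rt3_down = states["RT3"] == "DOWN" and states["RT2"] == "UP"
    return states, notify_rt3_down


if __name__ == "__main__":
    print(simulate())
```
</PRE>
<DIV><FONT face=Arial>In this model simulate() ends with RT3 DOWN, RT2 UP and a
notification for RT3, which matches what I described above.</FONT></DIV>
<DIV><FONT face=Arial></FONT>&nbsp;</DIV>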
<DIV><FONT face=Arial>My configuration is below. Maybe something is wrong with
my config files.</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>Thanks in advance guys</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>====================================================<BR>My
dependencies.cfg
file<BR>====================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>define
hostdependency{<BR> host_name RT2<BR> dependent_host_name RT3<BR> notification_failure_criteria d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>define
hostdependency{<BR> host_name RT1<BR> dependent_host_name RT2<BR> notification_failure_criteria d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><BR><FONT
face=Arial>===================================================<BR>My
hosts.cfg<BR>===================================================</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial>define
host{<BR> use generic-host<BR> host_name RT1<BR> alias Wireless
1<BR> address 213.5.0.34<BR> check_command check-host-alive<BR> max_check_attempts
30<BR> notification_interval 0<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><BR><FONT face=Arial>define
host{<BR> use generic-host<BR> host_name RT2<BR> alias tsapi.twmn<BR> address 10.107.13.1<BR> parents RT1<BR> check_command check-host-alive<BR> max_check_attempts
30<BR> notification_interval 0<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><BR><FONT face=Arial>define
host{<BR> use generic-host<BR> host_name RT3<BR> alias Wireless
Internet<BR> address 212.34.23.4<BR> parents RT2<BR> check_command check-host-alive<BR> max_check_attempts
30<BR> notification_interval 0<BR> notification_period 24x7<BR> notification_options d,u<BR> }</FONT></DIV>
<DIV><FONT face=Arial></FONT> </DIV>
<DIV><FONT face=Arial></FONT></FONT> </DIV></BODY></HTML>