Dependency problem
Andreas Ericsson
ae at op5.se
Wed Apr 7 22:55:32 CEST 2004
> ===================================================
> My Topology:
> ===================================================
>
> Nagios machine --- RT1 -- RT2 -- RT3
>
>
> ====================================================
> The problem
> ====================================================
>
> When RT1 goes down, or the RT1-RT2 link goes down, Nagios will notice
> it at random, while it is checking a service or running the
> HOST_ALIVE function against any part of the network that is down. Let's
> assume that the first host Nagios finds dead is RT3.
>
> Nagios didn't get any reply from RT3, so RT3 will be kept in SOFT down
> state.
>
> Next the retry process will take place. max_check_attempts is 30
> for each host. That's because the links are not reliable at all,
> so we want to be a little elastic with the notifications.
>
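This is where your problem is. A max_check_attempts of 30 is more than
just a little elastic: RT3 takes so long to reach a HARD state that the
link has time to come back before Nagios gets around to checking RT2.
Set it to 10 or so instead, and things might run a bit smoother.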
Also, if the network really is in such a crappy state, you might want to
just stop monitoring it, since it's obviously not mission-critical for you.
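
As a rough sketch of what I mean, here is the RT3 definition from your
hosts.cfg below with only max_check_attempts changed. The value 10 is just
a starting point, not a tested figure, so tune it against how flaky the
links really are. The same change would apply to RT1 and RT2.

```
# Sketch only: RT3 as quoted below, with max_check_attempts lowered
# from 30 to 10. The value 10 is an assumption; adjust to taste.
define host{
        use                     generic-host
        host_name               RT3
        alias                   Wireless Internet
        address                 212.34.23.4
        parents                 RT2
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   0
        notification_period     24x7
        notification_options    d,u
        }
```
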
> At the time that we reach retry #30, Nagios assumes that RT3 is
> down, puts it in HARD DOWN state and looks for any
> dependencies associated with RT3. As you can see below, RT3 is
> dependent upon RT2, so it continues by trying to ping RT2.
>
> While Nagios is trying to determine whether RT2 is alive or not,
> the RT1-RT2 link suddenly comes up and the whole network
> is reachable by Nagios again. I notice here that the max_check_attempts
> haven't been exhausted yet. So Nagios gets a response from
> RT2 and puts it in a HARD OK state.
>
> The result is that RT3 is NOT checked again to see whether it is up
> like RT2. So a notification is sent reporting that RT3 is
> down. This is FAKE. The whole network was down!
>
> My configuration is below. Maybe something is wrong with my conf
> files.
>
> Thanks in advance guys
>
> ====================================================
> My dependecies.cfg file
> ====================================================
>
> define hostdependency{
>         host_name                       RT2
>         dependent_host_name             RT3
>         notification_failure_criteria   d,u
>         }
>
> define hostdependency{
>         host_name                       RT1
>         dependent_host_name             RT2
>         notification_failure_criteria   d,u
>         }
>
>
> ===================================================
> My hosts.cfg
> ===================================================
>
> define host{
>         use                     generic-host
>         host_name               RT1
>         alias                   Wireless 1
>         address                 213.5.0.34
>         check_command           check-host-alive
>         max_check_attempts      30
>         notification_interval   0
>         notification_period     24x7
>         notification_options    d,u
>         }
>
>
> define host{
>         use                     generic-host
>         host_name               RT2
>         alias                   tsapi.twmn
>         address                 10.107.13.1
>         parents                 RT1
>         check_command           check-host-alive
>         max_check_attempts      30
>         notification_interval   0
>         notification_period     24x7
>         notification_options    d,u
>         }
>
>
> define host{
>         use                     generic-host
>         host_name               RT3
>         alias                   Wireless Internet
>         address                 212.34.23.4
>         parents                 RT2
>         check_command           check-host-alive
>         max_check_attempts      30
>         notification_interval   0
>         notification_period     24x7
>         notification_options    d,u
>         }
>
>
>
--
Mvh
Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se