Redundant Switches/Routers/Network Interfaces and parent configuration question

Cook, Garry GWCOOK at mactec.com
Tue Aug 3 16:09:20 CEST 2004


nagios-users-admin at lists.sourceforge.net wrote:
> Hi all,
> 
> I've got a question/problem regarding redundant network connections. We
> have the setup as displayed in the graphic below (sorry for the image,
> but a picture says more than a thousand words and it's easier to
> understand what I mean).
 
I've got a network similar to that...


> As you can see, we've two redundant routers connected to the backbone
> (or internet or whatever). Connected to each router is a master switch.
> The frontend switches are each connected to both master switches. All
> clients have two network interfaces, each connected to one
> corresponding frontend switch. There are several virtual IPs (VIP) on
> each client (RIP/Zebra is used for routing) and different services
> running on each VIP. I added a logical client to the picture (which
> has no IP address, since it's only a concept) to clarify the
> explanations below. 

Very similar, in fact. The only problem with your picture is that it does
not show the location of the Nagios box, so it is difficult to determine
whether your parents are set up correctly.

 
> What we need is the following:
> 1) If anything fails, we want to get notifications for it, but
> 2) if anything blocking fails, we don't want to get notifications for
> anything below, only for the blocking parts.
> 3) Blocking means that both of the structural elements are not
> reachable/down (e.g. both routers, both master switches or both
> client interfaces and so on).
> 4) We want ONE entry in the host/services list for every host (that's
> why I added the "logical" host in the diagram), so the overview won't
> get huge because every service is listed twice (= for each network
> interface of the client).

Sounds logical. For number 1, just make sure that your devices are set up
to notify when they are in a 'Down' state. For number 2, you'll want to
NOT receive 'Unreachable' notifications for devices below the 'blocking'
device.


> Actually I tried the following (I only list the relevant part of the
> entries, all hosts checked via ping, except the logical host (which
> has its own check script to check if both interfaces are up), system
> is Debian woody with Nagios version 1.2-0 from backports.org):
> 
[snip]
> 
> We tried it and got the result that we won't get any notifications from
> the services if just ONE switch/router/interface is going down. So it
> looks like "parents" rules are like: "_all_ parents have to be up"
> instead of "at least _one_ of the parents has to be up" (which makes
> much more sense regarding "normal" network structures - how often do you
> have a client which fails if ONE of his parents fails? And no, if you
> have two switches in a row, the client would have just ONE parent,
> because the second switch has the first one as parent, no problem
> there).

I don't understand the paragraph above. Can you elaborate?

Are you stating that if Switch 1 were to go down, you would not receive
notifications for service failures on Switch 3? If so, then something is
not working correctly. Since Switch 2 is also a parent of Switch 3, you
should still see alerts/notifications when Switch 3 has problems.

In my experience, Nagios does NOT work as you have stated. The 'parents'
rule works in such a way that just ONE of the parents needs to be up in
order for Nagios to continue running checks against child
services/hosts.
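
To make the "at least ONE parent up" rule concrete, here is a small
Python sketch. It is only a simplified model of the behaviour described
above, not Nagios code: the classify() function, the host names and the
parent map are all made up for illustration, and the map assumes a Nagios
box sitting on the backbone.

```python
# Simplified model of the 'parents' rule (an illustration, not Nagios code).
# A host that fails its check is DOWN if it has no parents or if at least
# one parent is UP; it is UNREACHABLE only when none of its parents is UP.

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify(host, responds, parents):
    """Return the state of `host`.

    responds: dict host -> bool (did the host answer its check?)
    parents:  dict host -> list of parent hosts (assumed to be acyclic)
    """
    memo = {}

    def state(h):
        if h in memo:
            return memo[h]
        if responds[h]:
            memo[h] = UP
        else:
            ps = parents.get(h, [])
            if not ps or any(state(p) == UP for p in ps):
                memo[h] = DOWN
            else:
                memo[h] = UNREACHABLE
        return memo[h]

    return state(host)


# Hypothetical parent map, assuming Nagios is located on the backbone.
parents = {
    "switch1": ["router1"],
    "switch2": ["router2"],
    "switch3": ["switch1", "switch2"],
}

# Switch 1 has failed and Switch 3 does not answer either.
responds = {"router1": True, "router2": True,
            "switch1": False, "switch2": True, "switch3": False}

print(classify("switch3", responds, parents))
```

In this model Switch 3 is still reported as DOWN, because Switch 2 is up.
It only becomes UNREACHABLE once Switch 1 and Switch 2 are both
unavailable.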


> Has anyone a similar network layout or knows the solution for this?
> 
> Thanks in advance,
> Stefan

At this point, I would have to guess that incorrect parents, as I
suspected earlier, are causing your issue. From the config snippets (which
I've removed from the thread), it appears as though your Nagios box is
somewhere on the backbone. I doubt this is the case, but please correct
me if I'm wrong here.

The documentation here,
http://nagios.sourceforge.net/docs/1_0/networkreachability.html, is
quite old. I believe that it has not been updated since it was written
for NetSaint (other than to change NetSaint references to Nagios).
However, it gives an explanation of parent/child relationships. It is
important to understand that parent/child relationships have nothing to
do with how your network looks to an admin, but rather with how the
hosts relate to the location of Nagios. For instance, looking at
your image from the original mail, if client-vip1 is the Nagios host,
then the logical client would be the parent of client-eth0 and
client-eth1. These two interfaces would be the parents of switch3 and
switch4 respectively. Switches 3 & 4 would BOTH be parents of switches 1
AND 2. Finally, switches 1 & 2 would be parents of routers 1 & 2
respectively.
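
The way the parents change with the location of Nagios can be sketched
in a few lines of Python. This is just an illustration under my own
assumptions: the link list is my reading of your picture, the node names
("backbone", "client", and so on) are placeholders, and parents_from() is
a made-up helper, not anything that ships with Nagios. It treats every
neighbour that is one hop closer to the Nagios host as a parent.

```python
from collections import deque


def parents_from(root, links):
    """Derive each node's parents as seen from `root` (the Nagios host).

    links: iterable of (a, b) pairs, each an undirected network link.
    A parent is any neighbour exactly one hop closer to `root`.
    """
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Breadth-first search to get the hop distance from the Nagios host.
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    return {
        node: sorted(n for n in adj[node] if dist[n] == dist[node] - 1)
        for node in dist
        if node != root
    }


# Assumed topology, based on the description in the original mail.
links = [
    ("backbone", "router1"), ("backbone", "router2"),
    ("router1", "switch1"), ("router2", "switch2"),
    ("switch1", "switch3"), ("switch1", "switch4"),
    ("switch2", "switch3"), ("switch2", "switch4"),
    ("switch3", "client-eth0"), ("switch4", "client-eth1"),
    ("client-eth0", "client"), ("client-eth1", "client"),
]

for root in ("client", "backbone"):
    print("Nagios at", root)
    for node, ps in sorted(parents_from(root, links).items()):
        print("  ", node, "<-", ", ".join(ps))
```

With the root at the client, switch1 and switch2 each get switch3 and
switch4 as parents, as described above. With the root on the backbone
the direction flips, and switch3 and switch4 each get switch1 and
switch2 as parents instead.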

There is most likely more documentation to explain this, perhaps even a
FAQ entry on the nagios.org site. These docs may help:
http://nagios.sourceforge.net/docs/1_0/networkoutages.html  
Hopefully I've been able to shed some light on parent/child
relationships if this is your issue.

I also noticed that you included some dependency definitions from your
configuration files. I don't use many dependencies, so I can't offer
much help there, but you should be aware of the fact that service/host
dependencies are NOT related to parent/child relationships in the way
you might think. The docs at
http://nagios.sourceforge.net/docs/1_0/dependencies.html go into vivid
detail about host/service dependencies. There is also a FAQ entry at
http://www.nagios.org/faqs/viewfaq.php?faq_id=145 that explains away
some confusion between dependencies and parent/child relationships.
If your Nagios host really is somewhere on the backbone, and therefore
your parent/child relationships are correct, then dependencies may be
the reason that things are not working as you would expect.

I've guessed at some of this, because I did not clearly understand the
nature of your problem or where in the network the Nagios host lives. I
hope this information helps you, but if you find that my theory is way
off base, please provide additional information to the list, including
the location of the Nagios host.

Garry W. Cook, CCNA
Network Infrastructure Manager
MACTEC, Inc. - http://www.mactec.com/
303.308.6228 (Office) - 720.220.1862 (Mobile)






