Running redundant nagios on shared storage (eg. NFS)

Thomas Sluyter nagios at kilala.nl
Mon Jun 19 09:37:23 CEST 2006


Hi :)

Actually, I'll be building a redundant Nagios setup for my current  
client. Barring the use of real clustering software I'll make it a  
cluster-alike so to speak...

In our setup we'll have three pairs of hosts:
* Internal checkers (through our own networks)
* External checkers (through the Internet)
* Display hosts (running the GUI and so on)

I based my design on the ideas given in "Pro Nagios 2.0" by James  
Turnbull, although I did make some changes to his design.

James' ideas regarding each pair:
* Both nodes run Nagios.
* The primary node performs all checks and sends its status messages  
through NSCA to the secondary.
* The secondary node does not act upon incoming status messages  
(checks, notifications, actions and such are disabled).
* If the primary node fails, the secondary node will automatically  
enable the aforementioned actions.
* When the primary node is fixed, the secondary node will  
automatically disable itself.

Problems:
* The primary does not know the current state of all monitored  
objects when it comes back up again.
* Upon coming back up there will be a certain time window of  
uncertainty regarding the status of various objects.

My ideas regarding each pair:
* Both nodes run Nagios.
* The active node performs all checks and sends its status messages  
through NSCA to the passive.
* The passive node does not act upon incoming status messages  
(checks, notifications, actions and such are disabled).
* If the active node fails, the passive node will automatically  
enable the aforementioned actions.
* When the previously active node is fixed, the previously passive
node (now active) keeps running as it was. It also synchronises its
configuration and status files to the repaired node, which takes the
passive role and receives all status messages through NSCA.
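
To make the difference from James' design concrete, here is a minimal
Python sketch of the role decision for one node of a pair. Treat it as
an illustration only: the function names, the peer_alive input (however
the nodes end up checking each other) and the choice of commands are my
assumptions. ENABLE_NOTIFICATIONS, START_EXECUTING_SVC_CHECKS and their
counterparts are standard Nagios external commands; a real setup would
write the formatted lines to the Nagios command file.

```python
import time

# External commands a node would issue when taking over or standing down.
# These are standard Nagios external commands; which ones a real setup
# needs is an assumption of this sketch.
ACTIVATE = ["ENABLE_NOTIFICATIONS", "START_EXECUTING_SVC_CHECKS"]
DEACTIVATE = ["DISABLE_NOTIFICATIONS", "STOP_EXECUTING_SVC_CHECKS"]


def next_role(role, peer_alive):
    """Return this node's new role ("active" or "passive").

    A passive node takes over when its peer is down. An active node
    never steps down just because the peer came back, so there is no
    automatic failback.
    """
    if role == "passive" and not peer_alive:
        return "active"
    return role


def commands_for(old_role, new_role, now=None):
    """Format the external command lines for a role change."""
    if old_role == new_role:
        return []
    names = ACTIVATE if new_role == "active" else DEACTIVATE
    stamp = int(time.time() if now is None else now)
    return ["[%d] %s" % (stamp, name) for name in names]


# Peer is fine, then fails, then comes back: the node takes over once
# and stays active afterwards.
role = "passive"
for peer_alive in (True, False, True):
    new_role = next_role(role, peer_alive)
    for line in commands_for(role, new_role):
        print(line)
    role = new_role
```

The sketch assumes a repaired node always starts in the passive role,
so the DEACTIVATE list would only be used at startup or for a manual
switch back.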

As an added bonus we will use the passive node (under -normal-  
running conditions) to test configuration changes before applying  
them to the active node. That way we'll prevent ourselves from making  
stupid mistakes :) If the new config is found to be 100% correct it  
will be synched to the active node.
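
The "test first, then sync" step could look roughly like the Python
sketch below. It only covers the syntax check, using the Nagios
pre-flight check (nagios -v) and assuming it exits non-zero on errors;
actually running the new config on the passive node for a while is not
shown. rsync, the function names and the injectable run parameter are
assumptions for illustration, not something I have decided on.

```python
import subprocess


def verify_config(main_cfg, nagios_bin="nagios", run=subprocess.run):
    """Run the Nagios pre-flight check (nagios -v) on a main config file."""
    result = run([nagios_bin, "-v", main_cfg])
    return result.returncode == 0


def promote_config(main_cfg, config_dir, target, run=subprocess.run):
    """Sync config_dir to target only if the pre-flight check passes.

    rsync and the target syntax (host:/path) are assumptions of this
    sketch, not part of the design described above.
    """
    if not verify_config(main_cfg, run=run):
        return False
    result = run(["rsync", "-a", "--delete", config_dir, target])
    return result.returncode == 0
```

On the passive node this would be called with something like
promote_config("/etc/nagios/nagios.cfg", "/etc/nagios/",
"activenode:/etc/nagios/"), where the paths and host name are
placeholders.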

There are still some bugs in the design I need to iron out before I  
can start building it, but I'll get there. Of course, I'd rather do  
this the -right- way using something like Sun Cluster ;_;

Cheers!

Thomas

On 18 Jun, 2006, at 17:35, Filip Sneppe wrote:

> Hi,
>
> I am looking into implementing a redundant monitoring setup using
> Nagios. I have read the Nagios documentation, but the solution(s)
> presented there have a couple of drawbacks, imho: they require
> extra configuration, and/or they do not keep all history information
> consistent across both systems.
>
> Just this week, I came across a Nagios installation where two
> copies of the main nagios process were running from the same
> configuration. This setup had been running like this for almost
> two months, and the only apparent problem was that two notifications
> were sent out for every problem/recovery.
>
> So this got me thinking if this setup would work:
>
> - A highly available NFS backend
> - Two Nagios servers with /var/run/nagios mounted on the
>   NFS backend.
> - configuration information (/etc/nagios) is replicated
>   across both nodes
> - only one nagios process running, and monitoring of both
>   nagios systems. If one fails (either a complete host
>   failure or a failure of the nagios process), the other
>   node starts up nagios and continues from the data
>   in /var/run/nagios
>
> Is this scenario too good to be true? Are there any quirks
> I am overlooking? Or is anyone running this setup?
>
> I'd be really happy if someone could tell me this would actually
> work. Alternatively I'm interested to know what kind of
> redundant monitoring setups people are running that require
> minimal configuration and keep all logging information
> centralized.
>
> Thanks in advance!
>
> Filip

