Running redundant nagios on shared storage (eg. NFS)
Thomas Sluyter
nagios at kilala.nl
Mon Jun 19 09:37:23 CEST 2006
Hi :)
Actually, I'll be building a redundant Nagios setup for my current
client. Barring the use of real clustering software I'll make it a
cluster-alike so to speak...
In our setup we'll have three pairs of hosts:
* Internal checkers (through our own networks)
* External checkers (through the Internet)
* Display hosts (running the GUI and so on)
I based my design on the ideas given in "Pro Nagios 2.0" by James
Turnbull, although I did make some changes to his design.
James' ideas regarding each pair:
* Both nodes run Nagios.
* The primary node performs all checks and sends its status messages
through NSCA to the secondary.
* The secondary node does not act upon incoming status messages
(checks, notifications, actions and such are disabled).
* If the primary node fails, the secondary node will automatically
enable the aforementioned actions.
* When the primary node is fixed, the secondary node will
automatically disable itself.
Problems:
* The primary does not know the current state of all monitored
objects when it comes back up again.
* Upon coming back up there will be a certain time window of
uncertainty regarding the status of various objects.
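To make the "status messages through NSCA" part a bit more concrete, here is a rough sketch of how one service result could be formatted for send_nsca, which reads tab-separated lines (host, service description, return code, plugin output) on stdin. The function name and the example host and service are made up for illustration; they are not from Turnbull's book or from our setup.

```python
# Sketch only: builds one line in the tab-separated format that send_nsca
# reads on stdin for a passive service check result.
# format_nsca_service_result is a hypothetical helper name.

VALID_RETURN_CODES = (0, 1, 2, 3)  # OK, WARNING, CRITICAL, UNKNOWN


def format_nsca_service_result(host, service, return_code, output):
    """Return a single send_nsca input line for a service check result."""
    if return_code not in VALID_RETURN_CODES:
        raise ValueError("return code must be 0, 1, 2 or 3")
    fields = [host, service, str(return_code), output]
    for field in fields:
        # A tab or newline inside a field would corrupt the record.
        if "\t" in field or "\n" in field:
            raise ValueError("fields may not contain tabs or newlines")
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    # Hypothetical host and service names.
    print(format_nsca_service_result("web01", "HTTP", 2, "CRITICAL - timeout"), end="")
```

On the checking node, a line like this would be piped into send_nsca (typically from an OCSP command), and the other node would accept it as a passive check result.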
My ideas regarding each pair:
* Both nodes run Nagios.
* The active node performs all checks and sends its status messages
through NSCA to the passive.
* The passive node does not act upon incoming status messages
(checks, notifications, actions and such are disabled).
* If the active node fails, the passive node will automatically
enable the aforementioned actions.
* When the previously active node is fixed, the previously passive
node will continue running as it was. It will also synchronise its
configuration and status files to the previously active node. All
status messages will then be sent through NSCA to the other node.
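The difference between the two designs boils down to what the standby node does once its peer comes back. A minimal sketch of that decision, assuming the standby gets a simple up/down reading of its peer at regular intervals: with fail_back=True it behaves like James' design, with fail_back=False like mine. The helper names are hypothetical. The command names are standard Nagios external commands, which would be written to the external command file in the "[timestamp] COMMAND" form shown.

```python
import time

# External commands the standby would issue when changing role.
TAKE_OVER = ["ENABLE_NOTIFICATIONS", "START_EXECUTING_SVC_CHECKS"]
STAND_DOWN = ["DISABLE_NOTIFICATIONS", "STOP_EXECUTING_SVC_CHECKS"]


def external_command(name, now=None):
    """Format one line for the Nagios external command file."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] %s\n" % (timestamp, name)


def standby_actions(peer_up_history, fail_back):
    """Return (step, action) pairs for a series of peer health readings.

    fail_back=True:  hand control back when the peer returns.
    fail_back=False: keep running; the returning peer becomes the standby.
    """
    acting = False
    actions = []
    for step, peer_up in enumerate(peer_up_history):
        if not peer_up and not acting:
            acting = True
            actions.append((step, "take_over"))
        elif peer_up and acting and fail_back:
            acting = False
            actions.append((step, "stand_down"))
    return actions


if __name__ == "__main__":
    history = [True, False, False, True, True]
    print(standby_actions(history, fail_back=True))
    print(standby_actions(history, fail_back=False))
    for name in TAKE_OVER:
        print(external_command(name), end="")
```

Without fail-back, the node that took over never has to hand its state to a peer that missed part of the history, which is what avoids the window of uncertainty listed under "Problems".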
As an added bonus we will use the passive node (under -normal-
running conditions) to test configuration changes before applying
them to the active node. That way we'll prevent ourselves from making
stupid mistakes :) If the new config is found to be 100% correct it
will be synched to the active node.
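A rough sketch of that "test on the passive node first" step: run the Nagios pre-flight check (nagios -v against the main config file) and only call the sync step when it passes. This assumes the check exits non-zero when it finds errors. The function names, the paths in the example and the sync callback are placeholders, not part of our actual setup.

```python
import subprocess


def config_is_valid(nagios_bin, main_cfg, run=subprocess.call):
    """Run the pre-flight check; True if it exits with status 0.

    `run` is injectable so the logic can be exercised without Nagios.
    """
    return run([nagios_bin, "-v", main_cfg]) == 0


def promote_config(nagios_bin, main_cfg, sync, run=subprocess.call):
    """Call sync() only when the config passes verification.

    `sync` is a placeholder for whatever copies the config to the
    active node (rsync, scp, ...). Returns True if the sync was done.
    """
    if not config_is_valid(nagios_bin, main_cfg, run=run):
        return False
    sync()
    return True


if __name__ == "__main__":
    # Dry run with a fake checker that always succeeds; paths are examples.
    done = promote_config(
        "/usr/local/nagios/bin/nagios",
        "/usr/local/nagios/etc/nagios.cfg",
        sync=lambda: print("would sync config to active node"),
        run=lambda argv: 0,
    )
    print("promoted:", done)
```

A passing pre-flight check only proves the config parses and is internally consistent, so letting it run on the passive node for a while before promoting it is still worthwhile.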
There are still some bugs in the design I need to iron out before I
can start building it, but I'll get there. Of course, I'd rather do
this the -right- way using something like Sun Cluster ;_;
Cheers!
Thomas
On 18 Jun, 2006, at 17:35, Filip Sneppe wrote:
> Hi,
>
> I am looking into implementing a redundant monitoring setup using
> Nagios. I have read the Nagios documentation, but the solution(s)
> presented there have a couple of drawbacks, imho: they require
> extra configuration, and/or they do not keep all history information
> consistent across both systems.
>
> Just this week, I came across a Nagios installation where two
> copies of the main nagios process were running from the same
> configuration. This setup had been running like this for almost
> two months, and the only apparent problem was that two notifications
> were sent out for every problem/recovery.
>
> So this got me thinking if this setup would work:
>
> - A highly available NFS backend
> - Two Nagios servers with /var/run/nagios mounted on the
> NFS backend.
> - configuration information (/etc/nagios) is replicated
> across both nodes
> - only one nagios process running, and monitoring of both
> nagios systems. If one fails (either a complete host
> failure or a failure of the nagios process), the other
> node starts up nagios and continues from the data
> in /var/run/nagios
>
> Is this scenario too good to be true? Are there any quirks
> I am overlooking? Or is anyone running this setup?
>
> I'd be really happy if someone could tell me this would actually
> work. Alternatively I'm interested to know what kind of
> redundant monitoring setup people are running that requires
> minimal configuration and keeps all logging information
> centralized.
>
> Thanks in advance!
>
> Filip