nagios server redundancy
Mike Lindsey
mike-nagios at 5dninja.net
Fri Feb 11 22:35:26 CET 2011
On 2/11/11 10:26 AM, Morty wrote:
> I'm looking to implement redundant nagios servers, with the backup
> server in a different location than the prime server. This is nagios
> 3.2.3, with the default web interface. I'm synchronizing
> configurations by rsyncing /usr/local/nagios/etc/ between systems.
> I'm doing active/active (i.e. I want the backup server monitoring at
> the same time as the prime server.) So far so good.
>
> Problem: acknowledgements on the prime are not being synced to the
> backup.
>
> Is there a (clean) way to sync the prime's acknowledgements to the
> backup, as well? I'm tempted to shut down the backup, rsync the
> prime's var directory to the backup, and then bring the backup back
> online. But the docs have various warnings about not messing with the
> var files, so figured I'd ask about possible hidden gotchas.
>
> I've read http://nagios.sourceforge.net/docs/3_0/redundancy.html, but
> scenario one doesn't discuss syncing acknowledgements, and scenario 2
> is active/passive.
What I ended up doing with my backup master is keeping Nagios stopped on
it, with frequent rsyncs of both the config and the status files in var.
Both the active master and the backup master sit behind a load-balanced
VIP, with the NSCA and HTTP/HTTPS ports managed by the load balancer. A
cronjob on the backup master starts NSCA, Nagios and Apache if it detects
a failure on the active master. That causes the VIP to fail over to the
backup master, giving automatic recovery with no more than five minutes
of downtime (the cronjob's interval).
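For illustration, the backup master's cronjob could look roughly like the
Python sketch below. This is not the script described above. It assumes the
active master counts as healthy when its NSCA port (5667, the NSCA default)
and HTTP port both accept TCP connections, and the hostname and init script
paths are hypothetical placeholders.

```python
import socket
import subprocess

# Hypothetical values; adjust for your environment.
ACTIVE_MASTER = "nagios-active.example.com"
HEALTH_PORTS = (5667, 80)  # NSCA's default port and HTTP
INIT_SCRIPTS = ("/etc/init.d/nsca", "/etc/init.d/nagios", "/etc/init.d/apache2")


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name lookup failed
        return False


def active_master_healthy(host, ports, probe=port_open):
    """Treat the active master as healthy only if every probed port answers."""
    return all(probe(host, port) for port in ports)


def services_to_start(healthy):
    """On failure start everything; otherwise leave the backup idle."""
    return [] if healthy else list(INIT_SCRIPTS)


def run(cmd):
    """Run a command, reporting (rather than raising) if it can't be executed."""
    try:
        subprocess.run(cmd, check=False)
    except OSError as exc:
        print(f"could not run {cmd[0]}: {exc}")


def main():
    healthy = active_master_healthy(ACTIVE_MASTER, HEALTH_PORTS)
    for script in services_to_start(healthy):
        # Once NSCA and Apache answer here, the load balancer can move the VIP.
        run([script, "start"])
```

A cron entry would call main() every five minutes to match the interval
above. A single failed probe triggers failover here; a real check would
probably want a retry or two before taking over.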
The active master does not have Apache, NSCA or Nagios configured to
start on boot; instead, those are also managed by a cronjob that checks
the backup master. If the backup master is running Apache/Nagios/NSCA,
the active master doesn't start them (and if they're already running,
say after an intermittent error, it shuts them down), and the rsyncs
don't happen either. This gives me automatic failover and manual
failback, once whatever issue triggered the failover has been verified
and resolved.
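The active master's side of that logic might be sketched as follows. Again
this is a hypothetical illustration, not the actual cronjob: it assumes the
backup counts as "live" if its NSCA or HTTP port answers (Nagios itself
listens on no port), and that the synced state is etc/ plus retention.dat
and status.dat under /usr/local/nagios. Note that Nagios rewrites
retention.dat only every retention_update_interval minutes (60 by default)
and at shutdown, so recent acks may lag.

```python
import socket
import subprocess

# Hypothetical values; adjust for your environment.
BACKUP_MASTER = "nagios-backup.example.com"
BACKUP_PROBE_PORTS = (5667, 80)  # NSCA's default port and HTTP
INIT_SCRIPTS = ("/etc/init.d/nsca", "/etc/init.d/nagios", "/etc/init.d/apache2")
SYNC_PATHS = (
    "/usr/local/nagios/etc/",
    "/usr/local/nagios/var/retention.dat",
    "/usr/local/nagios/var/status.dat",
)


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def backup_has_taken_over(host, ports, probe=port_open):
    """Treat the backup as live if any of its service ports answers."""
    return any(probe(host, port) for port in ports)


def plan(backup_live):
    """Return (init action for local services, whether to rsync to the backup)."""
    if backup_live:
        # Backup is serving: stand down and don't overwrite its state.
        return "stop", False
    return "start", True


def rsync_commands(dest_host, paths=SYNC_PATHS):
    """One rsync per path, mirroring it to the same path on dest_host."""
    return [["rsync", "-a", path, f"{dest_host}:{path}"] for path in paths]


def run(cmd):
    """Run a command, reporting (rather than raising) if it can't be executed."""
    try:
        subprocess.run(cmd, check=False)
    except OSError as exc:
        print(f"could not run {cmd[0]}: {exc}")


def main():
    action, do_sync = plan(backup_has_taken_over(BACKUP_MASTER, BACKUP_PROBE_PORTS))
    for script in INIT_SCRIPTS:
        run([script, action])
    if do_sync:
        for cmd in rsync_commands(BACKUP_MASTER):
            run(cmd)
```

Because nothing here ever stops the backup, failing back stays a manual
step, which matches the behaviour described above.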
To the best of my knowledge, you cannot sync acknowledgements to a
backup server while it's running, unless you write something that
checks for new acks and dumps them into the command pipe. So if you want
to preserve acks and downtime, you'll need to keep your backup stopped
during the syncs.
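The "checks for new acks and dumps them into the command pipe" route could
be sketched like this. It is a rough, hypothetical outline: it assumes a
copy of the prime's status.dat is available on the backup, reads the
problem_has_been_acknowledged and acknowledgement_type fields from
hoststatus/servicestatus blocks, and writes Nagios' ACKNOWLEDGE_HOST_PROBLEM
and ACKNOWLEDGE_SVC_PROBLEM external commands. The author and comment
strings are made up (the original ack comments are not carried over), the
command file path is the usual source-install default (check command_file
in nagios.cfg), and downtime is not handled.

```python
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # check command_file in nagios.cfg
AUTHOR = "ack-sync"                                   # hypothetical author name
COMMENT = "Acknowledgement copied from prime"         # hypothetical comment text


def parse_status(text):
    """Parse status.dat text into a list of (block_type, {key: value}) tuples."""
    blocks, current_type, fields = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if current_type is None and line.endswith("{"):
            current_type, fields = line[:-1].strip(), {}
        elif line == "}":
            if current_type is not None:
                blocks.append((current_type, fields))
            current_type, fields = None, None
        elif current_type is not None and "=" in line:
            key, _, value = line.partition("=")  # values may contain '='
            fields[key] = value
    return blocks


def acknowledged(blocks):
    """Map (host,) or (host, service) to acknowledgement_type (1 normal, 2 sticky)."""
    acks = {}
    for kind, f in blocks:
        if f.get("problem_has_been_acknowledged") != "1":
            continue
        ack_type = int(f.get("acknowledgement_type", "1"))
        if kind == "hoststatus":
            acks[(f["host_name"],)] = ack_type
        elif kind == "servicestatus":
            acks[(f["host_name"], f["service_description"])] = ack_type
    return acks


def ack_commands(prime_acks, backup_acks, now=None):
    """Build external commands for acks present on the prime but not the backup."""
    now = int(time.time()) if now is None else now
    cmds = []
    for key in sorted(set(prime_acks) - set(backup_acks)):
        sticky = 2 if prime_acks[key] == 2 else 1
        # notify=0: don't resend notifications; persistent=1: keep the comment
        tail = f"{sticky};0;1;{AUTHOR};{COMMENT}"
        if len(key) == 1:
            cmds.append(f"[{now}] ACKNOWLEDGE_HOST_PROBLEM;{key[0]};{tail}")
        else:
            cmds.append(f"[{now}] ACKNOWLEDGE_SVC_PROBLEM;{key[0]};{key[1]};{tail}")
    return cmds


def submit(commands, path=COMMAND_FILE):
    """Write commands to the command pipe (a FIFO: blocks until Nagios reads it)."""
    with open(path, "w") as pipe:
        for cmd in commands:
            pipe.write(cmd + "\n")
```

The backup's own status.dat supplies backup_acks, so already-acknowledged
problems are skipped and repeated runs don't pile up duplicate ack comments.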
--
Mike Lindsey
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null