Failover Monitoring.
Chris Beattie
cbeattie at geninfo.com
Mon Dec 8 16:35:46 CET 2008
> -----Original Message-----
> From: Eddie [mailto:bigedd at gmail.com]
>
> To keep services status information up-to-date on the slave the master
> sends all service check results to the slave.
I got this set up between my master and slave servers, and then noticed
that comments and other stuff like that weren't being replicated. If
you add hosts and services, or otherwise change your configuration,
those changes won't be replicated either.
> If monitoring fails-over to the slave, how is the status information
> sent to the new slave so that it is then kept updated with new status
> information, or do we need to set this up manually after the failover?
I wrote a script which is run every minute by a cron job on the slave
server. It uses check_by_ssh to run check_nagios on the master. If
Nagios is running on the master, then it checks if Nagios is running on
the slave. If so, it stops the slave Nagios. If Nagios is only running
on the master, it rsyncs any changed files in the whole Nagios directory
(minus the checkresults directory, archives directory, lock file, and
command file). If Nagios is not currently running on either server, it
will start Nagios on the slave server. And, just because I could, I
have it write to a log file and e-mail me if it fails over or back.
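For anyone who wants to build something similar, here is a minimal Python sketch of the decision logic described above. It is not the actual script, only an illustration under stated assumptions: the names decide_action and build_rsync_command, the /usr/local/nagios path, and the exclude patterns are all hypothetical and would need to match your own install. The two booleans would come from running check_nagios on the master (via check_by_ssh) and on the slave.

```python
from enum import Enum


class Action(Enum):
    STOP_SLAVE = "stop the slave Nagios"
    SYNC_FROM_MASTER = "rsync changed files from the master"
    START_SLAVE = "start the slave Nagios"
    NONE = "do nothing"


def decide_action(master_running: bool, slave_running: bool) -> Action:
    """Pick the slave-side action for one cron run."""
    if master_running and slave_running:
        # The master is back (or never left), so the slave steps down.
        return Action.STOP_SLAVE
    if master_running:
        # Normal state: only the master runs, so keep the slave's copy fresh.
        return Action.SYNC_FROM_MASTER
    if not slave_running:
        # Nagios is running nowhere, so fail over to the slave.
        return Action.START_SLAVE
    # The master is down and the slave is already covering for it.
    return Action.NONE


def build_rsync_command(master_host, nagios_dir, excludes):
    """Build an rsync argument list that pulls nagios_dir from the master.

    The exclude patterns are relative to nagios_dir. The values used in the
    example below are assumptions about a default source install.
    """
    base = nagios_dir.rstrip("/") + "/"
    cmd = ["rsync", "-a"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    cmd += [f"{master_host}:{base}", base]
    return cmd


if __name__ == "__main__":
    # Hypothetical host name, path, and exclude patterns.
    print(decide_action(master_running=True, slave_running=False))
    print(build_rsync_command(
        "nagios-master",
        "/usr/local/nagios",
        [
            "var/spool/checkresults/",
            "var/archives/",
            "var/nagios.lock",
            "var/rw/nagios.cmd",
        ],
    ))
```

The returned list can be passed to subprocess.run without a shell. Logging and the e-mail on failover or failback would hang off the STOP_SLAVE and START_SLAVE cases.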
It doesn't copy anything from the slave server back to the master
server, because I expect the slave to run only temporarily. If the
master server takes that long to fix, I can copy the status back
manually.
Since my master and slave servers are identically configured, when I
upgraded the master to Nagios 3.0.6, the slave got upgraded a minute
later.
> Couldn't we nullify the need to keep two separate stores of status
> information data, by keeping it in one place? Is it possible to have
> the status information stored on a third host (say on an NFS) that
> both the master and slave have access to?
I don't have a highly available file share, so I keep two copies of
Nagios, and rsync minimizes the amount of data that has to be copied. I
plan to move the slave server to another office (it's a virtual
machine). If the link between the offices went down, one Nagios or the
other would lose its connection to its config files. Also, with shared
storage there would be a window of time where both instances might be
active and trying to write to the same files.
> Is it possible to have more than one slave?
Yes. If you do it like the docs say, you'll need to write a script to
submit check results to the slave server. It's just one more line to
submit the check results to another slave server. I don't know how to
write the event handler to check more than one Nagios process, though.
If you do it like above, both slave servers can run the cron job, and
they can check each other as well as the primary to see if they need to
start their Nagios process. You might want to have one slave run the
cron job on the even minutes and the other slave run the job on the odd
minutes (or some other alternating pattern) so that they won't both try
to start up at the same time.
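The alternating-minute idea can be sketched as a small check at the top of the cron script. This is only an illustration: the function name is_my_turn and the slave_index numbering (0 for the first slave, 1 for the second) are assumptions, not something from the docs.

```python
from datetime import datetime


def is_my_turn(minute: int, slave_index: int, slave_count: int) -> bool:
    """Return True if this slave should run its failover check this minute.

    With two slaves, index 0 runs on even minutes and index 1 on odd ones.
    """
    if slave_count < 1 or not 0 <= slave_index < slave_count:
        raise ValueError("slave_index must be in range(slave_count)")
    return minute % slave_count == slave_index


if __name__ == "__main__":
    # Hypothetical setup: this host is the second of two slaves.
    if is_my_turn(datetime.now().minute, slave_index=1, slave_count=2):
        print("running failover check")
    else:
        print("skipping this minute")
```

The pattern only stays regular across the top of the hour when slave_count divides 60 evenly, which is the case for two slaves.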
> Are there any other resources that go into more detail on failover
> (and redundant monitoring)?
I did a lot of Googling to get it working like in the docs, and then
subscribed to this mailing list five days ago so I could ask about how
to get status information to the slave server. :-)