Using two nagios servers...
Chris Beattie
cbeattie at geninfo.com
Fri Oct 15 21:23:00 CEST 2010
Wow, I completely forgot that I’d responded to this. This is what I do. If you use this script, you’ll want to change the notification e-mail address. The script sends a message there when the failover server decides it needs to take over, and again when it decides to yield to the primary after the primary has come back online.
-------------------------------------------------
Failover Configuration
On the failover server, install the same OS the same way it's installed on the
primary monitoring server, but set Nagios to not start in runlevels 3 and 5.
Otherwise, the failover checking script will generate e-mail notifications
whenever the failover server is rebooted: Nagios will start at boot, before the
script has seen that Nagios is running on the primary, and the message will be
sent when the script shuts the failover Nagios down.
On the failover server, generate a public/private key pair. This is necessary in
order to avoid having to type in a password every time the state of the Nagios
process on the primary server is checked:
# ssh-keygen -t rsa
Take the default name and location (/root/.ssh/id_rsa and id_rsa.pub). Do not enter
a passphrase.
Copy id_rsa.pub to the primary server:
# rsync -avzu id_rsa.pub primaryserverhostname:/root/.ssh/
On the primary server, append id_rsa.pub to authorized_keys2 in root's .ssh directory:
# cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys2
# chmod 0600 $HOME/.ssh/authorized_keys2
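The append step above adds a duplicate line every time it is repeated. As a minimal sketch, not part of the original procedure, the hypothetical helper below (the name install_pubkey and its two arguments are assumptions for illustration) appends the key only when an identical line is not already present, and then sets restrictive permissions on the directory and the file. The file name authorized_keys2 simply follows the step above.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): append a public key to
# an authorized-keys file once, then set restrictive permissions.
# Usage: install_pubkey /path/to/id_rsa.pub /path/to/.ssh
install_pubkey() {
    local pubkey_file="$1"
    local ssh_dir="$2"
    local auth_file="${ssh_dir}/authorized_keys2"

    [[ -r "${pubkey_file}" ]] || { echo "cannot read ${pubkey_file}" >&2; return 1; }

    mkdir -p "${ssh_dir}" || return 1
    touch "${auth_file}" || return 1

    # -x matches whole lines and -F treats the key as a fixed string, so the
    # key is appended only when an identical line is not already there.
    if ! grep -qxF -f "${pubkey_file}" "${auth_file}"; then
        cat "${pubkey_file}" >> "${auth_file}"
    fi

    chmod 0700 "${ssh_dir}"
    chmod 0600 "${auth_file}"
}
```

On the primary server this would be called as install_pubkey /root/.ssh/id_rsa.pub /root/.ssh, and running it a second time leaves the file unchanged.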
Download, compile, and install Nagios on the failover server the same way it's
installed on the primary server.
Create a script named nagios_check.sh in /root/:
-----------------------
#!/bin/bash
# Usage: nagios_check.sh primaryserverhostname
# Checks Nagios on the primary server over SSH. While the primary is healthy,
# mirrors its Nagios directory; after repeated failures, starts Nagios locally.

nagiospath='/usr/local/nagios'
alertaddress='you@yourdomain'    # change this to your notification address
maxfaillimit='3'

# The failure counter is kept in the current directory (root's home under cron).
touch failed_nagios_checks
failedchecks=$(cat failed_nagios_checks)
failedchecks=${failedchecks:-0}    # a newly created counter file is empty

if [[ -z "${1}" ]]
then
    echo "Usage: nagios_check hostname"
    exit 1
fi

# Run check_nagios on the primary; the status is the text before the first colon.
nagiosstatusnow=$(${nagiospath}/libexec/check_by_ssh -H "${1}" --command='/usr/local/nagios/libexec/check_nagios --filename=/usr/local/nagios/var/status.dat --expires=1 --command=nagios')
nagiosstatus="${nagiosstatusnow%%:*}"
nagiosrunninglocally=$(/etc/init.d/nagios status)

if [[ "${nagiosstatus}" = "NAGIOS OK" ]]
then
    echo -ne "[`date`] ${nagiosstatus} on ${1}. "
    # The primary is healthy, so the failover Nagios must yield if it is running.
    if [[ "${nagiosrunninglocally%% *}" = "nagios" ]]
    then
        echo -e "Nagios is currently running on the failover server, and needs to be stopped."
        /etc/init.d/nagios stop
        /usr/bin/printf "%b" "[`date`] Nagios recovery on ${1} detected. Stopping failover Nagios.\n\n${nagiosstatusnow}" | /bin/mail -s "Nagios recovery on ${1}" "${alertaddress}"
    fi
    echo -e "Failed ${failedchecks} checks: synchronizing files. Status: ${nagiosstatusnow} "
    echo 0 > failed_nagios_checks
    # Mirror the primary's Nagios tree, skipping transient and lock files.
    rsync --quiet --archive --compress --delete-during --exclude='var/spool/checkresults/*' --exclude='var/archives/*' --exclude='*~' --exclude=nagios.lock --exclude=nagios.cmd "${1}:${nagiospath}" /usr/local
else
    # Anything other than "NAGIOS OK" (including SSH errors) counts as a failure.
    failedchecks=$((${failedchecks} + 1))
    echo ${failedchecks} > failed_nagios_checks
    if [[ "${failedchecks}" -lt "${maxfaillimit}" ]]
    then
        echo -e "[`date`] Uh-oh! Failed ${failedchecks} out of ${maxfaillimit} checks. Status: ${nagiosstatusnow} "
    fi
    if [[ "${failedchecks}" -ge "${maxfaillimit}" ]]
    then
        echo -ne "[`date`] ${nagiosstatus} on ${1}. "
        # The init script's status output starts with "No" when Nagios is not running.
        if [[ "${nagiosrunninglocally%% *}" = "No" ]]
        then
            echo -e " Failed ${failedchecks} checks, and needs to be started on the failover server. "
            /etc/init.d/nagios start
            /usr/bin/printf "%b" "[`date`] Nagios on ${1} has failed ${failedchecks} checks. Starting Nagios on failover server.\n\n${nagiosstatusnow}" | /bin/mail -s "Nagios failure on ${1}" "${alertaddress}"
        else
            echo -e "Failed ${failedchecks} checks, but is already running on the failover server. "
        fi
    fi
fi
-----------------------
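The script decides what to do by trimming the text it gets back from the plugin and from the init script. The sketch below isolates the two parameter expansions it relies on. The sample strings are assumptions for illustration only, since the exact wording printed by check_nagios and by /etc/init.d/nagios status may differ between versions, and the function names status_prefix and first_word are hypothetical.

```shell
#!/bin/bash
# Sample strings are assumptions for illustration; real plugin and init
# script output may differ between versions.
plugin_output='NAGIOS OK: 1 process, status log updated 2 seconds ago'
local_running='nagios (pid 1234) is running...'
local_stopped='No lock file found in /usr/local/nagios/var/nagios.lock'

# ${var%%:*} removes the longest suffix that starts at a colon, which
# leaves everything before the first colon.
status_prefix() { printf '%s\n' "${1%%:*}"; }

# ${var%% *} removes everything from the first space onward, which
# leaves the first word.
first_word() { printf '%s\n' "${1%% *}"; }

status_prefix "${plugin_output}"   # prints: NAGIOS OK
first_word "${local_running}"      # prints: nagios
first_word "${local_stopped}"      # prints: No
```

Because the comparison is against the exact string "NAGIOS OK", any other output, including an SSH connection error, takes the failure branch of the script.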
Make it executable by root:
chmod u+x nagios_check.sh
Run crontab -e as root and add this line:
* * * * * /root/nagios_check.sh primaryserverhostname >> /var/log/nagios_check.log 2>&1
The five *s set it to run every minute. The >> appends the script's output to a log file, and the 2>&1 sends STDERR to the same place as STDOUT, so both end up in the log.
Now, at the top of every minute, the failover server will obtain a current replica of the
primary server's Nagios state (comments, acknowledgements, downtime, configuration files, etc.) for as long as the primary passes its check.
Add a file in /etc/logrotate.d called nagios_check:
-----------------------
/var/log/nagios_check.log {
weekly
missingok
notifempty
}
-----------------------
From: quanta [mailto:quanta.linux at gmail.com]
Sent: Wednesday, October 13, 2010 7:17 AM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Using two nagios servers...
Try something like this:
#!/bin/sh
RETURN_STATUS=`/usr/local/nagios/libexec/check_nrpe -H <primary_host> -c check_nagios | awk -F: '{ print $1 }' | awk '{ print $2 }'`
if [ "$RETURN_STATUS" != "OK" ]; then
    sed -i 's/enable_notifications=0/enable_notifications=1/' /usr/local/nagios/etc/nagios.cfg
    sed -i 's/execute_service_checks=0/execute_service_checks=1/' /usr/local/nagios/etc/nagios.cfg
else
    sed -i 's/enable_notifications=1/enable_notifications=0/' /usr/local/nagios/etc/nagios.cfg
    sed -i 's/execute_service_checks=1/execute_service_checks=0/' /usr/local/nagios/etc/nagios.cfg
fi
sudo /etc/init.d/nagios reload
Note: the nagios user must be allowed to run this command through sudo without a password prompt.
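The four sed calls above can be folded into one function. This is a minimal sketch under stated assumptions: the helper name set_nagios_mode and its "active"/"standby" arguments are hypothetical, the two directive names are the ones used in the script above, and sed -i assumes GNU sed. It only edits the file; reloading Nagios afterwards is still needed.

```shell
#!/bin/bash
# Hypothetical helper: switch a Nagios main config file between standby
# (checks and notifications off) and active (both on).
# Usage: set_nagios_mode active|standby /path/to/nagios.cfg
set_nagios_mode() {
    local mode="$1"
    local cfg="$2"
    local value

    case "${mode}" in
        active)  value=1 ;;
        standby) value=0 ;;
        *) echo "Usage: set_nagios_mode active|standby cfgfile" >&2; return 2 ;;
    esac

    [[ -w "${cfg}" ]] || { echo "cannot write ${cfg}" >&2; return 1; }

    # Anchoring on ^ and $ leaves commented-out lines alone; matching [01]
    # makes the edit the same regardless of the current value.
    sed -i \
        -e "s/^enable_notifications=[01]\$/enable_notifications=${value}/" \
        -e "s/^execute_service_checks=[01]\$/execute_service_checks=${value}/" \
        "${cfg}"
}
```

With this in place, the if/else above would reduce to calling set_nagios_mode active or set_nagios_mode standby on /usr/local/nagios/etc/nagios.cfg before the reload.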
On 08/16/2010 02:44 PM, ravishankar.gundlapali at wipro.com wrote:
Hi,
I run Nagios on virtual machines as well.
Please let me know where I can get help with setting up a cron job on my secondary Nagios server to monitor the Nagios service on the primary Nagios server.
Thanks,
Ravi G
From: Chris Beattie [mailto:cbeattie at geninfo.com]
Sent: Monday, August 16, 2010 6:51 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Using two nagios servers...
Your servers will probably be fine servicing the extra Nagios polling, unless they are overloaded already.
Since I run Nagios on virtual machines, however, I tried to keep the load on my failover Nagios server minimized. My failover Nagios server runs a cron job that uses the check_nagios plugin to monitor the state of the primary Nagios server. If the primary server is up and running, the failover server will just rsync the state and configuration files from the primary. If the primary server becomes unavailable, the cron job will start the Nagios service on the failover server and keep it running until it detects the primary has recovered.
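The behaviour described above comes down to a small decision table. The following sketch is only an illustration of that logic: the function name decide_action, its arguments, and the action labels are hypothetical, and the failure threshold is borrowed from the maxfaillimit setting in the nagios_check.sh script posted in this thread. It names the action to take and does not itself run Nagios, rsync, or mail.

```shell
#!/bin/bash
# Hypothetical sketch of the failover decision. Arguments:
#   $1  primary_ok     "yes" if the primary's Nagios check passed, else "no"
#   $2  failed_checks  consecutive failures recorded so far, including this one
#   $3  max_fail       failures needed before the failover takes over
#   $4  local_running  "yes" if Nagios is running on the failover, else "no"
decide_action() {
    local primary_ok="$1" failed_checks="$2" max_fail="$3" local_running="$4"

    if [[ "${primary_ok}" == "yes" ]]; then
        if [[ "${local_running}" == "yes" ]]; then
            echo "stop-and-sync"   # primary recovered: yield, then sync
        else
            echo "sync"            # normal case: copy state and configuration
        fi
    elif (( failed_checks < max_fail )); then
        echo "wait"                # not enough failures yet
    elif [[ "${local_running}" == "no" ]]; then
        echo "start"               # take over monitoring
    else
        echo "keep-running"        # already taken over
    fi
}

decide_action yes 0 3 no    # prints: sync
decide_action no  2 3 no    # prints: wait
decide_action no  3 3 no    # prints: start
decide_action no  4 3 yes   # prints: keep-running
decide_action yes 0 3 yes   # prints: stop-and-sync
```

Requiring several consecutive failures before starting the failover Nagios keeps a single missed check, such as a brief network interruption, from triggering a takeover.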
From: ravishankar.gundlapali at wipro.com [mailto:ravishankar.gundlapali at wipro.com]
Sent: Monday, August 16, 2010 7:45 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Using two nagios servers...
Hi All,
I am planning to configure all the servers in my client environment on two Nagios servers (in two different locations) so that one acts as a backup.
Please let me know whether this will overload the monitored servers, since two Nagios servers will be polling them.
Thanks,
Ravi G
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null