controlling notifications a bit better
Mike Emigh
maemigh at gmail.com
Wed Aug 9 18:39:09 CEST 2006
On 8/9/06, Andrew Laden <Andrew.Laden at tudor.com> wrote:
>
> One thing to watch is that HOST alerts will get sent out as soon as the
> host is detected down. You can play with the retry settings. But you
> generally need to keep those short, as a host check supersedes all other
> checks, and Nagios will essentially pause until it determines status of the
> host.
>
> You can also play with escalations to delay checks. Have no notifications
> initially, and then use an escalation to send the alert later. This takes a
> little work to get right.
>
>
> ------------------------------
> *From:* nagios-users-bounces at lists.sourceforge.net [mailto:
> nagios-users-bounces at lists.sourceforge.net] *On Behalf Of *Aaron Segura
> *Sent:* Wednesday, August 09, 2006 12:09 PM
>
> *To:* nagios-users at lists.sourceforge.net
> *Subject:* Re: [Nagios-users] controlling notifications a bit better
>
> Normal check interval: 5 min
>
> Retry Check interval : 5 min
>
> Max check attempts : 2
>
>
>
> -or-
>
>
>
> Normal check interval: 2 min
>
> Retry check interval: 1 min
>
> Max check attempts: 9
>
>
>
> -or-
>
>
>
> (This is the one I run on some services)
>
> Normal check interval: 5 min
>
> Retry check interval : 1 min
>
> Max check attempts: 6
>
>
>
>
>
> Something along those lines should do it…Yay for math!
>
>
> ------------------------------
>
> *From:* nagios-users-bounces at lists.sourceforge.net [mailto:
> nagios-users-bounces at lists.sourceforge.net] *On Behalf Of *Gavin Cato
> *Sent:* Wednesday, August 09, 2006 12:47 AM
> *To:* nagios-users at lists.sourceforge.net
> *Subject:* [Nagios-users] controlling notifications a bit better
>
>
>
> Hi,
>
>
>
> I want certain hosts/services to only send an email alert if the
> host/service is down for 10 minutes.
>
>
>
> I've tried playing with max_check_attempts and the other obvious
> parameters but I still get email alerts after only 1-2mins.
>
>
>
> Can anyone please show me a sample config snippet or how they do it?
>
>
>
> Cheers
>
>
>
> Gav
>
>
>
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
>
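For reference, the three sets of service settings quoted above all seem to
work out to the same worst case. Here is a rough sketch of the arithmetic,
assuming the delay before a notification is about one normal check interval
(until the first failed check) plus one retry interval for each remaining
attempt. worst_case_minutes is just a made-up helper name, not anything that
ships with Nagios:

```sh
#!/bin/sh
# Hypothetical helper, not part of Nagios.
# Assumed formula: normal_interval + (max_check_attempts - 1) * retry_interval
# Arguments: normal interval, retry interval, max check attempts (minutes).
worst_case_minutes() {
    echo $(( $1 + ($3 - 1) * $2 ))
}

worst_case_minutes 5 5 2   # first suggestion,  prints 10
worst_case_minutes 2 1 9   # second suggestion, prints 10
worst_case_minutes 5 1 6   # third suggestion,  prints 10
```

That works for services. Host checks are a different matter, as explained
next.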
As Andrew stated above, it is a bad idea to set max_check_attempts to a high
number for hosts, because host checks are not run in parallel. Doing so will
cause Nagios to have scheduling problems. Instead, let Nagios try to send
the notification immediately and set up a script to intercept and discard
this first notification. Then delay the next notification until 10 minutes
later.
Here is how:
# 'host-notify-by-email' command definition
# (command_line must be a single line in the config file)
define command{
        command_name    host-notify-by-email
        command_line    $USER1$/eventhandlers/check_notification $NOTIFICATIONNUMBER$ $NOTIFICATIONTYPE$ '/usr/bin/printf "%b" "$HOSTSTATE$ - $HOSTALIAS$\nDuration: $HOSTDURATION$\nDate: $LONGDATETIME$\nHost: $HOSTNAME$\nAddress: $HOSTADDRESS$ $NOTIFICATIONNUMBER$" | /usr/bin/mailx -s "$NOTIFICATIONTYPE$:$HOSTALIAS$/$HOSTSTATE$" $CONTACTEMAIL$'
        }
Notice how I added the notification number macro in the above command.
Now, create the check_notification script that it calls:
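#!/bin/sh
# $1 = notification number, $2 = notification type, $3 = command to run
if [ "$1" = 1 ] ; then
    # Discard the first PROBLEM notification.
    if [ "$2" = PROBLEM ] ; then
        exit 0
    fi
elif [ "$1" = 2 ] ; then
    # Discard a RECOVERY that carries notification number 2.
    if [ "$2" = RECOVERY ] ; then
        exit 0
    fi
fi
# Anything else is a real notification: run the mail command.
sh -c "$3"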
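The above script throws away the first notification (which occurs
immediately after a host goes down). It also throws away a recovery
notification numbered 2, so a host that comes back before the delayed
notification goes out does not produce a lone recovery message. The setup
might seem a little strange, but this method allows you to keep your
notification message options inside the Nagios config file.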
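If you want to check the filtering rules without involving Nagios or mailx,
something like the following dry run may help. It only mirrors the decision
made in check_notification and prints "drop" or "send". notification_action
is a made-up name for this sketch, and the comment on the second branch is my
reading of why number 2 is special:

```sh
#!/bin/sh
# Hypothetical helper mirroring the decision in check_notification.
# Arguments: notification number, notification type.
notification_action() {
    if [ "$1" = 1 ] && [ "$2" = PROBLEM ]; then
        echo drop    # first problem notification
    elif [ "$1" = 2 ] && [ "$2" = RECOVERY ]; then
        echo drop    # recovery numbered 2 (assumed: follows the dropped one)
    else
        echo send    # everything else reaches the mail command
    fi
}

for args in "1 PROBLEM" "2 PROBLEM" "2 RECOVERY" "3 RECOVERY"; do
    # $args is unquoted on purpose so it splits into two arguments.
    printf '%s -> %s\n' "$args" "$(notification_action $args)"
done
```

Only "1 PROBLEM" and "2 RECOVERY" should come out as "drop".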
Because the first notification is thrown away, Nagios needs to be told to
send another notification 10 minutes later.
Do this by adding an event_handler to the host definition:
        event_handler   ignore_first_hostpage
Define this event handler:
define command{
        command_name    ignore_first_hostpage
        command_line    $USER1$/eventhandlers/host_notification $HOSTSTATE$ $HOSTSTATETYPE$ $HOSTNAME$
        }
Now create the host_notification script which is called in the above
command:
#!/bin/sh
# This is a sample shell script showing how you can submit the
# DELAY_HOST_NOTIFICATION command to Nagios.
# Adjust variables to fit your environment as necessary.
# $1 = host state, $2 = host state type, $3 = host name
# Only take action on hard host states...
case "$2" in
HARD)
    case "$1" in
    DOWN)
        # The host has gone down!
        now=`/usr/bin/perl -e 'printf "%d\n", time;'`
        newpagetime=`expr $now + 600`
        commandfile='/opt/FONnagios/var/rw/nagios.cmd'
        commandline="[$now] DELAY_HOST_NOTIFICATION;$3;$newpagetime"
        commandline2="[$now] SCHEDULE_HOST_CHECK;$3;$newpagetime"
        echo "$commandline" >> "$commandfile"
        echo "$commandline2" >> "$commandfile"
        ;;
    esac
    ;;
esac
exit 0
In the above script, it is important to have the host check scheduled after
the delay notification command because the notification will not occur until
after the next check fails. If the next check does not fail, and the host
recovers, you will receive no notifications.
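To see what ends up in the command file before pointing this at a live
Nagios, a dry run along these lines might be useful. It prints the two
external command lines to stdout instead of appending them to nagios.cmd.
build_delay_commands and the host name webserver1 are made up for the
example, and the timestamp is just an arbitrary fixed value:

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints the two external commands instead of
# writing them to the Nagios command file.
# Arguments: host name, current epoch time, delay in seconds.
build_delay_commands() {
    when=$(( $2 + $3 ))
    printf '[%s] DELAY_HOST_NOTIFICATION;%s;%s\n' "$2" "$1" "$when"
    printf '[%s] SCHEDULE_HOST_CHECK;%s;%s\n' "$2" "$1" "$when"
}

# Fixed timestamp so the output is reproducible.
build_delay_commands webserver1 1155141549 600
# prints:
# [1155141549] DELAY_HOST_NOTIFICATION;webserver1;1155142149
# [1155141549] SCHEDULE_HOST_CHECK;webserver1;1155142149
```

Both lines carry the same target time, so the forced check runs just as the
notification delay expires.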
Mike