Alternate check interval when state becomes CRITICAL
Justin Pasher
justinp at newmediagateway.com
Tue Feb 10 22:23:24 CET 2009
Thomas Guyot-Sionnest wrote:
>> What I would like to do is have the check interval change to every one
>> minute when the state becomes CRITICAL, but keep the notifications at 5
>> minute intervals.
>>
>
> It's simple: use an eventhandler.
>
> You can look at this for inspiration, although you would likely need
> some more details to understand what I'm trying to do there...
> http://solaris.beaubien.net/~dermoth/media/nagios/handle_stall_counter
>
Alrighty. I took the script above as the base and tweaked it for my
setup. The theory behind the code works, but there is still one
caveat. When the service goes into a HARD CRITICAL state, the event
handler is called and it correctly sends the command to Nagios to update
the check interval. The problem is that by the time the command reaches
Nagios, Nagios has already scheduled the next check (which defaults
to five minutes out). This means the next service check still won't
happen for another five minutes. After that check occurs, if the
service is still in a HARD CRITICAL state, the NEXT scheduled check will
follow the new check interval that was set by the event handler (one
minute). From then on, Nagios continues to perform checks at one-minute
intervals until the service is back to normal.

Once the service is back to a normal state, the event handler is called
again, which sends the command to Nagios to change the check interval
back to five minutes. However, like before, the next scheduled check has
already been set (one minute out), so the check happens again in one
minute. If the service is still up, Nagios then applies the check
interval set by the event handler.

In the latter instance, it's not that big of a deal since it just causes
another check a little sooner than usual. However, in the first
instance, because the next scheduled check is still five minutes out the
first time around, it defeats the whole purpose of having the custom
event handler.
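
The timing described above can be illustrated with a small simulation. This is only a sketch of the behaviour as described in this message, not of Nagios's actual scheduler; the subroutine name `simulate_checks` and its arguments are made up for the illustration, and the recovery time of minute 20 is an arbitrary example value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the minutes at which checks run, given that a
# new interval only applies when the NEXT check is scheduled, i.e. after the
# check that was already queued has run.
sub simulate_checks {
    my ($start, $old_interval, $new_interval, $count) = @_;
    my @times;
    # The check that detects the state change runs at $start. Its successor
    # was already queued with the old interval before the handler ran.
    my $next = $start + $old_interval;
    for (1 .. $count) {
        push @times, $next;
        # From here on, the interval set by the event handler is used.
        $next += $new_interval;
    }
    return @times;
}

# Service goes HARD CRITICAL at minute 0; interval changed from 5 to 1.
my @critical = simulate_checks(0, 5, 1, 4);
print "Checks after going CRITICAL: @critical\n";

# Service recovers at minute 20; interval changed from 1 back to 5.
my @recovered = simulate_checks(20, 1, 5, 3);
print "Checks after recovery: @recovered\n";
```

In the first case the earliest follow-up check lands a full five minutes after the failure, which is the delay in question; in the second case the extra check simply arrives one minute after recovery.
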
Do you know any way around this? I've attached the service info and
event handler for reference.
Justin Pasher
==============================
define service {
host_name myhost
service_description www.myhost.com
check_command check_http2!www.myhost.com!25!50
contact_groups admins
event_handler change_check_interval
use nmg-service
}
define command {
command_name change_check_interval
command_line /etc/nagios3/change_check_interval $HOSTNAME$ $HOSTADDRESS$ $SERVICEDESC$ $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
}
==============================
/etc/nagios3/change_check_interval:
#!/usr/bin/perl
use strict;
use warnings;
# Fork to let Nagios keep on working...
if (fork != 0) {
# Nobody cares if fork failed...
warn("Daemonizing... Thanks for calling me.");
exit(0);
}
die("Usage: $0 <hostname> <hostaddress> <service desc> <state> <statetype> <stateattempt>")
    unless (@ARGV == 6);
my $commandfile = '/var/lib/nagios3/rw/nagios.cmd';
my $hostname = $ARGV[0];
my $hostaddress = $ARGV[1];
my $servicedesc = $ARGV[2];
my $state = $ARGV[3];
my $statetype = $ARGV[4];
my $stateattempt = $ARGV[5];
# If the state becomes HARD CRITICAL, shorten the check interval to one
# minute so that a recovery is noticed sooner.
if ($state eq 'CRITICAL' && $statetype eq 'HARD')
{
    # Append to the Nagios external command file; bail out if it is not writable.
    open(my $cmd, '>>', $commandfile) or die("Cannot open $commandfile: $!");
    printf {$cmd} "[%lu] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;1\n", time, $hostname, $servicedesc;
    close($cmd);
    die("Check interval for $hostname set to 1 minute");
}
# If the state becomes HARD OK, revert the check interval to the normal
# five minutes.
if ($state eq 'OK' && $statetype eq 'HARD')
{
    # Append to the Nagios external command file; bail out if it is not writable.
    open(my $cmd, '>>', $commandfile) or die("Cannot open $commandfile: $!");
    printf {$cmd} "[%lu] CHANGE_NORMAL_SVC_CHECK_INTERVAL;%s;%s;5\n", time, $hostname, $servicedesc;
    close($cmd);
    die("Check interval for $hostname set to 5 minutes");
}
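
One direction that might be worth trying, untested here and not confirmed anywhere in this thread, is to queue a second external command that reschedules the next check alongside the interval change. The sketch below only builds the two command lines; the helper name `format_external_command` is hypothetical, and the idea that `SCHEDULE_FORCED_SVC_CHECK` would replace the check already queued five minutes out is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one line for the Nagios external command file.
# Format: [timestamp] COMMAND;arg1;arg2;...
sub format_external_command {
    my ($now, $command, @args) = @_;
    return sprintf("[%lu] %s\n", $now, join(';', $command, @args));
}

my $now = time;
my ($hostname, $servicedesc) = ('myhost', 'www.myhost.com');

# Same interval change the event handler already sends.
my $interval_cmd = format_external_command(
    $now, 'CHANGE_NORMAL_SVC_CHECK_INTERVAL', $hostname, $servicedesc, 1);

# Assumption: asking Nagios to run the next check 60 seconds from now
# replaces the check that was already queued five minutes out.
my $reschedule_cmd = format_external_command(
    $now, 'SCHEDULE_FORCED_SVC_CHECK', $hostname, $servicedesc, $now + 60);

# In the real handler these lines would be appended to $commandfile.
print $interval_cmd, $reschedule_cmd;
```

Keeping the formatting in one subroutine means both branches of the handler could share it instead of repeating the `printf` call.
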