checks, notifications don't work after time period exception
Seth Simmons
ssimmons at cymfony.com
Mon Aug 25 15:05:09 CEST 2008
We have a QA group overseas that works on our customer sites during
the US overnight. To avoid false alerts, I added a time period exception
so that notifications are not sent between 4am and 5:30am. The problem
is that, after the exception window ends, Nagios (3.0.3) neither sends
notifications nor performs checks for any site that uses the exception.
If a site goes critical either shortly after 4am or (if they start
early) right before 4am, checks do not resume after 5:30am. When I look
at Nagios later, it shows the service as critical, with the last check
at 3:58am and the next check scheduled for midnight the next day.
Let me give some more specific examples:
Server-A runs abc.customer.com for us, and our QA group takes the
site down at 3:55am, before the 4am exception. Nagios shows the service
as critical until either midnight the next day or until someone forces a
check on the service. So if I look at it at 8am, the service is critical,
with the last check at 3:55am and the next scheduled check at 12am
tomorrow. When I force a check, it resumes the normal check schedule and
sends a notice that the service is OK.
Server-B also runs a site, and Tomcat is stopped at 4:10am. This
service uses the same time period as its notification period, with the
exception from 4am to 5:30am. After the exception window, it does not
send notifications. At 8am it is still running checks and reporting
critical, but the service details show that no notifications have been
sent. Forcing a check does not trigger one either. If I restart Nagios,
the next check sends the first notification. I don't see anything wrong
with my time period, so I'm not sure where the issue is. I'm also not
sure whether anyone else has noticed this before.
Here is what I have for that time period and checks for the above
examples:
define timeperiod{
    timeperiod_name url-monitor
    alias           url-monitor
    sunday          00:00-23:59
    monday          00:00-23:59
    tuesday         00:00-23:59
    wednesday       00:00-23:59
    thursday        00:00-23:59
    friday          00:00-23:59
    saturday        00:00-23:59
    exclude         recycle
}

define timeperiod{
    timeperiod_name recycle
    alias           recycle
    sunday          04:00-05:30
    monday          04:00-05:30
    tuesday         04:00-05:30
    wednesday       04:00-05:30
    thursday        04:00-05:30
    friday          04:00-05:30
    saturday        04:00-05:30
}
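
For comparison, the same schedule could probably also be written without
the exclude directive, by listing the allowed ranges for each day
explicitly. This is only an untested sketch: the name
url-monitor-noexclude is made up for illustration, and it assumes that
Nagios accepts several comma-separated ranges per day and 24:00 as an
end time, as the timeperiod documentation describes. Swapping it in for
url-monitor might help narrow down whether the exclude handling is
involved.

```
# Hypothetical time period, not part of the configuration above.
# Intended to cover the whole day except 04:00-05:30, without "exclude".
define timeperiod{
    timeperiod_name url-monitor-noexclude
    alias           url-monitor-noexclude
    sunday          00:00-04:00,05:30-24:00
    monday          00:00-04:00,05:30-24:00
    tuesday         00:00-04:00,05:30-24:00
    wednesday       00:00-04:00,05:30-24:00
    thursday        00:00-04:00,05:30-24:00
    friday          00:00-04:00,05:30-24:00
    saturday        00:00-04:00,05:30-24:00
}
```

The ranges here end at 24:00 rather than 23:59, on the assumption that a
range ending at 23:59 leaves the last minute of the day uncovered.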

define command{
    command_name    check_http_abc
    command_line    $USER1$/check_http -H abc.company.com
}

define service{
    use                     generic-service
    host_name               Server-A
    service_description     site abc
    is_volatile             0
    check_period            url-monitor
    max_check_attempts      2
    normal_check_interval   5
    retry_check_interval    5
    contacts                nagiosadmin
    notification_interval   30
    notification_period     url-monitor
    notification_options    w,c,r
    check_command           check_http_abc
}

define service{
    use                     local-service
    host_name               Server-B
    service_description     HTTP
    check_period            24x7
    max_check_attempts      2
    normal_check_interval   3
    retry_check_interval    5
    contacts                nagiosadmin
    notification_interval   60
    notification_period     url-monitor
    notification_options    w,c,r
    check_command           check_http
}