host-down notification can take 50 mins to be sent
stucky
stucky101 at gmail.com
Fri Jun 15 10:37:11 CEST 2007
Guys
I'm trying the latest stable 2.x version (2.9) and on top of the 2 already
existing default host templates I added a 3rd one since the documentation
states that there is no limit.
I added a host and started monitoring. When I took it down, it took between 2
and 5 mins for the host down notification to come in.
However, later on I rebooted it again and this time nothing came in. The nagios
log showed nothing about wanting to send a notification either. The box came
back without any notification.
I took it down again later and waited - after 50 minutes I got a host down
notification. When I brought the host back I almost immediately got a host
up notification.
I removed one of the templates to change the recursion level of the host
templates from 3 to 2 and tried again. I did 3 tests and all came back fine
this time. I always got the notification within 5 minutes max.
Then I added the 3rd template back again to see whether it had to do with
that but now I can't reproduce this. I did 2 tests and both were fine.
I don't feel that I can trust nagios now though. I've been using it for a
few years now since version 1.2 and I've never seen this behaviour before.
However, I've also never used more than 1 host/service template. This time I
wanted to make more use of the object inheritance logic to shorten my cfg
but somehow I feel it causes problems.
How deep is the template recursion for most of you folks?
Here are the templates I was using when the 50 min delay happened.
Hosts:
# Host templates
define host{
        name                            generic-host
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_period             24x7
        register                        0
        }
define host{
        name                            generic-linux
        use                             generic-host
        check_period                    24x7
        max_check_attempts              10
        check_command                   check-host-alive
        notification_interval           120
        notification_options            d,u,r
        register                        0
        }
define host{
        name                            prod
        use                             generic-linux
        contact_groups                  sysadmins,psst
        register                        0
        }
define host{
        name                            nonprod
        use                             generic-linux
        contact_groups                  sysadmins
        register                        0
        }
Then I use either the prod or nonprod template for all my hosts.
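To make the inheritance chain concrete, here is a minimal Python sketch (not Nagios code) that follows the `use` directives of the host templates above and merges them, with child values overriding parent values. The dictionary layout and the `resolve` helper are hypothetical, written only for illustration. `name` and `register` are left out because they describe the template itself rather than settings passed down the chain.

```python
# Host templates transcribed from the post; "use" names the parent template.
# This is a simplified model of single-parent inheritance, not Nagios's own code.
TEMPLATES = {
    "generic-host": {
        "notifications_enabled": "1",
        "event_handler_enabled": "1",
        "flap_detection_enabled": "1",
        "failure_prediction_enabled": "1",
        "process_perf_data": "1",
        "retain_status_information": "1",
        "retain_nonstatus_information": "1",
        "notification_period": "24x7",
    },
    "generic-linux": {
        "use": "generic-host",
        "check_period": "24x7",
        "max_check_attempts": "10",
        "check_command": "check-host-alive",
        "notification_interval": "120",
        "notification_options": "d,u,r",
    },
    "prod": {"use": "generic-linux", "contact_groups": "sysadmins,psst"},
    "nonprod": {"use": "generic-linux", "contact_groups": "sysadmins"},
}


def resolve(name, templates):
    """Return (effective_settings, chain) for a template (hypothetical helper)."""
    chain = []
    current = name
    while current is not None:
        if current in chain:
            raise ValueError(f"circular 'use' chain at {current}")
        chain.append(current)
        current = templates[current].get("use")
    effective = {}
    # Apply the root template first so that children override their parents.
    for tmpl in reversed(chain):
        effective.update({k: v for k, v in templates[tmpl].items() if k != "use"})
    return effective, chain


effective, chain = resolve("prod", TEMPLATES)
print(" -> ".join(chain))  # prod -> generic-linux -> generic-host
for key in sorted(effective):
    print(f"{key:32}{effective[key]}")
```

Run against `prod`, the chain is three templates deep, and the merged result shows the values a host using `prod` would be expected to end up with, such as `max_check_attempts 10` and `notification_interval 120`.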
The same goes for services:
# Service templates
define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        failure_prediction_enabled      1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        register                        0
        }
define service{
        name                            generic-checks
        use                             generic-service
        check_period                    24x7
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            1
        notification_options            w,u,c,r
        notification_interval           60
        notification_period             24x7
        register                        0
        }
define service{
        name                            prod
        use                             generic-checks
        contact_groups                  sysadmins,psst
        register                        0
        }
define service{
        name                            nonprod
        use                             generic-checks
        contact_groups                  sysadmins
        register                        0
        }
Here I also use prod or nonprod as templates for my services.
I'm gonna test this more tomorrow, but I'm worried that if a host goes down I
might not get notified until 50 mins later, or maybe never, who knows?
It doesn't seem to behave the same way every time, but as far as I can see the
service checks run every 5 minutes, so within that time frame I should get a
notification.
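As a rough back-of-the-envelope check on that expectation, the Python sketch below works out how long the service check schedule alone could take to notice a failure, using the values from the generic-checks template. It assumes `interval_length` is at its default of 60 seconds (the main config is not shown in the post), ignores check latency, and does not model host checks or the notification logic itself. The function name is made up for this illustration.

```python
def service_hard_state_window(normal_check_interval, retry_check_interval,
                              max_check_attempts, interval_length=60):
    """Worst-case seconds until the first failed check and until a hard state.

    Assumption: interval_length=60 (seconds per interval unit), which is the
    Nagios default but is not confirmed by the post.
    """
    # The failure can happen right after a check, so the first failing
    # check may be a full normal interval away.
    first_detection = normal_check_interval * interval_length
    # The remaining attempts are scheduled at the retry interval.
    retries = max_check_attempts - 1
    hard_state = first_detection + retries * retry_check_interval * interval_length
    return first_detection, hard_state


# Values from the generic-checks service template.
first, hard = service_hard_state_window(
    normal_check_interval=5, retry_check_interval=1, max_check_attempts=4
)
print(f"first failed check within {first // 60} min")  # 5 min
print(f"service hard state within {hard // 60} min")   # 8 min
```

Under these assumptions the schedule accounts for single-digit minutes, which matches the 2 to 5 minute notifications seen in most tests and leaves the 50 minute case unexplained by the check intervals alone.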
Parallel checks are turned on as well.
Has anyone seen similar delays?
--
stucky