Guys<br><br>I'm trying the latest stable 2.x version (2.9) and on top of the 2 already existing default host templates I added a 3rd one since the documentation states that there is no limit.<br><br>I added a host and started monitoring. When I took it down it took between 2 - 5 mins for the host down notification to come in.
<br>However, later on I rebooted again and this time nothing came in. The nagios log showed nothing about wanting to send a notification either. The box came back without any notification.<br>I took it down again later and waited; after 50 minutes I got a host down notification. When I brought the host back I almost immediately got a host up notification.
<br><br>I removed one of the templates to change the recursion level of the host templates from 3 to 2 and tried again. I did 3 tests and all came back fine this time. I always got the notification within 5 minutes max.
<br>Then I added the 3rd template back again to see whether it had to do with that but now I can't reproduce this. I did 2 tests and both were fine.<br><br>I don't feel that I can trust nagios now though. I've been using it for a few years now since version
1.2 and I've never seen this behaviour before.<br>However, I've also never used more than 1 host/service template. This time I wanted to make more use of the object inheritance logic to shorten my cfg but somehow I feel it causes problems.
<br>How deep is the template recursion for most of you folks ?<br><br>Here are the templates I was using when the 50 min delay happened<br><br>Hosts :<br><br># Host templates<br><br>define host{<br> name generic-host
<br> notifications_enabled 1<br> event_handler_enabled 1<br> flap_detection_enabled 1<br> failure_prediction_enabled 1<br> process_perf_data 1
<br> retain_status_information 1<br> retain_nonstatus_information 1<br> notification_period 24x7<br> register 0<br> }<br><br>define host{<br> name generic-linux
<br> use generic-host<br> check_period 24x7<br> max_check_attempts 10<br> check_command check-host-alive<br> notification_interval 120
<br> notification_options d,u,r<br> register 0<br> }<br><br>define host{<br> name prod<br> use generic-linux
<br> contact_groups sysadmins,psst<br> register 0<br> }<br><br>define host{<br> name nonprod<br> use generic-linux
<br> contact_groups sysadmins<br> register 0<br> }<br><br>Then I use either the prod or nonprod template for all my hosts.<br><br>same with services :<br><br># Service templates
<br><br>define service{<br> name generic-service<br> active_checks_enabled 1<br> passive_checks_enabled 1<br> parallelize_check 1<br>
obsess_over_service 1<br> check_freshness 0<br> notifications_enabled 1<br> event_handler_enabled 1<br> flap_detection_enabled 1
<br> failure_prediction_enabled 1<br> process_perf_data 1<br> retain_status_information 1<br> retain_nonstatus_information 1<br> is_volatile 0
<br> register 0<br> }<br><br>define service{<br> name generic-checks<br> use generic-service<br> check_period 24x7
<br> max_check_attempts 4<br> normal_check_interval 5<br> retry_check_interval 1<br> notification_options w,u,c,r<br> notification_interval 60
<br> notification_period 24x7<br> register 0<br> }<br><br><br>define service{<br> name prod<br> use generic-checks
<br> contact_groups sysadmins,psst<br> register 0<br> }<br><br>define service{<br> name nonprod<br> use generic-checks
<br> contact_groups sysadmins<br> register 0<br> }<br><br>Here I also use prod or nonprod as templates for my services.<br><br>I'm going to test this more tomorrow, but I'm worried that if a host goes down I might not get notified until 50 minutes later, or maybe never. Who knows?
<br>It doesn't seem to behave the same way every time, but as far as I can see the service checks run every 5 minutes, so I should get a notification within roughly that time frame.<br>Parallel checks are turned on as well.<br><br>
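For reference, here is a rough Python sketch of the arithmetic behind that expectation. It is not Nagios code: it only mimics how the `use` chain of the service templates above resolves and how long a service would take to reach a HARD state. The function names `resolve` and `minutes_to_hard_state` are made up for illustration, and it assumes the default `interval_length` of 60 seconds (so the intervals are in minutes), no check latency, and it ignores host check logic entirely.

```python
# Service templates from the post, reduced to the timing-related directives.
TEMPLATES = {
    "generic-service": {},
    "generic-checks": {
        "use": "generic-service",
        "max_check_attempts": 4,
        "normal_check_interval": 5,
        "retry_check_interval": 1,
        "notification_interval": 60,
    },
    "prod": {"use": "generic-checks", "contact_groups": "sysadmins,psst"},
    "nonprod": {"use": "generic-checks", "contact_groups": "sysadmins"},
}


def resolve(name, templates=TEMPLATES):
    """Merge a template with all its ancestors; values set closer to the child win."""
    chain = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError("circular 'use' chain at " + name)
        seen.add(name)
        template = templates[name]
        chain.append(template)
        name = template.get("use")
    merged = {}
    # Apply the most distant ancestor first so children override it.
    for template in reversed(chain):
        merged.update({k: v for k, v in template.items() if k != "use"})
    return merged


def minutes_to_hard_state(cfg):
    """Return (best, worst) minutes between the real failure and the final retry.

    Assumption: attempt 1 happens on the regular schedule and the remaining
    attempts follow at retry_check_interval spacing.
    """
    retries = (cfg["max_check_attempts"] - 1) * cfg["retry_check_interval"]
    return retries, cfg["normal_check_interval"] + retries


effective = resolve("prod")
best, worst = minutes_to_hard_state(effective)
print(effective["contact_groups"], best, worst)
```

Under those assumptions the inherited values are the same at 2 or 3 template levels, and a service notification should go out somewhere between 3 and 8 minutes after the failure, which is nowhere near 50 minutes.<br><br>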
Has anyone seen similar delays?<br><br>-- <br>stucky