Problem with "retry check interval"
Andreas Ericsson
ae at op5.se
Mon Sep 6 14:36:40 CEST 2004
Alexander Schaefer wrote:
> Hello,
>
> I am using Nagios 1.2 for monitoring my network. I have about 90 hosts and
> 130 services in the Nagios configuration. The problem I am confronted with now
> is the following. By default each host is checked for availability with the
> check_host_alive plugin. The check occurs every 2 minutes, and if the state of
> this service changes from OK to CRITICAL or WARNING, the "retry check
> interval" for this service should re-check it 3 times at 1-minute intervals
> before sending notifications. This is defined (at the hostgroup level) for
> all hosts in my monitored environment:
>
> define service {
> use generic-service
> hostgroup_name firewalls,mail-server,routers,switchs,win-server,WLAN
> service_description PING
> contact_groups nagios
> check_period 24x7
> notification_interval 120
> notification_options w,u,c,r
> notification_period 24x7
> check_command check-host-alive
> max_check_attempts 3
> normal_check_interval 2
> retry_check_interval 1
> # comment: Check hosts availability
> }
>
> My problem-host X1_voicegate is defined in the group routers:
> define hostgroup {
> hostgroup_name routers
> alias routers
> contact_groups router-admins
> members X1_ras,X1_voicegate,rou2,rou3,rou-internet,rou-internet2
> # comment: Router host-group
> }
>
>
> But I cannot understand why, for this particular host X1_voicegate, the "retry
> check interval" works with a delay of only 3 seconds. I can see this in the
> Nagios event log:
>
> [09-06-2004 13:46:12] HOST ALERT: X1_voicegate;DOWN;HARD;3;PING CRITICAL - Host Unreachable (10.4.11.13)
> [09-06-2004 13:46:09] HOST ALERT: X1_voicegate;DOWN;SOFT;2;PING CRITICAL - Host Unreachable (10.4.11.13)
> [09-06-2004 13:46:06] HOST ALERT: X1_voicegate;DOWN;SOFT;1;PING CRITICAL - Host Unreachable (10.4.11.13)
>
>
> Is this a bug in Nagios, or is there some hidden configuration option
> I'm missing?
>
Nagios performs host checks in a serialized manner, to prevent service
checks from running while they're targeted at an unreachable host. That
means host check 2 executes as soon as host check 1 is complete, which
explains the log entries you're seeing above. It's not a bug, it's a
feature. Nagios is much less picky about services, so for those the
retry check interval works as you expect. For hosts, however, the logic
requires the host to be up before its services can be checked, so Nagios
can't postpone the host retries and risk having service checks executed
in the meantime.
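The timing difference can be illustrated with a toy model. This is a hypothetical sketch, not Nagios source code: the function names are made up, the 3-second check duration is an assumption chosen to match the spacing in the log above (the real duration depends on how long check-host-alive takes to give up on an unreachable address), and interval_length is taken as Nagios's default of 60 seconds. Note also that the log lines are HOST ALERTs, produced by the host's own check and max_check_attempts, so the PING service's retry_check_interval does not govern them.

```python
from datetime import datetime, timedelta

# Hypothetical toy model of the two retry styles described above.
# Durations are illustrative assumptions, not values read from Nagios.

def host_retry_times(first_check, check_duration_s, max_attempts):
    """Serialized host checks: each attempt starts as soon as the previous
    one finishes, so results are spaced by the check's own run time.
    Returns the time each result would be logged (check completion)."""
    times = []
    t = first_check
    for _ in range(max_attempts):
        t = t + timedelta(seconds=check_duration_s)  # attempt finishes, result logged
        times.append(t)
    return times

def service_retry_times(first_check, retry_interval_min, max_attempts,
                        interval_length_s=60):
    """Service checks: after a non-OK result, the next attempt is scheduled
    retry_interval_min * interval_length_s seconds later.
    Returns the scheduled start time of each attempt."""
    step = timedelta(seconds=retry_interval_min * interval_length_s)
    return [first_check + i * step for i in range(max_attempts)]

start = datetime(2004, 9, 6, 13, 46, 3)
for t in host_retry_times(start, check_duration_s=3, max_attempts=3):
    print("host    attempt logged at", t.strftime("%H:%M:%S"))
for t in service_retry_times(start, retry_interval_min=1, max_attempts=3):
    print("service attempt starts at", t.strftime("%H:%M:%S"))
```

Under these assumptions the host attempts land at 13:46:06, 13:46:09 and 13:46:12, the same spacing as in the log, while the service attempts are a full minute apart.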
> Thanks for your ideas
>
You're welcome.
> Alex
>
--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Lead Developer
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users