Host down, still doing active checks, causing multiple unwanted service failures
Toussaint OTTAVI
t.ottavi at medi.fr
Tue Dec 9 15:17:22 CET 2008
Toussaint OTTAVI wrote:
>
> Following this idea, I will investigate the following :
> - Hosts associated themselves with parent/child relationship according
> to WAN topology (already working)
> - For each host, I will create a "parent" service with only a
> check_alive command
> - Every other service will be a child of this parent service
Answering myself... After some investigation and reading of the docs :-) it
seems I confused "parent/child" with "dependency":
- Parent/child relationships exist for hosts only, and should map the
network topology. When a host is DOWN, all its children are set to
UNREACHABLE. There is no such parent/child relationship for services.
- Dependencies can be defined for either hosts or services. When the
"depended upon" object is in a failed state, the dependent object is not
checked. But no assumption is made about the dependent object's status:
it is not set to UNREACHABLE or UNKNOWN, as it would be with a
parent/child relationship.
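As a rough illustration of the difference, the two mechanisms might be
declared as shown here. This is only a sketch: the host names are the
placeholders used in this message, the address is invented (documentation
range), and the "generic-host" template is assumed to exist in the local
configuration. The two definitions are shown together purely for comparison.

```
# Parent/child: maps the topology. If Remote_WAN_Router is DOWN and
# REMOTE_HOST1 cannot be reached, REMOTE_HOST1 is shown as UNREACHABLE.
define host{
    use                             generic-host        ; assumed template
    host_name                       REMOTE_HOST1
    address                         192.0.2.10          ; placeholder address
    parents                         Remote_WAN_Router
    }

# Dependency: suppresses checks and/or notifications of the dependent
# host, but does not change the status it is displayed with.
define hostdependency{
    host_name                       Remote_WAN_Router   ; depended-upon host
    dependent_host_name             REMOTE_HOST1
    execution_failure_criteria      d,u
    notification_failure_criteria   d,u
    }
```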
Here's the current situation:
- Creating a dependency solves my problem of not checking services when
hosts are unreachable due to a WAN failure. This is a smarter solution
than my previous attempt using event handlers and the
DISABLE_ALL_SVC_CHECKS external command. Using wildcards, I only have to
declare one dependency for all services on several hosts, like this:
define servicedependency{
    host_name                       Remote_WAN_Router
    service_description             Remote WAN router ping test
    dependent_host_name             REMOTE_HOST1, REMOTE_HOST2, ..., REMOTE_HOSTn
    dependent_service_description   *
    inherits_parent                 1
    execution_failure_criteria      w,u,c
    }
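A possible variant, untested here and assuming Nagios 3.x, replaces the
explicit host list with a hostgroup via the dependent_hostgroup_name
directive, and adds notification_failure_criteria so that notifications are
suppressed as well as checks. The hostgroup name "remote-site-hosts" is
hypothetical, and whether the "*" wildcard combines with a hostgroup in the
same way as with a host list should be verified before relying on it.

```
define hostgroup{
    hostgroup_name                  remote-site-hosts   ; hypothetical name
    alias                           Hosts behind the remote WAN router
    members                         REMOTE_HOST1, REMOTE_HOST2
    }

define servicedependency{
    host_name                       Remote_WAN_Router
    service_description             Remote WAN router ping test
    dependent_hostgroup_name        remote-site-hosts
    dependent_service_description   *
    inherits_parent                 1
    execution_failure_criteria      w,u,c
    notification_failure_criteria   w,u,c               ; also suppress notifications
    }
```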
- With that in place, when the WAN fails, the checks are not executed, and
the services keep their previous status. That's a good thing. But I would
have preferred them to get the status UNKNOWN or UNREACHABLE. In fact, I
would like to have the same parent/child behavior that exists for hosts,
but for services.
- I'm not sure it will solve the "latency" problem: if a service check
on remote_host runs before remote_wan_router is declared DOWN and the
dependency takes effect, I'll still get critical failures for those
services. The console will display a mix of CRITICAL services (those
checked before the WAN router check) and OK services (the previous state
of services that were not checked because of the dependency). This
display would be completely wrong!
Again, in such a situation, I think the right display for services whose
status could not be determined should be "UNKNOWN", just as hosts are
shown as "UNREACHABLE".
Comments and ideas welcome.
Kind regards,
--
*Toussaint OTTAVI*
*MEDI INFORMATIQUE*
Mail: t.ottavi at medi.fr