<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
</head>
<body bgcolor="#ffffff" text="#000000">
<font size="-1"><font face="Arial">Hi list,<br>
<br>
I've been investigating this problem for a while, but I couldn't find a
good solution.<br>
<br>
* Example situation :<br>
Assume I have one host with 20 service checks.<br>
<br>
* Problem :<br>
If the host becomes DOWN, Nagios still continues to do service checks
on this host. So, after a while, all the services will go to a CRITICAL
state. Then, in my console, I will see : <br>
- 1 Host down, <br>
- 20 Services down<br>
This information is not pertinent. The only information I want to see in
such a case is the "host down". The 20 "service down" entries are
obvious, and generate "visual pollution" that can make it harder to
identify the real problem.<br>
<br>
* Expected behavior :<br>
When a host is down, I would like to :<br>
- See only one thing in red in the console : 1 HOST DOWN<br>
- Disable all the service checks (which at this point do not have any
chance of success)<br>
- Put the services into "UNKNOWN" status<br>
<br>
Comments:<br>
In Nagios, there are parent/child dependencies. When a host is down,
all the child hosts are not tested, and their status becomes
"UNREACHABLE". Good thing. Same thing for services. But, as far as I
know, there are no dependencies between a host and its services. I
googled/read a lot of things in the docs. This seems to be "by design",
there's no way to declare a service as a child of its (parent) host! I
didn't really understand the reasons for this choice, but I would like
to work around it.<br>
<br>
Then I played around with event handlers. When a host status changes,
the event handler calls a script. The script checks the status of the
"calling" host. If the host is DOWN or UNREACHABLE, it sends back to
Nagios an "external command" to disable all active service checks. If
the status of the host is UP, then it sends the external command to
enable all service checks for that particular host. It works. But there
is some "latency" between the time the services are disabled by the
eventhandler, and the time Nagios stops doing the service checks.
Usually, some services are still checked, and provide unwanted "FAILED"
status. I think this is because these checks were queued before the
handler disabled them, so they are still executed. So I'm not 100%
satisfied.<br>
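<br>
For reference, the handler logic described above can be sketched roughly
as follows in Python. This is only an illustration, not my exact script:
DISABLE_HOST_SVC_CHECKS and ENABLE_HOST_SVC_CHECKS are the standard
external commands, but the command file path and the function names are
assumptions made for the example.<br>
<pre>
```python
import time

# Assumed path: the real location is set by "command_file" in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def build_host_command(host_name, host_state, now=None):
    """Return the external command line matching the new host state."""
    timestamp = int(time.time() if now is None else now)
    if host_state in ("DOWN", "UNREACHABLE"):
        # Host cannot be reached: stop active checks of all its services.
        command = "DISABLE_HOST_SVC_CHECKS"
    elif host_state == "UP":
        # Host is back: resume active checks of all its services.
        command = "ENABLE_HOST_SVC_CHECKS"
    else:
        raise ValueError("unexpected host state: %r" % host_state)
    return "[%d] %s;%s\n" % (timestamp, command, host_name)


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```
</pre>
The event handler would pass the $HOSTNAME$ and $HOSTSTATE$ macros as
arguments, then call something like
submit(build_host_command(name, state)). Nagios only reads the command
file periodically, which may contribute to the latency described above.<br>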
<br>
The next step would be to use service event handlers to put every
service into "UNKNOWN" status each time a service check is disabled.
But I have two problems :<br>
- In my external script, I cannot determine whether a service check is
ENABLED or DISABLED. There are a lot of "macros" available, but none of
them gives me this information.<br>
- This may not solve the "latency" problem, if I manually set an
"UNKNOWN" status on a DISABLED service, but an active check is already
in the queue, and its result will arrive later...<br>
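<br>
One way the "set to UNKNOWN" step might look, sketched in Python under
some assumptions: it builds one PROCESS_SERVICE_CHECK_RESULT external
command per service, with return code 3 (UNKNOWN). The function name,
the service list and the output text are hypothetical, and the services
would have to accept passive check results. It does not address the
race with active checks that are already queued.<br>
<pre>
```python
import time

UNKNOWN = 3  # plugin return code that Nagios maps to the UNKNOWN state


def build_unknown_results(host_name, service_descriptions, now=None):
    """Return one passive-result command line per service, all UNKNOWN."""
    timestamp = int(time.time() if now is None else now)
    # Hypothetical plugin output shown in the console for each service.
    output = "Host is down, service state not checked"
    return [
        "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n"
        % (timestamp, host_name, description, UNKNOWN, output)
        for description in service_descriptions
    ]
```
</pre>
Each returned line would be written to the Nagios command file, in the
same way as any other external command.<br>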
<br>
Of course, the ideal situation would be to have a parent/child
dependency acting between hosts and services...<br>
<br>
Any comments and suggestions are welcome. Thank you in advance for your
help.<br>
<br>
Kind regards<br>
</font></font>
<div class="moz-signature">-- <br>
<title>Signature</title>
<meta http-equiv="Content-Type" content="text/html; ">
<style type="text/css">
<!--
.stylesig {
font-family: Arial, Helvetica, sans-serif;
font-size: 12px;
color: #000066;
}
-->
</style>
<p class="stylesig"><strong>Toussaint OTTAVI</strong><br>
<strong>MEDI INFORMATIQUE</strong><br>
<strong></strong><strong>Mail:</strong> <a class="moz-txt-link-abbreviated" href="mailto:t.ottavi@medi.fr">t.ottavi@medi.fr</a></p>
</div>
</body>
</html>