Host down, still doing active checks, causing multiple unwanted service failures
Toussaint OTTAVI
t.ottavi at medi.fr
Fri Dec 12 16:43:43 CET 2008
Hi,
Marc Powell wrote:
> Our ideas of accuracy would seem to differ ;)
>
Sometimes, in life, it's necessary to be able to say: "I don't know".
When a host is simply powered off, or unreachable due to a network/WAN
failure, Nagios currently displays each service check with a result that
depends on how the plugin is written, and also on the exact time when
the latest service check occurred. Some results may be UNKNOWN, others
may be CRITICAL, and the rest would be OK (if dependencies are used).
This really bothers me; I do think it is inaccurate. In such a
situation, I would expect all the services to be in the "UNKNOWN" state.
>> We do not use email notifications, because we are only 2 guys, and
>> this would generate too much messages.
>>
>
> It shouldn't. In your scenario of 1 host down with X number of
> services on it, you should only receive 1 down message and 1 recovery
> message per host event (unless you want more).
>
Nagios is smart enough, and its notifications are tunable enough, to
avoid email notification floods. But other products, such as routers,
firewalls or security software, are not. They used to fill our mailboxes
with useless messages. That's the reason why I don't like email
notifications, at least for general-purpose problems. I use them only
for very critical events.
Moreover, the parent/child system was designed exactly to handle the
situation where a host is unreachable. It makes it possible to suppress
notifications for all services, which would necessarily fail or return
wrong results if the host is unreachable. I would like to be able to use
this system to also suppress the "incorrect" service status display and,
when a host is unreachable, have the display say "UNKNOWN" for all
services (just as hosts are displayed as UNREACHABLE).
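The display rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Nagios code: the function name `displayed_service_states` and the plain-dictionary data layout are assumptions made for the example.

```python
# Hypothetical display filter; the names are illustrative, not Nagios APIs.
HOST_PROBLEM_STATES = {"DOWN", "UNREACHABLE"}


def displayed_service_states(host_state, service_states):
    """Return the states to show for the services of one host.

    host_state: "UP", "DOWN" or "UNREACHABLE".
    service_states: dict mapping service name -> last recorded check result.
    """
    if host_state in HOST_PROBLEM_STATES:
        # The host cannot be reached, so nothing reliable is known
        # about its services: show UNKNOWN for all of them.
        return {name: "UNKNOWN" for name in service_states}
    # Host is UP: show the recorded results unchanged.
    return dict(service_states)


if __name__ == "__main__":
    last_results = {"HTTP": "OK", "SSH": "CRITICAL", "Disk": "UNKNOWN"}
    print(displayed_service_states("UNREACHABLE", last_results))
    print(displayed_service_states("UP", last_results))
```

The recorded results are left untouched; only the view changes, so the real last check results are still available once the host comes back.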
This is the way I would like to see my results. This may not be the way
other users would want to see them. But no two users are the same, or
have the same configuration or the same needs. I would just like to find
a solution that displays my results in the way that is the most usable
and valuable for me.
> Possibly but with an additional requirement that regularly scheduled
> host checks are enabled for those hosts. Those are still considered
> optional and have been undesirable for all prior versions of nagios
> before current. If someone were to code the patch they would need to
> ensure they were enabled for the hosts with this new feature enabled
> otherwise the host would never be checked and return out of it's
> critical state.
>
I agree with you. Checks should be for services, and hosts should only
be "containers" for services. Having to enable checks also for the hosts
is a little bit confusing for beginners. I also consider host checks as
"undesirable".
But, if I understand correctly, host checks are there to determine
parent/child reachability, which in turn makes it possible to determine
the UNREACHABLE status and then suppress useless service failure
notifications. So why not create a parent/child relationship between
services? This would remove the need for host checks, and it would allow
services to be displayed as UNREACHABLE or UNKNOWN if their parent
service check fails.
Dependencies already exist for both hosts and services. Why not a
parent/child/unreachable relationship?
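As a rough sketch of what such a service-level parent/child relationship could mean, assuming each service has at most one parent service: the Python below marks a service as UNREACHABLE when any of its ancestors is not OK. Nagios has no such feature; `effective_states` and the `parents` mapping are hypothetical names used only for this illustration.

```python
# Hypothetical model of parent/child relationships between services.
# Nothing here is a Nagios API; it only illustrates the suggestion.

def effective_states(raw_states, parents):
    """Return the state to display for each service.

    raw_states: dict mapping service name -> raw check result.
    parents: dict mapping service name -> parent service name (or None).
    """
    def blocked(service):
        seen = set()  # guards against accidental cycles in `parents`
        parent = parents.get(service)
        while parent is not None and parent not in seen:
            # A parent that is missing from raw_states is treated as not OK.
            if raw_states.get(parent) != "OK":
                return True
            seen.add(parent)
            parent = parents.get(parent)
        return False

    return {
        name: ("UNREACHABLE" if blocked(name) else state)
        for name, state in raw_states.items()
    }


if __name__ == "__main__":
    raw = {"Router ping": "OK", "WAN link": "CRITICAL",
           "HTTP": "OK", "SMTP": "CRITICAL"}
    tree = {"WAN link": "Router ping", "HTTP": "WAN link", "SMTP": "WAN link"}
    print(effective_states(raw, tree))
```

The failing parent itself keeps its real state (CRITICAL here), so the display still shows where the outage starts, while everything behind it is marked UNREACHABLE.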
Of course, this is only a feature suggestion, everybody should be free
to use it or not. But I'll be happy to use it ;-)
> This is promising. http://nagios.sourceforge.net/docs/3_0/objecttricks.html#same_host_dependency
> will help with the config if you haven't seen it.
>
It works fine. The ability to use wildcards is a great feature. Services
now don't fail when a host is unreachable, but some problems (for me)
remain:
- All services keep their previous status, which is usually OK. As
previously said, in such a situation, I would prefer UNKNOWN.
- "Latency" problem: some service checks are sometimes scheduled AFTER
the WAN failure, but BEFORE the dependency service check. Then, they
fail. Using a "soft dependency" and scheduling the dependency service
check more often helps to reduce this situation. But it still happens
from time to time.
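The "latency" problem in the second point is a simple timing race, which the following Python sketch tries to make explicit. It assumes strictly periodic checks described by an interval and a start offset (in seconds); real Nagios scheduling is more involved, and the function names are invented for this example.

```python
import math

# Simplified timing model, not Nagios scheduling code.

def next_check_at_or_after(t, interval, offset=0):
    """Time of the first periodic check at or after time t."""
    if t <= offset:
        return offset
    periods = math.ceil((t - offset) / interval)
    return offset + periods * interval


def dependent_check_is_exposed(outage_start, master_check, dependent_check):
    """True if the dependent service is checked after the outage begins
    but before the dependency (master) check has had a chance to fail."""
    return outage_start <= dependent_check < master_check


if __name__ == "__main__":
    outage = 100
    dependent = next_check_at_or_after(outage, interval=300, offset=110)
    for master_interval in (60, 5):
        master = next_check_at_or_after(outage, interval=master_interval)
        print(master_interval, master, dependent,
              dependent_check_is_exposed(outage, master, dependent))
```

In this model the exposure window is at most one interval of the dependency check, which is why checking it more often narrows the race without removing it entirely.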
>> Am I the only one having this problem ?
>>
>
> I don't consider it a problem myself, just that nagios doesn't work as
> you want it to in your environment. I personally prefer the current
> behavior since it provides more accurate information over a wider
> variety of outage scenarios.
>
Let's be clear. Nagios has no problem; it behaves exactly as it is
intended to. The one who has a problem is ME. I need to present the
results in a different way when a host is unreachable, and I'm looking
for a way to do that.
I would just like to know whether I am the only one who thinks the
results of service checks for unreachable hosts should be displayable
differently.
Kind regards,
--
*Toussaint OTTAVI*
*MEDI INFORMATIQUE*
*Mail:* t.ottavi at medi.fr