<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><FONT face=Arial size=2>
<BLOCKQUOTE
style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> nagios-users-admin@lists.sourceforge.net
[mailto:nagios-users-admin@lists.sourceforge.net] <B>On Behalf Of
</B>Christian Lyra<BR><B>Sent:</B> Friday, April 07, 2006 5:11
PM<BR><B>To:</B> nagios-users@lists.sourceforge.net<BR><B>Subject:</B>
[Nagios-users] strange behavior with multiple failing hosts and nagios 1.3 /
2.1<BR></FONT><BR></DIV>
<DIV></DIV>Hi there,<BR><BR>I was evaluating Nagios and found some strange
behavior on my test setup. After a fresh install, I did a minimal setup: just
one contactgroup with one contact, and a hostgroup with 4 hosts (no parent
relationships). Since I'm only interested in knowing whether a host is up or down,
I just configured a check_ping service for each host. As I said, a pretty
simple setup. The services are scheduled to run every minute with only one
try. <BR><BR>To simulate a network problem, I just ran "iptables -A INPUT
-p icmp -j DROP". I was expecting to see all hosts/services down
within a minute, since Nagios normally "spreads" the checks across the one-minute
interval (default configuration). To my surprise, I saw just one host go down in the
first minute, with the remaining hosts going down one per minute after that. I mean,
host 1 went down at, say, 8:40:13, host 2 at 8:41:05, host 3 at 8:42:05 and
host 4 at 8:43:05. I saw the last host go down almost 4 minutes
after the "network problem". <BR><BR>My first try was with Nagios 1.3, but then
I could reproduce the same problem with Nagios 2.1. When I asked a friend to
run the same test, he got the same results. A little worse, in fact: since he does not
check the hosts/services every minute, he got one host down every 3 minutes, and
after 10 minutes he still couldn't see all the hosts down. <BR><BR>To my surprise,
all the hosts came up at about the same time after I removed the iptables rule. I
could not find an explanation for this behavior, and couldn't find anything
wrong with the configuration. I'm not sure if this is a feature or if I hit a
bug. A serious bug, to be honest. <BR><BR>I only did a minimal search of the mailing
list archives and forums, so excuse me if this is a known issue, and please point me
to where I can find more about it.<BR><BR><BR>Christian
Lyra</BLOCKQUOTE></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Arial size=2><SPAN
class=085191916-10042006>This is unfortunately a
long-standing deficiency in Nagios. It suspends all parallel checking
while it performs the host check. The more downed hosts you have, the farther
behind it falls on the rest of your service checks.</SPAN></FONT></DIV>
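<DIV dir=ltr align=left><FONT face=Arial size=2>To make the effect concrete,
here is a toy model in Python. It is only a sketch of the serial behavior
described above, not Nagios's actual scheduler code. The function name
detection_times and the 50-second host check duration are assumptions chosen
for illustration; the real duration depends on your retry count and ping
timeout.</FONT></DIV>
<PRE>
```python
def detection_times(due_times, host_check_seconds):
    """Toy model: when is each host flagged down if host checks run serially?

    due_times: seconds at which each host's service check is scheduled.
    host_check_seconds: hypothetical time one blocking host check takes
    (for example, retries multiplied by the ping timeout).
    """
    clock = 0
    detected = []
    for due in sorted(due_times):
        # A service check cannot run before it is due, nor while an
        # earlier host check is still blocking everything else.
        clock = max(clock, due)
        # The failed service check triggers a host check, and in this
        # model nothing else is processed until it finishes.
        clock += host_check_seconds
        detected.append(clock)
    return detected


# Four service checks spread over one minute, 50 s per host check (assumed).
print(detection_times([0, 15, 30, 45], 50))
# Same schedule with a quick 5 s host check: no backlog builds up.
print(detection_times([0, 15, 30, 45], 5))
```
</PRE>
<DIV dir=ltr align=left><FONT face=Arial size=2>With the assumed 50-second
host check, the four hosts are flagged at 50, 100, 150 and 200 seconds, so the
last one shows up more than three minutes after the outage even though every
service check was due within the first minute.</FONT></DIV>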
<DIV><FONT face=Arial size=2></FONT> </DIV><!-- Converted from text/plain format -->
<P><FONT size=2>--<BR>Ludwig Pummer<BR>System Administrator<BR>Copart Auto
Auctions<BR><BR></FONT></P>
<DIV><FONT face=Arial size=2></FONT> </DIV></BODY></HTML>