<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <META http-equiv=Content-Type content="text/html; charset=iso-8859-1"> <META content="MSHTML 6.00.2900.2802" name=GENERATOR></HEAD> <BODY> <DIV dir=ltr align=left><FONT face=Arial size=2> <BLOCKQUOTE style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #000000 2px solid; MARGIN-RIGHT: 0px"> <DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left> <HR tabIndex=-1> <FONT face=Tahoma size=2><B>From:</B> nagios-users-admin@lists.sourceforge.net [mailto:nagios-users-admin@lists.sourceforge.net] <B>On Behalf Of </B>Christian Lyra<BR><B>Sent:</B> Friday, April 07, 2006 5:11 PM<BR><B>To:</B> nagios-users@lists.sourceforge.net<BR><B>Subject:</B> [Nagios-users] strange behavior with multiple failing hosts and nagios 1.3 / 2.1<BR></FONT><BR></DIV> <DIV></DIV>Hi there,<BR><BR>I was evaluating Nagios and found some strange behavior in my test setup. After a fresh install, I did a minimal setup: one contactgroup with one contact, and a hostgroup with 4 hosts (no parent relationships). Since I'm only interested in knowing whether a host is up or down, I configured just a check_ping service for each host. As I said, a pretty simple setup. The services are scheduled to run every minute with only one try. <BR><BR>To simulate a network problem, I ran "iptables -A INPUT -p icmp -j DROP". I expected to see all hosts/services down within a minute, since Nagios "spreads" the checks across the one-minute interval (default configuration). To my surprise, only one host went down in the first minute, and each subsequent host went down one minute after the previous one. That is, host 1 went down at, say, 8:40:13, host 2 at 8:41:05, host 3 at 8:42:05 and host 4 at 8:43:05. The last host went down almost 4 minutes after the "network problem". <BR><BR>My first try was with Nagios 1.3, but then I could reproduce the same problem with Nagios 2.1. 
When I asked a friend to do the same test, he got the same results. His were a little worse, since he does not check the hosts/services every minute: he got one host down every 3 minutes, and after 10 minutes he still couldn't see all the hosts down. <BR><BR>To my surprise, all the hosts came back up at about the same time after I removed the iptables rule. I could not find an explanation for this behavior, and couldn't find anything wrong with the configuration. I'm not sure if this is a feature or if I hit a bug. A serious bug, to be honest. <BR><BR>I did only a minimal search of the mailing list archives and forums, so excuse me if this is a known issue, and please point me to where I can find more about it.<BR><BR><BR>Christian Lyra</BLOCKQUOTE></FONT></DIV> <DIV dir=ltr align=left><FONT face=Arial size=2><SPAN class=085191916-10042006>This is unfortunately a long-standing deficiency in Nagios. It suspends all parallel checking while it performs a host check. The more downed hosts you have, the farther behind it falls on the rest of your service checks.</SPAN></FONT></DIV> <DIV><FONT face=Arial size=2></FONT> </DIV> <P><FONT size=2>--<BR>Ludwig Pummer<BR>System Administrator<BR>Copart Auto Auctions<BR><BR></FONT></P> <DIV><FONT face=Arial size=2></FONT> </DIV></BODY></HTML>