timeouts from one machine and not another
Andreas Ericsson
ae at op5.se
Wed Jul 7 11:16:48 CEST 2004
David Bishop wrote:
> I have two nagios servers checking basically the same machines (we don't
> need no stinking failover). However, from one machine (A), I get a lot of
> time-out errors on certain machines (it times out when checking smtp and
> ftp) and on the other, I don't. If I try it from the command-line (just
> telnetting to the ports), it hangs for a long time (long being greater
> than 10 seconds) but finally connects. However, connecting to the same
> machine from B, it's instantaneous. Normally I'd suspect the network
> connection between the two machines (client and A), but a reverse
> connection works very quickly (connecting to A's smtp port), and they
> are both on underutilized 1.5Mb lines.
A client of ours had this problem because of an overloaded router that
was supposed to send a NEXT_HOP but couldn't always manage the load.
When the traffic went the other way, the overloaded router was never in
the picture and connections worked beautifully every time (that had us
scratching our heads for quite some time).
Another thing that can cause this is NIC autonegotiation. Some
not-so-nice switches and OSes try to renegotiate Ethernet settings
(duplex and speed). This normally causes the interface on both switch
and server to go dormant until the negotiation is complete.
> Ping time between them (either
> way) averages slightly over 100ms. The only real difference that I can
> think of between A and B is that A is running FreeBSD 5.2.1 and B is
> Debian/Sid. The clients are all also running Debian (if that matters).
>
It shouldn't really. FreeBSD has a long history of a solid IP stack,
so the problem is most likely in the network.
> Help, please :-(
>
Things to check for:
Are the timeouts happening only during certain hours? If so, when, and
what else is happening during those hours (backups, people at work,
lots of web-/ftp-server hits)?
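
To see whether the timeouts cluster at certain hours, something like the
following could be run over the Nagios log. This is only a rough sketch:
it assumes each log line starts with an epoch timestamp in square
brackets and that the plugin output contains the word "timeout". The
function name timeouts_by_hour and the "needle" parameter are made up
for this example, and hours are counted in UTC.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed line shape: "[<epoch seconds>] <message>"
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def timeouts_by_hour(lines, needle="timeout"):
    """Count log lines whose message mentions `needle`, per hour of day (UTC)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not in the assumed format, skip it
        epoch, message = match.groups()
        if needle.lower() not in message.lower():
            continue
        hour = datetime.fromtimestamp(int(epoch), tz=timezone.utc).hour
        counts[hour] += 1
    return counts
```

Feeding it the lines of the log file (for example via open() on
wherever your nagios.log lives) and printing the result sorted by hour
should make a nightly backup window or a working-hours peak stand out.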
Things to try:
Swap the two machines' places on the network, along with their Nagios
configurations. If A still fumbles, you know the problem resides on the
server itself. If B starts to fumble, it's in the network.
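
To compare the two machines with numbers rather than by watching telnet,
the TCP connect itself can be timed. The sketch below is only an
illustration: time_connect and probe are names invented here, and the
30 second default timeout is an arbitrary choice that is simply longer
than the hang you describe. It measures the TCP handshake only, not how
long the service takes to send its banner.

```python
import socket
import time


def time_connect(host, port, timeout=30.0):
    """Return seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name-resolution errors.
        return None


def probe(host, ports, attempts=5, timeout=30.0):
    """Time several connection attempts per port; returns {port: [seconds or None]}."""
    return {
        port: [time_connect(host, port, timeout) for _ in range(attempts)]
        for port in ports
    }
```

Running probe("client.example.com", [21, 25]) from both A and B, before
and after swapping them, would show whether the slow connects follow the
machine or the network position. The hostname is a placeholder.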
> D.A.Bishop
>
--
Sourcerrer / Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users