Small patch for check_nrpe.c

Mark Plaksin happy at usg.edu
Thu Aug 31 17:16:11 CEST 2006


Here's a small patch which makes check_nrpe close the socket gracefully
when it's done.  This resolved a problem we were having with spurious
timeouts.  We've been running it on our production Nagios instance (200
hosts, 5000 services; most services use NRPE) for a week and it's working
great.

Before the patch, check_nrpe_ssl was timing out when trying to connect to
hosts that were definitely up.  A local expert (Jay Cotton) looked at our
sniffer trace, explained the problem, and offered a fix.  The server end
was "ungracefully" closing the socket connected to the client.  For some
reason (NAT device in the middle, TCP stack on the HP-UX 11.11 client,
or?), the client thought the connection was still open.  The client
kept sending FIN after the server had sent RST, and it held the
connection in the LAST_ACK state for several minutes.

That's not so bad in itself but we were unlucky enough to have our server
(Debian stable box running a 2.6 kernel) attempt a new connection to the
same client using the same source port!  The client thought it was already
talking to the server on that port so it didn't play along and
check_nrpe_ssl on the server timed out.

Closing the connection gracefully eliminated the problem.  Below is Jay's
note describing his fix.

Thanks!

------------------------------------------------------------------------------
Find the line that reads "close(sd)" a few lines after the line that reads
"/* close the connection */". BTW, you'll notice that close() is called a
couple more times below this. Technically those calls shouldn't be there
since the connection will already be closed...a small programming bug, but
one that isn't going to affect us.

Although using close() can work, it usually results in a RST being sent
because the program exits before reading all data or getting the FIN from
the remote. For a graceful close you need to wait until receiving the FIN
from the remote before issuing the close() command. To do this requires
cooperation from the remote, but in most cases isn't a problem (sending the
FIN will cause the other end of the connection to close).

Here's what you're supposed to do:

1. use the shutdown() command to send a FIN: shutdown(sd, SHUT_WR)
2. use select() and recv() to process incoming data from remote (actual
data can be ignored). When the remote closes, the recv() command will
return 0, indicating a graceful close. The select() command is needed to
make sure recv() doesn't block indefinitely...allowing you to put an upper
limit on how long to wait. After all, the remote may decide not to close
the connection gracefully.
3. Finally, call close() and continue processing normally. At this point,
both ends of the connection are closed properly and calling close() merely
releases the resources we allocated for that socket.
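
The half-close in steps 1 and 2 can be tried in isolation with something
like the sketch below. It is only an illustration: half_close_demo() is a
made-up name, and it uses an AF_UNIX socketpair() as a stand-in for the TCP
connection, so no real FIN or RST packets are involved. It only shows what
the calls look like from the program's side: recv() returning 0 once the
other end has stopped sending.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the half-close behaved as described
 * in the steps above, 0 otherwise. */
int half_close_demo(void)
{
        int sv[2];
        char buf[16];
        ssize_t n;
        int ok = 0;

        /* Assumption: a local socket pair stands in for the TCP link. */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return 0;

        /* Step 1: we are done sending, so half-close our side. */
        if (shutdown(sv[0], SHUT_WR) != 0) goto out;

        /* The peer now sees end-of-stream: recv() returns 0. */
        n = recv(sv[1], buf, sizeof(buf), 0);
        if (n != 0) goto out;

        /* The peer may still send to us, and then closes its side. */
        if (send(sv[1], "bye", 3, 0) != 3) goto out;
        close(sv[1]);
        sv[1] = -1;

        /* Step 2: drain whatever the peer sent... */
        n = recv(sv[0], buf, sizeof(buf), 0);
        if (n != 3 || memcmp(buf, "bye", 3) != 0) goto out;

        /* ...until recv() returns 0, meaning the peer has closed. */
        n = recv(sv[0], buf, sizeof(buf), 0);
        ok = (n == 0);
out:
        if (sv[1] >= 0) close(sv[1]);
        close(sv[0]);   /* Step 3: release the descriptor */
        return ok;
}
```

There is no select() here because the peer in this demo always answers; a
real client still needs the timeout from step 2.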

Here's a function you can add to the code that accomplishes this task:

#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* timeout is in milliseconds and applies to each wait for data */
void graceful_close(int sd, int timeout)
{
        fd_set in;
        struct timeval tv;
        char buf[1000];

        shutdown(sd, SHUT_WR);  /* send FIN packet */
        for ( ; ; ) {
                FD_ZERO(&in);
                FD_SET(sd, &in);
                tv.tv_sec = timeout / 1000;
                tv.tv_usec = (timeout % 1000) * 1000;
                /* timeout or error */
                if (1 != select(sd + 1, &in, NULL, NULL, &tv)) break;
                /* no more data (FIN or RST) */
                if (0 >= recv(sd, buf, sizeof(buf), 0)) break;
        }
        close(sd);              /* release the descriptor */
}

Instead of calling close(sd) we'll call graceful_close(sd, 5000) to wait up
to 5 seconds (5000 milliseconds) for the remote to close before aborting
the connection. This should fix the problem...I think. :)
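
To see how the call behaves in both cases (remote closes, remote stays
silent), here is a rough test sketch. Everything in it is an assumption
made for illustration: graceful_close_status() is a hypothetical variant of
the function above that reports what happened, the two demo_ helpers are
made up, and an AF_UNIX socketpair() stands in for the NRPE connection. It
is not part of the patch.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of graceful_close(): returns 0 if the remote
 * closed (recv() returned 0), -1 on timeout or error. */
int graceful_close_status(int sd, int timeout_ms)
{
        fd_set in;
        struct timeval tv;
        char buf[1000];
        ssize_t n;
        int result = -1;

        shutdown(sd, SHUT_WR);          /* stop sending */
        for ( ; ; ) {
                FD_ZERO(&in);
                FD_SET(sd, &in);
                tv.tv_sec = timeout_ms / 1000;
                tv.tv_usec = (timeout_ms % 1000) * 1000;
                /* timeout or error: give up waiting */
                if (select(sd + 1, &in, NULL, NULL, &tv) != 1) break;
                n = recv(sd, buf, sizeof(buf), 0);
                if (n == 0) { result = 0; break; }  /* remote closed */
                if (n < 0) break;                   /* error */
                /* n > 0: leftover data, ignore it and keep draining */
        }
        close(sd);
        return result;
}

/* Remote sends some leftover data and closes: expect 0. */
int demo_peer_closes(void)
{
        int sv[2];

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -2;
        send(sv[1], "leftover", 8, 0);
        close(sv[1]);
        return graceful_close_status(sv[0], 5000);
}

/* Remote never closes: expect -1 once the 100 ms wait runs out. */
int demo_peer_silent(void)
{
        int sv[2], rc;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -2;
        rc = graceful_close_status(sv[0], 100);
        close(sv[1]);
        return rc;
}
```

One thing the sketch makes visible: the timeout is re-armed on every pass
through the loop, so it limits each wait for data rather than the total
time spent draining.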

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch
Type: text/x-patch
Size: 907 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060831/42a10bc3/attachment.bin>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

