[PATCH] Set SO_REUSEADDR on outgoing TCP connections
From 3387e6b4374b4dc50ebe949a273825931f2e115b Mon Sep 17 00:00:00 2001
From: Marek Majkowski <marek@cloudflare.com>
Date: Wed, 9 Apr 2014 16:42:06 +0100
Subject: [PATCH] Set SO_REUSEADDR on outgoing TCP connections
Usually, when establishing a connection, the kernel allocates the
outgoing TCP/IP port automatically from the ephemeral port range.
Unfortunately, when the outgoing source IP is selected explicitly (using
bind before connect), the kernel needs a unique port number, because the
destination is not yet known at bind time. As a result it can only
establish a single outgoing connection from a single source port. This
can cause problems with a large number of outgoing proxy connections:
it's possible for the kernel to run out of free ports in the ephemeral
range.
The situation can be improved: TCP/IP allows any number of connections
to share an outgoing address and port pair, as long as the destination
addresses differ.
This patch sets the SO_REUSEADDR flag on connections that use bind
before connect to select the outgoing source address. This allows the
kernel to reuse source port numbers, provided that the destination
addresses are different.
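As a minimal sketch of the bind-before-connect setup described above
(outside nginx, IPv4 only, and with a hypothetical helper name,
bound_tcp_socket, that is not part of this patch), the option has to be
set before bind() so that it affects port selection:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to source_ip with a
 * kernel-chosen port. If reuse is non-zero, SO_REUSEADDR is set first.
 * Returns the socket descriptor, or -1 on failure. */
int
bound_tcp_socket(const char *source_ip, int reuse)
{
    struct sockaddr_in  local;
    int                 s, on = 1;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);          /* 0: let the kernel pick a port */

    if (inet_pton(AF_INET, source_ip, &local.sin_addr) != 1) {
        return -1;                      /* not a valid IPv4 address */
    }

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1) {
        return -1;
    }

    /* Must happen before bind() to influence which port is chosen. */
    if (reuse
        && setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1)
    {
        close(s);
        return -1;
    }

    if (bind(s, (struct sockaddr *) &local, sizeof(local)) == -1) {
        close(s);
        return -1;
    }

    return s;   /* the caller goes on to connect(s, ...) */
}
```

A caller would pass the returned descriptor to connect() and treat
EADDRNOTAVAIL from connect() as a source port collision.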
The patch works well as long as there aren't too many connections to
one destination address and port. If there are, the kernel may randomly
allocate an outgoing port number that is already in use for that
destination, and the attempt to connect() will fail with EADDRNOTAVAIL.
This is fairly easy to detect, and we can simply retry the connection
using another random source port allocated by the kernel.
Unfortunately this introduces some nondeterminism: in an extreme
situation a connection attempt may fail while we still have a
theoretical chance of success. This is no worse than what we have right
now: currently the number of outgoing connections is strictly limited by
the size of the ephemeral port range. With this patch it's possible to
establish a practically unlimited number of outgoing connections,
assuming there are many destinations.
To work around the situation of thousands of connections to the same
destination address, we retry the connection a few times before giving
up. The patch hardcodes a retry count of 8, which I believe strikes the
right balance between the probability of success and the cost of
retrying socket allocation.
Assuming one connection is already present to exactly the same
destination, the probability of a collision on a single attempt (no
retries) is 1/ephemeral_port_range.
With a retry count of 8 we get the following numbers:
* If 1% of the ephemeral ports are busy with a given destination
  address, all eight attempts will fail for one connection in
  10000000000000000 (10^16).
* For 10%: one in 100000000 (10^8)
* For 50%: one in 256 (2^8)
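The figures above can be reproduced with a short calculation. This
sketch assumes that each attempt picks a port uniformly and
independently, so that all eight attempts collide with probability
busy^8; the function names are illustrative and not part of the patch.

```c
#include <stdio.h>

/* Probability that every one of `attempts` independent port picks lands
 * on a port already connected to the same destination, when a fraction
 * `busy` (0.0 to 1.0) of the ephemeral range is busy in that way. */
double
all_attempts_fail(double busy, int attempts)
{
    double  p = 1.0;
    int     i;

    for (i = 0; i < attempts; i++) {
        p *= busy;
    }

    return p;
}

/* Print the "one in N" figure for a given busy fraction and 8 attempts. */
void
print_failure_odds(double busy)
{
    printf("%.0f%% busy: one in %.0f\n",
           busy * 100, 1.0 / all_attempts_fail(busy, 8));
}
```

Calling print_failure_odds() with 0.01, 0.1 and 0.5 gives the three
cases listed. Binary floating point cannot represent 0.01 exactly, so
the printed figure for the 1% case may differ slightly from the exact
10^16.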
Finally, during the last retry run we do *not* set the SO_REUSEADDR
flag, making sure the kernel really doesn't have any free port
left. Unfortunately there is a side effect to not setting this flag:
we limit the outgoing port range for further connections, as source
ports without SO_REUSEADDR can't be reused.
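The retry policy described above can be sketched independently of nginx
as follows. The callback type, the demo callback and all names here are
hypothetical and exist only for illustration; the actual patch
implements the same policy inline with a goto inside
ngx_event_connect_peer().

```c
#include <errno.h>

#define MAX_ATTEMPTS  8     /* mirrors the count hardcoded in the patch */

/* Hypothetical callback: perform one socket/bind/connect cycle, setting
 * SO_REUSEADDR only if use_reuseaddr is non-zero. Returns 0 on success,
 * or an errno value on failure. */
typedef int (*attempt_fn)(int use_reuseaddr, void *ctx);

/* Run up to MAX_ATTEMPTS cycles. Only EADDRNOTAVAIL triggers a retry,
 * and the final attempt runs without SO_REUSEADDR. Returns 0 or the
 * errno value from the last attempt made. */
int
connect_with_retries(attempt_fn attempt, void *ctx)
{
    int  remaining, err;

    err = EADDRNOTAVAIL;

    for (remaining = MAX_ATTEMPTS; remaining > 0; remaining--) {
        err = attempt(remaining > 1, ctx);
        if (err != EADDRNOTAVAIL) {
            break;
        }
    }

    return err;
}

/* Demo callback for illustration only: simulates a fixed number of
 * source port collisions, then succeeds, recording what it was asked. */
struct demo_state {
    int  failures_left;     /* collisions to simulate before succeeding */
    int  calls;             /* attempts made so far */
    int  last_reuseaddr;    /* flag passed to the most recent attempt */
};

int
demo_attempt(int use_reuseaddr, void *ctx)
{
    struct demo_state  *st = ctx;

    st->calls++;
    st->last_reuseaddr = use_reuseaddr;

    if (st->failures_left > 0) {
        st->failures_left--;
        return EADDRNOTAVAIL;
    }

    return 0;
}
```

Keeping the policy separate from the socket calls makes it easy to
exercise with a fake callback such as demo_attempt(); a real callback
would create the socket, optionally set SO_REUSEADDR, bind and connect.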
---
src/event/ngx_event_connect.c | 34 ++++++++++++++++++++++++++++++++++
src/os/unix/ngx_errno.h | 1 +
2 files changed, 35 insertions(+)
diff --git a/src/event/ngx_event_connect.c b/src/event/ngx_event_connect.c
index f3552a3..c314d59 100644
--- a/src/event/ngx_event_connect.c
+++ b/src/event/ngx_event_connect.c
@@ -21,12 +21,15 @@ ngx_event_connect_peer(ngx_peer_connection_t *pc)
ngx_socket_t s;
ngx_event_t *rev, *wev;
ngx_connection_t *c;
+ ngx_int_t bind_retries = 8;
rc = pc->get(pc, pc->data);
if (rc != NGX_OK) {
return rc;
}
+retry:
+
s = ngx_socket(pc->sockaddr->sa_family, SOCK_STREAM, 0);
ngx_log_debug1(NGX_LOG_DEBUG_EVENT, pc->log, 0, "socket %d", s);
@@ -67,6 +70,15 @@ ngx_event_connect_peer(ngx_peer_connection_t *pc)
}
if (pc->local) {
+ if (bind_retries > 1) {
+ int reuseaddr = 1;
+ if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR,
+ (const void *) &reuseaddr, sizeof(int)) == -1) {
+ ngx_log_debug0(NGX_LOG_DEBUG_EVENT, pc->log, ngx_socket_errno,
+ "setsockopt(SO_REUSEADDR) failed");
+ }
+ }
+
if (bind(s, pc->local->sockaddr, pc->local->socklen) == -1) {
ngx_log_error(NGX_LOG_CRIT, pc->log, ngx_socket_errno,
"bind(%V) failed", &pc->local->name);
@@ -137,6 +149,28 @@ ngx_event_connect_peer(ngx_peer_connection_t *pc)
#endif
)
{
+ if (err == NGX_EADDRNOTAVAIL && pc->local) {
+ /* This error during bind-before-connect means another
+ * connection exists from the requested source port to
+ * the destination port and host. We shall retry using
+ * another port. */
+ ngx_log_debug2(NGX_LOG_DEBUG_EVENT, pc->log, ngx_socket_errno,
+ "bind(%V) before connect(%V) failed on connect, "
+ "retrying", &pc->local->name, pc->name);
+
+ if (bind_retries > 1) {
+
+ ngx_close_connection(c);
+ pc->connection = NULL;
+ close(s);
+
+ bind_retries -= 1;
+
+ goto retry;
+
+ }
+ }
+
if (err == NGX_ECONNREFUSED
#if (NGX_LINUX)
/*
diff --git a/src/os/unix/ngx_errno.h b/src/os/unix/ngx_errno.h
index 16cafda..40434a9 100644
--- a/src/os/unix/ngx_errno.h
+++ b/src/os/unix/ngx_errno.h
@@ -53,6 +53,7 @@ typedef int ngx_err_t;
#define NGX_ENOMOREFILES 0
#define NGX_ELOOP ELOOP
#define NGX_EBADF EBADF
+#define NGX_EADDRNOTAVAIL EADDRNOTAVAIL
#if (NGX_HAVE_OPENAT)
#define NGX_EMLINK EMLINK
--
1.8.3.2