Created
April 9, 2014 15:52
[PATCH] Set SO_REUSEADDR on outgoing TCP connections
From 3387e6b4374b4dc50ebe949a273825931f2e115b Mon Sep 17 00:00:00 2001
From: Marek Majkowski <marek@cloudflare.com>
Date: Wed, 9 Apr 2014 16:42:06 +0100
Subject: [PATCH] Set SO_REUSEADDR on outgoing TCP connections

Usually, when establishing a connection, the kernel allocates the
outgoing TCP/IP port automatically from the ephemeral port range.
Unfortunately, when selecting the outgoing source IP (using bind before
connect), the kernel needs a unique port number. As a result it can
only establish a single outgoing connection from a single source port.
This can cause problems with a large number of outgoing proxy
connections - it's possible for the kernel to run out of free ports in
the ephemeral range.

The situation can be improved - TCP/IP allows any number of
connections to share an outgoing TCP/IP port and host pair, assuming
the destination addresses differ.

This patch sets the SO_REUSEADDR flag on connections that use bind
before connect to select the outgoing source address. This allows the
kernel to reuse source port numbers, given that the destination
addresses are different.
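
A minimal sketch of the bind-before-connect pattern described above,
outside of nginx. The helper name bound_socket() and its parameters are
hypothetical and only serve the illustration; the socket calls are the
standard BSD ones. It sets SO_REUSEADDR before bind(), which is the
order the option needs in order to affect port allocation.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: create a TCP socket bound
 * to src_ip with port 0, so the kernel picks the source port.  When
 * reuseaddr is non-zero, SO_REUSEADDR is set before bind().
 * Returns the socket descriptor, or -1 on any error.
 */
int
bound_socket(const char *src_ip, int reuseaddr)
{
    struct sockaddr_in  local;
    int                 s, on;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s == -1) {
        return -1;
    }

    if (reuseaddr) {
        on = 1;
        if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
            close(s);
            return -1;
        }
    }

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);          /* let the kernel choose the port */

    if (inet_pton(AF_INET, src_ip, &local.sin_addr) != 1) {
        close(s);
        return -1;
    }

    if (bind(s, (struct sockaddr *) &local, sizeof(local)) == -1) {
        close(s);
        return -1;
    }

    return s;
}
```

The caller would then connect() the returned descriptor to the
destination; a source port collision for that destination shows up at
connect() time, not at bind() time.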

The patch will work perfectly well assuming there aren't too many
connections to one destination address and port. If that happens, the
kernel may randomly allocate an outgoing port number that is already
used for a given destination, and the attempt to connect() will fail
with EADDRNOTAVAIL. This is fairly easy to detect, and we can just
retry connecting, using another random source port allocated by the
kernel.

Unfortunately it introduces some nondeterminism: in an extreme
situation a connection attempt may fail while we still have a
theoretical chance of success. This situation is no worse than what
we have right now: currently the number of outgoing ports is strongly
limited by the size of the ephemeral port range. With this patch it's
possible to establish a pretty much unlimited number of outgoing
connections, assuming there are many destinations.

To work around the situation of thousands of connections to the same
destination address, we will retry the connection a few times before
giving up. The patch hardcodes a retry count of 8, which I believe
strikes the right balance between the probability of success and the
cost of retrying socket allocation.

Assuming 1 connection already present to exactly the same destination,
the probability of collision is 1/ephemeral_port_range given no retry
attempts.

Given 8 retries we get the following numbers:

 * If 1% of ephemeral_ports are busy with a given destination address,
   all eight attempts will fail for one connection in 10000000000000000.
 * For 10%: one in 100000000
 * For 50%: one in 256
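
The figures in the list above can be reproduced with a few lines of C.
This is only a sketch: it assumes every attempt is an independent draw
that collides with probability equal to the busy fraction, and the
function name all_attempts_fail() is made up for the illustration.

```c
/*
 * Hypothetical helper: probability that `attempts` independent source
 * port picks all land on a port already used for the destination, when
 * a fraction `busy` (0.0 to 1.0) of the ephemeral range is taken.
 * This is simply busy raised to the power of attempts.
 */
double
all_attempts_fail(double busy, int attempts)
{
    double  p = 1.0;
    int     i;

    for (i = 0; i < attempts; i++) {
        p *= busy;
    }

    return p;
}
```

Taking the reciprocal of the result gives the "one in N" form used in
the list. With busy = 0.01 and 8 attempts the exact value is one in
10^16; floating point rounding can make the printed figure differ from
that in the last digits.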

Finally, during the last retry run we do *not* set the SO_REUSEADDR
flag, making sure the kernel really doesn't have any free port
left. Unfortunately there is a side effect to not setting this flag:
we limit the outgoing port range for further connections, as source
ports without SO_REUSEADDR can't be reused.
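
The retry policy described above can be sketched as a standalone
function. This is not the nginx code: the name connect_from() and its
signature are hypothetical, and it uses a blocking connect() for
brevity, whereas nginx connects on a non-blocking socket. Like the
patch, it ignores a failing setsockopt() and retries only on
EADDRNOTAVAIL.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: connect to `peer` from the address in `local`
 * (port 0, so the kernel picks the source port).  Every attempt except
 * the last one sets SO_REUSEADDR before bind().
 * Returns a connected descriptor, or -1 with errno set.
 */
int
connect_from(const struct sockaddr_in *local, const struct sockaddr_in *peer,
    int attempts)
{
    int  s, on, saved;

    while (attempts > 0) {
        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s == -1) {
            return -1;
        }

        if (attempts > 1) {
            /* Not the last attempt: allow sharing the source port. */
            on = 1;
            (void) setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        }

        if (bind(s, (const struct sockaddr *) local, sizeof(*local)) == 0
            && connect(s, (const struct sockaddr *) peer, sizeof(*peer)) == 0)
        {
            return s;
        }

        saved = errno;
        close(s);
        errno = saved;

        if (saved != EADDRNOTAVAIL) {
            return -1;          /* unrelated failure: do not retry */
        }

        attempts--;             /* source port collided: pick another */
    }

    errno = EADDRNOTAVAIL;
    return -1;
}
```

Calling connect_from(&local, &peer, 8) mirrors the hardcoded count in
the patch: up to seven attempts with SO_REUSEADDR and a final one
without it.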
---
 src/event/ngx_event_connect.c | 34 ++++++++++++++++++++++++++++++++++
 src/os/unix/ngx_errno.h       |  1 +
 2 files changed, 35 insertions(+)

diff --git a/src/event/ngx_event_connect.c b/src/event/ngx_event_connect.c
index f3552a3..c314d59 100644
--- a/src/event/ngx_event_connect.c
+++ b/src/event/ngx_event_connect.c
@@ -21,12 +21,15 @@ ngx_event_connect_peer(ngx_peer_connection_t *pc)
     ngx_socket_t       s;
     ngx_event_t       *rev, *wev;
     ngx_connection_t  *c;
+    ngx_int_t          bind_retries = 8;
 
     rc = pc->get(pc, pc->data);
     if (rc != NGX_OK) {
         return rc;
     }
 
+retry:
+
     s = ngx_socket(pc->sockaddr->sa_family, SOCK_STREAM, 0);
 
     ngx_log_debug1(NGX_LOG_DEBUG_EVENT, pc->log, 0, "socket %d", s);
@@ -67,6 +70,15 @@ ngx_event_connect_peer(ngx_peer_connection_t *pc)
     }
 
     if (pc->local) {
+        if (bind_retries > 1) {
+            int reuseaddr = 1;
+            if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR,
+                           (const void *) &reuseaddr, sizeof(int)) == -1) {
+                ngx_log_debug0(NGX_LOG_DEBUG_EVENT, pc->log, ngx_socket_errno,
+                               "setsockopt(SO_REUSEADDR) failed");
+            }
+        }
+
         if (bind(s, pc->local->sockaddr, pc->local->socklen) == -1) {
             ngx_log_error(NGX_LOG_CRIT, pc->log, ngx_socket_errno,
                           "bind(%V) failed", &pc->local->name);
@@ -137,6 +149,28 @@ ngx_event_connect_peer(ngx_peer_connection_t *pc)
 #endif
             )
         {
+            if (err == NGX_EADDRNOTAVAIL && pc->local) {
+                /* This error during bind-before-connect means another
+                 * connection exists from the requested source port to
+                 * the destination port and host. We shall retry using
+                 * another port. */
+                ngx_log_debug2(NGX_LOG_DEBUG_EVENT, pc->log, ngx_socket_errno,
+                               "bind(%V) before connect(%V) failed on connect, "
+                               "retrying", &pc->local->name, pc->name);
+
+                if (bind_retries > 1) {
+
+                    ngx_close_connection(c);
+                    pc->connection = NULL;
+                    close(s);
+
+                    bind_retries -= 1;
+
+                    goto retry;
+
+                }
+            }
+
             if (err == NGX_ECONNREFUSED
 #if (NGX_LINUX)
                 /*
diff --git a/src/os/unix/ngx_errno.h b/src/os/unix/ngx_errno.h
index 16cafda..40434a9 100644
--- a/src/os/unix/ngx_errno.h
+++ b/src/os/unix/ngx_errno.h
@@ -53,6 +53,7 @@ typedef int ngx_err_t;
 #define NGX_ENOMOREFILES  0
 #define NGX_ELOOP         ELOOP
 #define NGX_EBADF         EBADF
+#define NGX_EADDRNOTAVAIL EADDRNOTAVAIL
 
 #if (NGX_HAVE_OPENAT)
 #define NGX_EMLINK        EMLINK
--
1.8.3.2