@paul-guo-
Last active September 15, 2020 05:54
SO_REUSEADDR for a random port (selected by the kernel via bind())

https://github.com/greenplum-db/gpdb/pull/8884

I dug into this a bit and found a blog post that covers the case I care about: https://gavv.github.io/articles/ephemeral-port-reuse/ It says:

Hence, when an ephemeral port is allocated, SO_REUSEADDR enables the kernel to reuse any other non-listening ephemeral port.

The important point here is that the kernel doesn’t check whether there is an opened socket for an ephemeral port, it only checks whether there is a socket in the listening state for that port.

This means that the kernel is free to reuse an ephemeral port of any opened UDP socket (because listen is not used for datagram sockets) and any opened TCP socket for which listen was not called yet.

Please note the word "non-listening". The blog includes a program to verify these claims. I tried it with small modifications (adding getchar() at the end so that the fds are not closed when the program exits, and running several instances concurrently to exhaust the TCP ports) and verified the TCP no-listen SO_REUSEADDR case.
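For anyone who wants to poke at this without the blog's C program, here is a rough Python sketch of the same kind of experiment. It is not the blog's code; the helper names `bind_without_listen` and `duplicate_ports` are made up for illustration. It binds a number of TCP sockets to kernel-chosen ports, never calls listen(), and reports any port that was handed out more than once. Whether duplicates actually show up depends on the kernel version and on how close the ephemeral range is to exhaustion, so treat an empty result as inconclusive.

```python
import socket


def bind_without_listen(count, reuseaddr=True, host="127.0.0.1"):
    """Bind `count` TCP sockets to kernel-chosen ports and never call listen().

    The sockets are returned still open, so the ports stay allocated
    (the same reason for the getchar() trick in the C program).
    """
    socks = []
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        if reuseaddr:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((host, 0))  # port 0 asks the kernel to pick an ephemeral port
        socks.append(s)
    return socks


def duplicate_ports(socks):
    """Return the sorted list of ports that more than one socket is bound to."""
    seen = set()
    dups = set()
    for s in socks:
        port = s.getsockname()[1]
        if port in seen:
            dups.add(port)
        seen.add(port)
    return sorted(dups)
```

To get anywhere near exhaustion, `count` has to approach the size of the ephemeral port range (and the fd limit has to allow it), or several processes have to run this at the same time and keep their sockets open.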

I also roughly checked the latest Linux kernel (inet). Here are some tentative conclusions.

Without SO_REUSEADDR, for our case, if bind() fails with EADDRINUSE, it should normally mean that the kernel cannot find an available port for bind(), although EADDRINUSE is a bit misleading in this case. See the code below. https://github.com/torvalds/linux/blob/1c4e395cf7ded47f33084865cbe2357cdbe4fd07/net/ipv4/af_inet.c#L526

So reusing TIME_WAIT ports seems to be useful.

With SO_REUSEADDR, multiple concurrent bind() calls could bind to the same non-listening port. This does not seem to be a problem in the non-SO_REUSEADDR case. It means that in our code, with SO_REUSEADDR, a subsequent listen() could fail (errno should be EADDRINUSE) even when enough TCP ports are still available. See the code below and its callers for the related logic:

https://github.com/torvalds/linux/blob/1c4e395cf7ded47f33084865cbe2357cdbe4fd07/net/ipv4/inet_connection_sock.c#L149
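A minimal way to see this listen() failure without exhausting any ports is to force the collision by hand: bind two SO_REUSEADDR sockets to the same port explicitly, then call listen() on both. The sketch below does that; `second_listen_result` is a hypothetical helper name, and the explicit second bind only stands in for the kernel picking the same port for two concurrent bind() calls. On Linux I would expect it to report a failure at the "listen" stage with EADDRINUSE; other systems may already refuse the second bind(), or allow both calls.

```python
import errno
import socket


def second_listen_result(host="127.0.0.1"):
    """Bind two SO_REUSEADDR sockets to one port, then listen() on both.

    Returns (stage, errno) for the first failure seen on the second socket,
    where stage is "bind" or "listen", or ("none", 0) if nothing failed.
    """
    a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        for s in (a, b):
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        a.bind((host, 0))            # kernel-chosen port
        port = a.getsockname()[1]
        try:
            b.bind((host, port))     # same port; neither socket listens yet
        except OSError as e:
            return ("bind", e.errno)
        a.listen(1)                  # first listener takes the port
        try:
            b.listen(1)              # the conflict is only detected here
        except OSError as e:
            return ("listen", e.errno)
        return ("none", 0)
    finally:
        a.close()
        b.close()
```

Calling `second_listen_result()` and comparing the returned errno against `errno.EADDRINUSE` is enough to check which behavior a given machine shows.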

So, a question: if you can easily reproduce this, could you please apply your patch and check whether listen() returns an error in your environment? If the above theory is correct, the right fix seems to be:

1. Enable the SO_REUSEADDR option.
2. Retry when listen() fails with EADDRINUSE, and error out after some tries.

The SO_REUSEADDR/bind behavior does not seem programmer-friendly, but it follows the socket(7) man page, which unfortunately probably aligns with the related standard.
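The proposed fix could look roughly like the following. This is a Python sketch of the idea only, not the gpdb patch; the function name `listen_on_random_port` and the `max_tries` default are assumptions chosen for illustration. Each attempt uses a fresh socket so that a new bind() gets the chance to pick a different port.

```python
import errno
import socket


def listen_on_random_port(host="127.0.0.1", backlog=128, max_tries=10):
    """Return a listening TCP socket on a kernel-chosen port.

    SO_REUSEADDR is enabled, so bind() may hand out a port that another
    not-yet-listening socket also holds. If bind() or listen() then fails
    with EADDRINUSE, start over with a new socket, up to `max_tries` times.
    """
    last_error = None
    for _ in range(max_tries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((host, 0))      # port 0: let the kernel choose
            s.listen(backlog)      # may fail if someone else listened first
            return s
        except OSError as e:
            s.close()
            if e.errno != errno.EADDRINUSE:
                raise              # unrelated error: do not retry
            last_error = e
    raise OSError(errno.EADDRINUSE,
                  f"no usable port after {max_tries} tries") from last_error
```

The caller reads the chosen port with `getsockname()[1]` on the returned socket. Errors other than EADDRINUSE are re-raised immediately, since retrying would not help with those.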
