@headius
Last active August 24, 2020 23:47

The problem:

For an HTTP/1.1 request, JRuby's packet sequence goes like this:

  1. The request is sent from the client
  2. The server sends a packet back with the 200 OK, headers, and ack of the request
  3. The client sends an ack of the headers
  4. The server sends the body of the response

On Linux with delayed ACK, there's around 0.04 seconds between (2) and (3), which drastically reduces throughput.

Setting TCP_QUICKACK on the client socket works if it is set near the request write. It must be re-set at least once per request, and possibly before every write, because TCP_QUICKACK is not sticky: the kernel apparently reverts it to the default as a side effect of several other TCP operations.
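A minimal sketch of the client-side workaround, assuming Linux (the `quickack!` helper name is made up here; Ruby only exposes `Socket::TCP_QUICKACK` on platforms that define it):

```ruby
require "socket"

# Hypothetical helper: re-arm TCP_QUICKACK on a client socket. Because the
# option is not sticky, it is set both before and after the request write
# rather than once at connect time.
def quickack!(sock)
  return unless defined?(Socket::TCP_QUICKACK) # Linux-only constant
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_QUICKACK, 1)
end

server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])

quickack!(client) # before the request write
client.write("GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
quickack!(client) # and again after, since the kernel may have reset it
```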

Meanwhile, Puma sets a few flags to try to reduce latency:

  • TCP_NODELAY: disable Nagle's algorithm, so small writes are sent immediately instead of being held back for coalescing
  • TCP_CORK: hold back partial frames so data accumulates into full-sized packets before being sent
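A rough sketch of how a server might apply those two flags to an accepted connection (this is not Puma's actual code, and the `tune_for_latency` name is invented; TCP_CORK only exists on Linux, so it is guarded):

```ruby
require "socket"

# Sketch of the two latency flags described above. TCP_NODELAY is portable;
# TCP_CORK is Linux-only and works as a cork/uncork pair around the writes.
def tune_for_latency(sock)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
  cork = defined?(Socket::TCP_CORK)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_CORK, 1) if cork # start accumulating
  yield if block_given?                                             # write headers + body here
  sock.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_CORK, 0) if cork # uncork: flush queued data
end
```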

These two options pull in opposite directions and are apparently mutually exclusive. At some level, they seem to be contributing to MRI sending a different sequence of packets:

  1. client request
  2. server 200 OK, headers, and as much of the body as it can cram in, plus ack of the request
  3. client ack of headers and body
  4. server remaining body

As a result, the delayed ACK does not appear to be a factor, possibly because the server flushes after sending the remaining body.

The above flags are set via setsockopt, which we have stubbed out to support only the options that the JDK sockets export; this means TCP_CORK is not settable. We should be setting TCP_NODELAY, but it does not appear to help counter the delayed ACK from the client.
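One way code that must run on both CRuby and JRuby could cope with this is to probe for the option and tolerate failure (the `try_setsockopt` helper is hypothetical, not something JRuby or Puma actually provides):

```ruby
require "socket"

# Hypothetical guard: attempt a platform-specific socket option, returning
# false when the constant is missing or the runtime (e.g. the JDK socket
# layer underneath JRuby) rejects the operation.
def try_setsockopt(sock, opt_name, value)
  return false unless Socket.const_defined?(opt_name)
  sock.setsockopt(Socket::IPPROTO_TCP, Socket.const_get(opt_name), value)
  true
rescue Errno::ENOPROTOOPT, Errno::EINVAL, NotImplementedError
  false
end
```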

Additional problem:

Puma uses syswrite to avoid buffering of the response. We implement syswrite by calling what amounts to write(2), which may buffer somewhere along the way. In any case, it is not a direct write, which probably means we are interfering with Puma's buffering decisions.
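For reference, the distinction in CRuby: `IO#write` may pass through Ruby's userspace buffer, while `IO#syswrite` maps to a single write(2) that hands the bytes to the kernel immediately, which is what Puma relies on. A minimal illustration over a socket pair:

```ruby
require "socket"

# syswrite bypasses Ruby's IO buffering: one call, one write(2), bytes go
# straight to the kernel. It returns the number of bytes actually written.
rd, wr = Socket.pair(:UNIX, :STREAM)
response = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n"
written = wr.syswrite(response)
```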

The bottom line appears to be that because our sockets do not support the Linux-specific socket options that Puma expects, we are causing a slightly different packet sequence that is subject to delayed ACK.
