Jetty tuning for ring-jetty-adapter

Problem Statement

Clojure services generally run with the default configuration of ring.adapter.jetty. The adapter provides many configuration options. Here we discuss two parameters that matter most when tuning Jetty: max-threads and accept-queue-size. The default values are:

  • max-threads: 50
  • accept-queue-size: 2^31 - 1 (Integer.MAX_VALUE)

For a high-throughput server, these parameters must be tuned; the defaults should not be relied upon.
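
A minimal sketch of overriding these defaults through run-jetty's options map. :max-threads is a standard option; :max-queued-requests (which bounds the request queue) only appeared in later ring versions, so treat it as an assumption and check the documentation of the version you run.

```clojure
(require '[ring.adapter.jetty :refer [run-jetty]])

(defn handler [request]
  {:status 200 :headers {"Content-Type" "text/plain"} :body "ok"})

;; Defaults are :max-threads 50 and an effectively unbounded queue (2^31 - 1).
;; Tuned: cap the worker threads and bound the request queue.
(run-jetty handler {:port 3000
                    :join? false
                    :max-threads 60                ;; worker-thread cap
                    :max-queued-requests 6000})    ;; queue bound (recent ring only)
```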

How it works

  • In Jetty, each new request is added to a BlockingArrayQueue, whose size is bounded by accept-queue-size.
  • A small collection of threads accepts new TCP/IP connections and adds requests to the queue. These are Jetty's acceptor threads (owned by a connector). Their default count is Math.max(1, Math.min(4, cores/8)); per the Jetty docs, "typically the default is sufficient for modern persistent protocols (HTTP/1.1, HTTP/2 etc.)".
  • A collection of threads picks requests from the queue and executes them. These are called worker threads. Their number is bounded by max-threads. A hands-on sketch of these pieces follows this list.
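
To make these pieces concrete, here is a sketch in Clojure/Java interop, assuming Jetty 9: a bounded BlockingArrayQueue as the request queue and a QueuedThreadPool of worker threads draining it. The ring adapter wires up the equivalent internally.

```clojure
(import '(org.eclipse.jetty.server Server ServerConnector)
        '(org.eclipse.jetty.util.thread QueuedThreadPool)
        '(org.eclipse.jetty.util BlockingArrayQueue))

;; Request queue, bounded at 6000 entries (the accept-queue-size analogue).
(def job-queue (BlockingArrayQueue. 6000))

;; Worker pool: max 60 threads, min 8, 60 s idle timeout, draining job-queue.
(def pool (QueuedThreadPool. 60 8 60000 job-queue))

(def server (Server. pool))

;; Connector that accepts TCP/IP connections; its acceptor-thread count
;; defaults to Math.max(1, Math.min(4, cores / 8)).
(.addConnector server (doto (ServerConnector. server)
                        (.setPort 3000)))
```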

What happens if

  • max-threads is too high
    • Each worker thread reserves 1024 KB of stack on a 64-bit JVM (configurable with the -Xss JVM option).
    • With 1000 worker threads, about 1 GB of RAM is needed just for thread stacks.
    • Also, if throughput is very high and each of the 1000 worker threads is serving a request, objects are allocated for every in-flight request, consuming more and more heap space.
    • When heap usage is very high, GC kicks in, putting more pressure on the CPU. Major GCs are stop-the-world pauses, which hurt response time.
    • Response times for in-flight requests then increase, which pushes incoming requests into the BlockingQueue.
  • accept-queue-size is too high
    • Under high throughput, if the number of concurrent requests exceeds the number of worker threads, requests start piling up in the BlockingQueue. This can also happen for the reasons explained above.
    • If the queue size is very high, requests in the queue time out before a worker thread ever picks them up. But the timeout happens only on the client side; the server still executes such requests. Server load does not decrease, yet clients keep seeing timeouts.
  • max-threads is too small
    • With too few worker threads, the system cannot serve requests at capacity even though it could handle more.
    • If max-threads is 10 and average response time is 50 ms, then 20 simultaneous requests give the first 10 a response time of 50 ms, but the next 10 take 100 ms (50 ms extra spent waiting in the queue); this arithmetic is sketched after this list.
    • Too few worker threads also cause the BlockingQueue to pile up; if the queue size is very high, this leads to the problem discussed above.
  • accept-queue-size is too small
    • Under high throughput, once the BlockingQueue is full the server starts rejecting requests, resulting in connect timeouts on the client. In short: values that are too small under-utilise resources, while values that are too high cause resource exhaustion (which is more damaging!).
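
The queueing effect of a small max-threads is easy to check with back-of-envelope arithmetic, using the numbers from the example above:

```clojure
;; 20 simultaneous requests, max-threads 10, 50 ms per request:
;; requests are served in "waves" of max-threads at a time.
(let [requests    20
      max-threads 10
      response-ms 50
      waves       (long (Math/ceil (/ requests max-threads)))]
  {:first-wave-ms response-ms             ;; => 50
   :last-wave-ms  (* waves response-ms)}) ;; => 100
```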

Optimal Configuration

To find the optimal configuration, it is always necessary to load-test the server and determine the maximum RPS at which it responds within the expected response time while all system parameters (load average, network, heap usage, etc.) stay within limits. The following sections discuss optimal configuration assuming the server can handle 300 RPS with an average response time of 100 ms.

  • max-threads
    • For CPU-bound requests, there is no point in having many more worker threads than available CPU cores.
    • For IO-bound requests, we can use the following strategy:
      • For our server with a capacity of 300 RPS and an average response time of 100 ms, one thread can handle 10 requests per second.
      • So, to serve 300 RPS, we need at least 30 worker threads.
      • Add some buffer on top of this minimum; 2x or 3x the minimum works well.
      • With a 2x buffer, max-threads comes to 60 (Netflix reportedly uses 3x). The arithmetic is collected in a sketch after this list.
  • accept-queue-size
    • Limiting queue size
      • Assume the client has a timeout of 10 seconds.
      • Then, for a 300 RPS server, there is no point in queuing more than 3000 requests: beyond that, requests time out at the client yet still get executed at the server, wasting resources.
      • Again, with some buffer, a queue size of 6000 can be used.
    • Using the default (very high) queue size plus a load shedder
      • On each request, the load shedder checks whether the queue size is within an acceptable range (which can be configured at run time). If not, it discards the request with a 503.
      • One advantage over the previous approach: the client gets a 503 rather than a connect timeout, and can slow down instead of bombarding the server with retries. A middleware sketch follows after this list.
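
The sizing arithmetic above, collected in one sketch (300 RPS, 100 ms, and the 10 s client timeout are the assumptions stated earlier; the 2x buffer factor is a judgment call):

```clojure
(def target-rps         300)   ;; measured capacity from load testing
(def avg-response-sec   0.1)   ;; 100 ms average response time
(def client-timeout-sec 10)    ;; assumed client-side timeout
(def buffer-factor      2)     ;; 2x-3x headroom; Netflix reportedly uses 3x

;; Little's law: concurrency = arrival rate * service time.
(def min-worker-threads (* target-rps avg-response-sec))    ;; => 30.0
(def max-threads (* buffer-factor min-worker-threads))      ;; => 60.0

;; Queue entries beyond rps * client-timeout are dead on arrival.
(def max-useful-queue (* target-rps client-timeout-sec))    ;; => 3000
(def accept-queue-size (* buffer-factor max-useful-queue))  ;; => 6000
```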
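
Finally, a hypothetical sketch of the load-shedder approach as ring middleware. Instead of inspecting Jetty's internal queue, it approximates queue depth by counting in-flight requests in an atom; the limit could be wired to a runtime-configurable setting.

```clojure
(defn wrap-load-shedder
  "Rejects requests with a 503 once more than max-in-flight requests are
  being processed concurrently. Simplification: counts in-flight requests
  rather than reading Jetty's queue size."
  [handler max-in-flight]
  (let [in-flight (atom 0)]
    (fn [request]
      (if (> (swap! in-flight inc) max-in-flight)
        (do (swap! in-flight dec)
            {:status  503
             :headers {"Retry-After" "1"}
             :body    "Server overloaded"})
        (try
          (handler request)
          (finally (swap! in-flight dec)))))))

;; Usage: (run-jetty (wrap-load-shedder handler 3000) {:port 3000})
```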
