Thread pools on the JVM should usually be divided into the following three categories:
- CPU-bound
- Blocking IO
- Non-blocking IO polling
Each of these categories has a different optimal configuration and usage pattern.
For CPU-bound tasks, you want a bounded thread pool which is pre-allocated and fixed to exactly the number of CPUs. The only work you will be doing on this pool will be CPU-bound computation, and so there is no sense in exceeding the number of CPUs unless you happen to have a really particular workflow that is amenable to hyperthreading (in which case you could go with double the number of CPUs). Note that the old wisdom of "number of CPUs + 1" comes from mixed-mode thread pools where CPU-bound and IO-bound tasks were merged. We won't be doing that.
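As a minimal sketch (variable and pool names here are purely illustrative), a CPU pool along these lines might be wired up with `java.util.concurrent` and wrapped for Scala use like so:

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.ExecutionContext

val numCpus = Runtime.getRuntime.availableProcessors()

// bounded, fixed to exactly the number of CPUs, and pre-started up front
val cpuExecutor = new ThreadPoolExecutor(
  numCpus, numCpus, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue[Runnable]())
cpuExecutor.prestartAllCoreThreads()

val cpuPool: ExecutionContext = ExecutionContext.fromExecutor(cpuExecutor)
```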
The problem with a fixed thread pool is that any blocking IO operation (well, any blocking operation at all) will eat a thread, which is an extremely finite resource. Thus, we want to avoid blocking at all costs on the CPU-bound pool. Unfortunately, this isn't always possible (e.g. when being forced to use a blocking IO library). When this is the case, you should always push your blocking operations (IO or otherwise) over to a separate thread pool. This separate thread pool should be caching and unbounded, with no pre-allocated size. To be clear, this is a very dangerous type of thread pool: it isn't going to prevent you from just allocating more and more threads as the others block. You need to make sure that any data flow which results in running actions on this pool is externally bounded, meaning that you have semantically higher-level checks in place to ensure that only a fixed number of blocking actions may be outstanding at any point in time (this is often done with a non-blocking bounded queue).
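A similar sketch for the blocking pool (again, names are illustrative); note that the bound deliberately does *not* live in the pool itself:

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicLong
import scala.concurrent.ExecutionContext

val blockingCounter = new AtomicLong(0)

// cached, unbounded, nothing pre-allocated: threads are spun up on demand and
// reaped once they have sat idle for a while
val blockingPool: ExecutionContext = ExecutionContext.fromExecutor(
  Executors.newCachedThreadPool { (r: Runnable) =>
    val t = new Thread(r, s"blocking-io-${blockingCounter.getAndIncrement()}")
    t.setDaemon(true) // don't let idle IO threads keep the JVM alive
    t
  })
```

The pool will never push back on its own, so the bound has to be enforced at a higher level, for example a bounded queue or a semaphore in front of whatever feeds work onto it.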
The final category of useful threads (assuming you're not a Swing/SWT application) is asynchronous IO polls. These threads basically just sit there asking the kernel whether or not there is a new outstanding async IO notification, and forward that notification on to the rest of the application. You want to handle this with a very small number of fixed, pre-allocated threads. Many applications handle this task with just a single thread! These threads should be given the maximum priority, since the application latency will be bounded around their scheduling. You need to be careful though to never do any work whatsoever on this thread pool! Never ever ever. The moment you receive an async notification, you should be immediately shifting back to the CPU pool. Every nanosecond you spend on the async IO thread(s) is added latency on your application. For this reason, some applications may find slightly better performance by making their async IO pool 2 or 4 threads in size, rather than the conventional 1.
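A sketch of the polling pool with a single max-priority thread (some applications would bump the `1` to 2 or 4, as noted above):

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext

// one max-priority thread whose only job is to wait on the kernel and hand
// notifications straight back to the CPU pool
val pollingPool: ExecutionContext = ExecutionContext.fromExecutor(
  Executors.newFixedThreadPool(1, { (r: Runnable) =>
    val t = new Thread(r, "async-io-poll")
    t.setDaemon(true)
    t.setPriority(Thread.MAX_PRIORITY) // application latency is bounded by how promptly this runs
    t
  }))
```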
I've seen a lot of advice floating around about not using global thread pools, such as `scala.concurrent.ExecutionContext.global`. This advice is rooted in the fact that global thread pools can be accessed by arbitrary code (often library code), and you cannot (easily) ensure that this code is using the thread pool appropriately. How much of a concern this is for you depends a lot on your classpath. Global thread pools are pretty darn convenient, but by the same token, it also isn't all that hard to have your own application-internal global pools. So… it doesn't hurt.
On that note, view with extreme suspicion any framework or library which either a) makes it difficult to configure the thread pool, or b) just straight-up defaults to a pool that you cannot control.
Either way, you're almost always going to have some sort of singleton object somewhere in your application which just has these three pools, pre-configured for use. If you subscribe to the "implicit `ExecutionContext`" pattern, then you should make the CPU pool the implicit one, while the others must be explicitly selected.
Kernel threads are always going to be a scarce resource. If you really boil it down, the true underlying resources here are the physical threads provided by the hardware, which are physically limited by definition. Even ascending the abstraction tower though, kernel threads are relatively heavyweight both in the operating system itself and within the JVM. In general, it's difficult for a single process to have more than a few thousand of them, even with careful tuning, and it is always better to have vastly fewer.
What Loom does is play the same trick as frameworks like Cats Effect, which is to say, it creates an abstraction on top of the underlying kernel threads (which it calls "carrier threads"). This abstraction is very lightweight and strictly (sort of…) non-blocking, which makes it possible to have many millions of them within a single process without causing problems. Perhaps confusingly, Loom defines this abstraction to be `Thread` itself and integrates it directly into the JVM, meaning that any code written on the JVM is able to take advantage of it (as opposed to frameworks like Cats Effect, where you need to explicitly opt in to things like `IO` or `Future`). So what's happening here is `Thread` is being redefined to be a more lightweight abstraction sitting on top of underlying carrier threads, which are just as scarce and heavyweight as they've ever been.

The tradeoff is that you need to be very careful with things that hard-block the underlying carrier thread. Loom tries to solve this problem by integrating very tightly into the JVM and the Java Standard Library, such that mechanisms which would normally block the carrier thread instead deschedule the virtual thread, allowing other threads to have access. More succinctly, it converts `Unsafe.park` into a callback which resumes the `Thread` continuation when run.
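Here is a small sketch of what this looks like from the user's side, assuming a Loom-enabled JDK (21 or later); the `Thread.sleep` parks each virtual thread, so the carrier underneath is free to run other virtual threads instead of blocking:

```scala
// each of these is a virtual Thread; all of them are multiplexed over a small,
// fixed pool of carrier (kernel) threads
val vthreads = (1 to 100000).map { i =>
  Thread.ofVirtual().name(s"virtual-$i").start { () =>
    // the JVM intercepts this park: the virtual thread is descheduled and its
    // carrier thread is handed to another virtual thread in the meantime
    Thread.sleep(100)
  }
}

vthreads.foreach(_.join())
```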
This is a clever trick, particularly integrated into the JVM, but it isn't perfect. As you pointed out, any blocking in native code is completely outside the realm of what Loom can protect you from, and this sort of blocking is far more common than you might expect. Netty, for example, very aggressively blocks in native code due to the fact that it implements its own OS-specific interfaces to asynchronous IO layers (such as `epoll` and `io_uring`). Even without third-party frameworks though, examples abound where native blocking is unavoidable. `new URL("https://www.google.com").hashCode()` is one example, since it delegates to the native OS DNS client, which in turn is blocking on all major operating systems. Another example is file IO, which is non-blocking on NTFS and can be non-blocking on versions of Linux which support `io_uring`, but which is fundamentally blocking on APFS and HFS+.

In other words, Loom is a classic leaky abstraction: it promises something which it cannot deliver, and in doing so invites you to write code which makes assumptions which do not hold in many common scenarios. This is where it really differs from frameworks like Cats Effect or Vert.x, which are very up front about the fact that blocking is bad and push you (the user) quite hard to declare your blocking so that it can be managed in a less dangerous way (in particular, via shunting strategies such as what is described in the OP).
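For contrast, this is roughly what "declaring your blocking" looks like in Cats Effect 3 (a sketch, reusing the DNS example from above): `IO.blocking` shunts the action onto the runtime's dedicated blocking pool and returns to the compute pool once it finishes.

```scala
import cats.effect.IO
import java.net.URL

// ordinary work stays on the compute (CPU) pool
val compute: IO[Long] = IO((1 to 1000000).map(_.toLong).sum)

// the blocking DNS lookup is declared as such, so the runtime runs it on the
// blocking pool and shifts back to the compute pool afterwards
val lookup: IO[Int] = IO.blocking(new URL("https://www.google.com").hashCode())
```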