upeter/VirtualThreadExplanations.txt

## VirtualThreadExplanations.txt
1) Which code is faster?

You said that once Project Loom is implemented in Java 21, current projects (that are using sequential code) can benefit just by upgrading to Java 21.

Do you mean:

a) This applies to all code.
b) This applies to all multithreaded code.
c) This applies to all multithreaded code after it is rewritten to the virtual thread style.

I ran my Spring Boot application with Java 21 preview, and couldn't find any difference (other than it being a bit slower on startup than with Java 20). So I kind of assumed that the answer is 'c'.

Answer: There are a few aspects here:
- Platform Threads block when blocking code is called (java.io/JDBC etc.).
- Platform Threads are expensive; that's why we use ThreadPools to re-use them and to prevent too many Threads from being created.
- The advantage of Virtual Threads is that they don't block the underlying thread, even when blocking code is called (java.io/JDBC etc.).
- Because they are 'cheap', they also do not need to be pooled.
- Therefore, the concrete gain is that an application that uses Virtual Threads cannot exhaust a ThreadPool; there is always room to spawn new Virtual Threads. So when your application is under heavy load, everything continues working with Virtual Threads, whereas with Platform Threads in a ThreadPool the pool can get exhausted once all Platform Threads are blocked in blocking code (java.io/JDBC etc.).
- Previously, you could only achieve this behavior with reactive code and non-blocking libraries (java.nio/R2DBC etc.).
- So: the correct answer is a), as long as the code actually runs on Virtual Threads (see the sidenote below).
- Sidenote: only running Spring Boot with Java 21 preview won't make it automatically use Virtual Threads. You have to configure the Virtual Thread factory specifically for that. See: https://spring.io/blog/2022/10/11/embracing-virtual-threads
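
The pool-exhaustion point above can be made visible with a small experiment. This is only a sketch, assuming Java 21: the class name PoolExhaustionSketch, the pool size of 4 and the 100 ms sleep are made up for illustration, and Thread.sleep stands in for any blocking call (java.io/JDBC etc.). It counts how many blocking tasks are in flight at the same moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class PoolExhaustionSketch {

    /** Runs `tasks` blocking tasks and returns the peak number that were in flight at once. */
    static int peakConcurrency(ExecutorService executor, int tasks, long blockMillis) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (executor) {
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(blockMillis); // stand-in for a blocking IO call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                });
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int pooled = peakConcurrency(Executors.newFixedThreadPool(4), 50, 100);
        int virtual = peakConcurrency(Executors.newVirtualThreadPerTaskExecutor(), 50, 100);
        System.out.println("peak in flight, pool of 4:       " + pooled);
        System.out.println("peak in flight, virtual threads: " + virtual);
    }
}
```

With the fixed pool, the peak can never exceed the pool size, so the remaining tasks wait in the queue. With one Virtual Thread per task, the blocked tasks do not hold on to a scarce thread, so many more of them can be in flight at once.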

2) Platform threads vs. Virtual threads

Platform threads will still be available. Why is this the case? Is there some kind of trade-off? If virtual threads are always better, why don't we keep the current syntax and use virtual threads in the background?

Answer:
- The non-blocking behavior of Virtual Threads only works for blocking code that goes through the Java standard libraries, such as java.io or java.util.concurrent.
- If you do your own IO with e.g. native code, or have some driver that calls native code, a Virtual Thread's Carrier Thread would be blocked ('pinned' is the word you will find in the documentation). Blocking Carrier Threads should be avoided whenever possible, since Virtual Threads always require Carrier Threads for their processing.
- There will always be Carrier Threads available if you use the standard, retrofitted blocking libraries like - as said - java.io or java.util.concurrent, since the JVM ensures that these Carrier Threads are not blocked.
- But in cases where the underlying Carrier Threads would be blocked, namely when calling native code or blocking inside synchronized blocks, you should use dedicated Platform Threads. That's where they are still needed.
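
One possible way to follow that last advice is to hand the pinning call over to a small pool of Platform Threads and let the Virtual Thread wait for the result through java.util.concurrent. The following is a sketch under assumptions, not a prescribed pattern: callPinningDriver is a hypothetical stand-in for a native or synchronized driver call, and the pool size of 2 is arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PinningOffloadSketch {

    private static final Object DRIVER_LOCK = new Object();

    /** Hypothetical stand-in for a driver call that would pin a Carrier Thread. */
    static String callPinningDriver() throws InterruptedException {
        synchronized (DRIVER_LOCK) {
            Thread.sleep(50); // simulated blocking inside a synchronized block
            boolean virtual = Thread.currentThread().isVirtual();
            return "ran on " + (virtual ? "virtual" : "platform") + " thread";
        }
    }

    /** A Virtual Thread delegates the pinning call to a dedicated Platform Thread pool. */
    static String offload() throws Exception {
        try (ExecutorService platformPool = Executors.newFixedThreadPool(2);
             ExecutorService virtualThreads = Executors.newVirtualThreadPerTaskExecutor()) {
            Callable<String> driverCall = PinningOffloadSketch::callPinningDriver;
            // Future.get() is java.util.concurrent code, so the waiting Virtual Thread
            // is suspended instead of blocking its Carrier Thread.
            Callable<String> request = () -> platformPool.submit(driverCall).get();
            return virtualThreads.submit(request).get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(offload());
    }
}
```

The trade-off is that the Platform Thread pool is again a bounded resource, so this only makes sense for the specific calls that would otherwise block a Carrier Thread.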

3) On the delay example

As the main example, you used a sleep method in the code. The point was that instead of blocking the platform thread, the virtual thread can suspend the sleep and another virtual thread can do some work.

A sleep call is of course not a very common use case. How does it affect code that actually performs some work:

doSomething1(); // takes 1 second
doSomething2(); // takes 1 second
doSomething3(); // takes 1 second

Say I call the above code 1000 times. When using virtual threads instead of normal threads, is it expected to run faster, or is it the same because in the end the computer has a fixed number of platform threads that can do actual work?

One task would take 3 seconds. How long would it take to run 1000 tasks (assuming the CPU has 16 threads)? Does it mean i

Answer:

I think the following statement still contains a conceptual mismatch:
"As the main example, you used a sleep method in the code. The point was that instead of blocking the platform thread, the virtual thread can suspend the sleep and another virtual thread can do some work."

It should go like this:
"As the main example, you used a sleep method in the code. The point was that instead of blocking the platform thread, the virtual thread will be suspended when sleep is called, so that the underlying Carrier Thread can serve another Virtual Thread."

What's really crucial to understand is that a Virtual Thread can't do anything on its own. It always requires a Carrier Thread. You don't see the Carrier Thread, nor do you have access to it, so for us programmers it's transparent. That probably makes it a bit hard to grasp, because it's crucial but you don't see it.

So the whole idea is: as long as Carrier Threads are never blocked, they can continuously serve Virtual Threads, ensuring maximal concurrency and resource efficiency of your application. Once Carrier Threads are blocked, which happens when calling native code or blocking inside synchronized blocks, Virtual Threads cannot be served anymore, causing a decrease in the amount of work the application can perform.

Prior to Virtual Threads, when we had ThreadPools with Platform Threads, you could get the same behaviour when the ThreadPool was exhausted - which I illustrated in my slides.

So to answer your question:
The sleep might not be common, but it perfectly illustrates a piece of code that is blocking with Platform Threads but non-blocking with Virtual Threads. The very same applies to every remote IO call you would make.

If you have so-called CPU bound work, e.g. doing some heavy calculations, crypto etc., which means no IO, Virtual Threads won't add anything, because the underlying Carrier Thread will be busy with this task and cannot serve another Virtual Thread.

So your code sample is only faster with Virtual Threads if the doSomething calls involve some blocking operations - as simulated with sleep, or IO. In that case your application can do much more concurrently with Virtual Threads than with Platform Threads, for the reasons explained above.
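
As a rough back-of-the-envelope estimate for the numbers in the question (1000 tasks of 3 seconds, 16 threads): if the three steps are blocking and run on a pool of 16 Platform Threads, the tasks run in about 63 rounds of 3 seconds, so roughly 189 seconds. On Virtual Threads, all 1000 tasks can wait at the same time, so it takes a little over 3 seconds. If the steps are CPU bound, roughly 189 seconds is the expectation on 16 cores either way. The sketch below is a scaled-down version of that experiment, assuming Java 21; each step is simulated with a 10 ms sleep instead of 1 second, and the names ThreeStepTaskSketch and doSomething(stepMillis) are only illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class ThreeStepTaskSketch {

    /** Stand-in for doSomething1..3: a step that only blocks (assumed IO bound). */
    static void doSomething(long stepMillis) throws InterruptedException {
        Thread.sleep(stepMillis);
    }

    /** Runs `tasks` three-step tasks; returns {completed tasks, elapsed milliseconds}. */
    static long[] run(ExecutorService executor, int tasks, long stepMillis) {
        AtomicInteger completed = new AtomicInteger();
        long start = System.nanoTime();
        try (executor) { // close() waits until all submitted tasks have finished
            for (int i = 0; i < tasks; i++) {
                executor.submit(() -> {
                    try {
                        doSomething(stepMillis);
                        doSomething(stepMillis);
                        doSomething(stepMillis);
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        return new long[] { completed.get(), elapsedMillis };
    }

    public static void main(String[] args) {
        long[] pooled = run(Executors.newFixedThreadPool(16), 1000, 10);
        long[] virtual = run(Executors.newVirtualThreadPerTaskExecutor(), 1000, 10);
        System.out.println("pool of 16:      " + pooled[0] + " tasks in " + pooled[1] + " ms");
        System.out.println("virtual threads: " + virtual[0] + " tasks in " + virtual[1] + " ms");
    }
}
```

Replacing the sleep with a busy calculation should make the difference between the two runs largely disappear, which is the CPU bound case described above.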


4) Overhead

I saw some presentations on coroutines, and they gave examples like 2.4 million coroutines are running, but
what are the benefits in throughput? Why isn't there a lot of overhead running so many coroutines/virtual threads?
Even if they are abstracted from the platform, I would expect some form of overhead.

Answer:
The overhead comes in the form of scheduling, suspend and resume work for the Carrier Threads. Conceptually it works as follows:
- I start a Virtual Thread with some code in the run() method, say: sleep(1000);println("Done");
- This work will be placed on some queue (conceptually)
- Carrier Threads consume work from this queue, so Carrier Thread 1 (CT-1) executes the code of Virtual Thread 1 (VT-1)
- When it hits sleep, the current stack is copied from CT-1 to VT-1, a wake-up task is scheduled for when 1000 ms have passed, and VT-1 is suspended.
- CT-1 then takes the next task from the queue
- Once 1000 ms have passed, some Carrier Thread - we don't know which one will process it, so call it CT-X - gets the task of VT-1 from the queue to proceed. CT-X will then resume the Virtual Thread VT-1 by copying its stack to itself and continue processing. In this case: call println("Done").

So, whenever you read: suspend, resume, copy stack etc., you read: overhead.
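
The walkthrough above can be tried out with a few lines of code. This is a sketch assuming Java 21; the class name SuspendResumeSketch is made up. It relies on the fact that, in current JDKs, toString() of a running Virtual Thread also shows the name of the Carrier Thread it is mounted on, which is an implementation detail rather than a guaranteed format.

```java
final class SuspendResumeSketch {

    /** Starts VT-1, lets it sleep, and returns what it observed before and after. */
    static String[] sleepThenReport(long sleepMillis) throws InterruptedException {
        String[] observed = new String[3];
        Thread vt = Thread.ofVirtual().name("VT-1").start(() -> {
            // Description of VT-1 while mounted on its first Carrier Thread.
            observed[0] = Thread.currentThread().toString();
            try {
                Thread.sleep(sleepMillis); // VT-1 is suspended here; the carrier is free again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            // After resuming, VT-1 may be mounted on a different Carrier Thread.
            observed[1] = Thread.currentThread().toString();
            observed[2] = "Done";
        });
        vt.join(); // after join(), the values written by VT-1 are visible here
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] observed = sleepThenReport(1000);
        System.out.println("before sleep: " + observed[0]);
        System.out.println("after sleep:  " + observed[1]);
        System.out.println(observed[2]);
    }
}
```

With a single Virtual Thread the carrier before and after the sleep is often the same one; starting many of these at once makes it more likely to see different carriers.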

5) Messaging

I come from the world of messaging (think of JMS, Kafka and Apache Camel). There one often has millions of messages that need to be processed. I configure how a message needs to be processed in a so-called “Camel route”. In this route, I define the steps of what needs to be done.

Camel by default uses a thread pool (as explained here: https://camel.apache.org/manual/threading-model.html). You can configure various aspects of it for one or more routes (the default maximum pool size is 20).

Say I have a million messages on a Kafka topic that need to be processed by the following route:

public void configure() throws Exception {
    from("kafka:{{topicName}}?brokers={{brokers}}")
        .setHeader("CamelAzureStorageBlobBlobName", simple("${exchangeId}"))
        .to("azure-storage-blob://{{accountName}}/{{containerName}}/?accessKey=RAW({{accessKey}})&operation=uploadBlockBlob");
}

I always thought that messaging is an excellent fit for multithreading. If I only had a CPU that could handle millions of threads...
When the underlying mechanism doesn't use a thread pool, but virtual threads, does it mean I could process these million messages faster?

Say every message + route gets its own virtual thread assigned. Can one million messages be processed in parallel, or doesn't it work like this, because in the end they are always mapped to 'carrier/CPU' threads?

Using virtual threads, can I expect a multifactor performance gain in throughput, or is that a faulty way of thinking?

Answer:
Maybe you can answer this question yourself with the knowledge above, but for the sake of completeness let's answer it too ;-).
- As said, Virtual Threads won't make a single call faster, but they 'might' give you more throughput.
- Why 'might'? When your current ThreadPool is never exhausted, meaning that whenever a message needs to be processed there is always a Thread available from the ThreadPool, Virtual Threads won't add anything.
- But should you have such a large number of messages that eventually you don't have a Thread in your ThreadPool to perform the work, then Virtual Threads 'might' do better.
- Why another 'might'? Because we also need to make a distinction between CPU bound and IO bound work:
-- If it's IO bound, where you need to wait a long time for a reply, Virtual Threads will shine, because the underlying Carrier Threads can continuously serve a large number of Virtual Threads.
-- If it's CPU bound, there won't be a lot of benefit, since the Carrier Thread is busy anyway, so it cannot serve other Virtual Threads.
- So for your use-case above, pumping Kafka messages to azure-storage, which I assume will be IO bound, Virtual Threads might be a good fit.
- Generally speaking, we could say that backend applications that have some API, do remote calls to other APIs and interact with a database will benefit from Virtual Threads, since they are heavily IO bound.
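
To put a rough number on the 'might' for the IO bound case, here is an idealized estimate. Only the pool size of 20 and the one million messages come from the question; the 100 ms latency per upload and the 10,000 concurrent uploads are pure assumptions, and the calculation ignores everything else that limits throughput (broker, network bandwidth, limits of the storage service). The name ThroughputEstimate is made up for this sketch.

```java
final class ThroughputEstimate {

    /**
     * Ideal-case time in seconds to process a backlog when at most `concurrency`
     * messages are in flight at once and each message waits `latencyMillis` on IO.
     */
    static long secondsToDrain(long messages, long concurrency, long latencyMillis) {
        long rounds = (messages + concurrency - 1) / concurrency; // ceiling division
        return rounds * latencyMillis / 1000;
    }

    public static void main(String[] args) {
        // Pool of 20 threads: 50,000 rounds of 100 ms each.
        System.out.println(secondsToDrain(1_000_000, 20, 100) + " s with 20 threads");
        // 10,000 Virtual Threads waiting at once (assumed): 100 rounds of 100 ms each.
        System.out.println(secondsToDrain(1_000_000, 10_000, 100) + " s with 10,000 in flight");
    }
}
```

Under these assumptions the pool of 20 needs about 5000 seconds and the Virtual Thread variant about 10 seconds. In practice the gain will be smaller, because some other resource becomes the bottleneck long before a million uploads run at once.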