| | Pipeline | Pipeline io_uring | Non-pipelined | Non-pipelined io_uring |
|---|---|---|---|---|
| CPU (%) | 99 | 50 (-50%) | 97 | 48 (-50%) |
| RPS | 2,592,670 | 2,878,222 (+11%) | 497,429 | 631,976 (+26%) |
| Working set (MB) | 79 | 81 | 79 | 81 |
| Latency, mean (ms) | 1.28 | 0.98 | 1.07 | 1.47 |
| Latency, 99th (ms) | n/a | 7.57 | 14.8 | 14.67 |
Last active August 29, 2021 14:06
I assumed that the `ThreadCount` would be controlled by this config in the Benchmarks repo. The results make more sense now, thanks 😅.
In fact, I set the default `ThreadCount` to half the number of logical threads (roughly the number of physical cores) based on the findings in the comment you've linked.
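The default described here boils down to a one-liner. A minimal sketch of the idea (the property name `ThreadCount` matches the linked `IoUringOptions.cs`; the `Math.Max` guard against single-processor machines is an assumption, not verified against the actual source):

```csharp
using System;

public sealed class IoUringOptions
{
    // Half the logical processors ≈ the number of physical cores
    // on machines with hyper-threading/SMT enabled (at least 1).
    public int ThreadCount { get; set; } = Math.Max(1, Environment.ProcessorCount / 2);
}
```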
When comparing the results from tmds/Tmds.LinuxAsync#39 (comment) with the results above, we notice an increase in RPS from 518,186 to 631,976 (roughly +22%) with the update to kernel v5.7 and the code changes necessary to leverage `IORING_FEAT_FAST_POLL`. Assuming, of course, that the infrastructure hasn't changed since then, that would be as close to a "free lunch" as it gets 🚀
These benchmarks were run on a 12-core machine, because the Citrine setup doesn't have the required kernel (yet).
The transport defaults `ThreadCount` to half the number of processors: https://github.com/tkp1n/IoUring.Transport/blob/bed647373487aac25a58de34598e2bc9251c903b/src/IoUring.Transport/IoUringOptions.cs#L9.
And `ApplicationSchedulingMode` is set to `Inline`: https://github.com/tkp1n/IoUring.Transport/blob/13e571a5d6d0e63937da2e8a0e18a9a589648bb8/tests/PlatformBenchmarks/BenchmarkConfigurationHelpers.cs#L59
This means the code runs on half of the processors, so 50% is the expected CPU load.
If you increase the `ThreadCount` option, CPU usage will go up, but RPS will probably go down (cf. the benchmarks run in tmds/Tmds.LinuxAsync#39 (comment)).
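For completeness, overriding the default would look something like the sketch below. The `AddIoUringTransport(Action<IoUringOptions>)` registration is an assumption modeled on how other Kestrel transports register options, not a verified API; check the IoUring.Transport README for the actual entry point:

```csharp
// Hypothetical sketch: raising ThreadCount to use all logical processors.
// Per the discussion above, expect CPU usage to rise while RPS likely drops.
services.AddIoUringTransport(options =>
{
    options.ThreadCount = Environment.ProcessorCount; // default: ProcessorCount / 2
});
```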