Performance results compiling Black w/ mypyc
Summary
After a lot of work I can confidently say that compiling Black with mypyc will bring excellent performance wins. On average, the time taken to format a file is halved, excluding startup time. Not every file gets the same boost though: quite a few files saw even better improvements (up to 2.38x!), and obviously some files didn't improve as much.
Breaking down the improvements:
- Formatting with safety checks: 1.93x faster
- Formatting without safety checks: 2.11x faster
- Blib2to3 parsing: 1.81x faster
- Import (`import black`): 1.16x faster
And here are the results further broken down:
note: `*-dict-literal`, `*-list-literal`, `*-comments`, `*-nested`, and `*-strings-list` are microbenchmarks and therefore don't represent normal real world performance. They aim to measure performance of a certain domain, so for example `*-strings-list` measures how fast Black can format a long list literal containing lots and lots of strings. Even though they aren't truly representative of real world code, files like them do exist IRL and Black has fixed performance issues for these domains before so they are still relevant.
other notices: these numbers come only from Linux (since it's what I daily drive) so it's totally possible macOS and/or Windows will see quite different results. Hopefully they aren't in the wrong direction but who knows ... well actually I do plan on collecting numbers on Windows, but I'm impatient to get my results released to the world haha!
formatting with safety checks (fmt)
Faster (17):
- fmt-dict-literal: 281 ms +- 9 ms -> 131 ms +- 6 ms: 2.14x faster
- fmt-list-literal: 162 ms +- 4 ms -> 76.6 ms +- 4.0 ms: 2.12x faster
- fmt-black/mode: 206 ms +- 5 ms -> 99.5 ms +- 6.2 ms: 2.07x faster
- fmt-black/strings: 371 ms +- 8 ms -> 183 ms +- 10 ms: 2.03x faster
- fmt-nested: 332 ms +- 9 ms -> 164 ms +- 9 ms: 2.03x faster
- fmt-black/lines: 1.29 sec +- 0.03 sec -> 639 ms +- 39 ms: 2.01x faster
- fmt-comments: 296 ms +- 6 ms -> 154 ms +- 7 ms: 1.92x faster
- fmt-black/__init__: 1.90 sec +- 0.05 sec -> 993 ms +- 60 ms: 1.91x faster
- fmt-flit/install: 851 ms +- 22 ms -> 447 ms +- 26 ms: 1.90x faster
- fmt-flit_core/config: 1.16 sec +- 0.03 sec -> 608 ms +- 33 ms: 1.90x faster
- fmt-black/output: 192 ms +- 6 ms -> 103 ms +- 6 ms: 1.87x faster
- fmt-black/comments: 475 ms +- 12 ms -> 255 ms +- 14 ms: 1.86x faster
- fmt-black/brackets: 596 ms +- 17 ms -> 323 ms +- 17 ms: 1.84x faster
- fmt-black/nodes: 1.46 sec +- 0.03 sec -> 793 ms +- 48 ms: 1.84x faster
- fmt-black/linegen: 1.83 sec +- 0.05 sec -> 994 ms +- 56 ms: 1.84x faster
- fmt-flit/sdist: 448 ms +- 11 ms -> 247 ms +- 13 ms: 1.81x faster
- fmt-strings-list: 50.7 ms +- 1.7 ms -> 28.4 ms +- 1.5 ms: 1.78x faster
Geometric mean: 1.93x faster
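If you want to sanity check the list above, each line can be parsed and the speedup recomputed from the two means. This is only a sketch: the regular expression is an assumption based on the line shape shown here, not something provided by pyperf, and the recomputed ratio can differ in the last digit because the printed means are already rounded.

```python
import re

# Matches lines shaped like the ones in the list above, e.g.
# "- fmt-nested: 332 ms +- 9 ms -> 164 ms +- 9 ms: 2.03x faster"
# (hypothetical pattern written for this post, not a pyperf API)
LINE_RE = re.compile(
    r"^- (?P<name>\S+): "
    r"(?P<before>[\d.]+) (?P<before_unit>ms|sec) \+- [\d.]+ (?:ms|sec) -> "
    r"(?P<after>[\d.]+) (?P<after_unit>ms|sec) \+- [\d.]+ (?:ms|sec): "
    r"(?P<speedup>[\d.]+)x faster$"
)
TO_MS = {"ms": 1.0, "sec": 1000.0}


def parse_result(line):
    """Return (name, before_ms, after_ms, reported_speedup) for one result line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    before = float(match["before"]) * TO_MS[match["before_unit"]]
    after = float(match["after"]) * TO_MS[match["after_unit"]]
    return match["name"], before, after, float(match["speedup"])


samples = [
    "- fmt-dict-literal: 281 ms +- 9 ms -> 131 ms +- 6 ms: 2.14x faster",
    "- fmt-black/lines: 1.29 sec +- 0.03 sec -> 639 ms +- 39 ms: 2.01x faster",
    "- fmt-strings-list: 50.7 ms +- 1.7 ms -> 28.4 ms +- 1.5 ms: 1.78x faster",
]

for line in samples:
    name, before, after, reported = parse_result(line)
    print(f"{name}: recomputed {before / after:.2f}x, reported {reported:.2f}x")
```

The recomputed values land within about 0.01 of the reported ones, which is what you'd expect from rounding.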
formatting without safety checks (fmt-fast)
Faster (17):
- fmt-fast-dict-literal: 133 ms +- 5 ms -> 55.6 ms +- 3.1 ms: 2.38x faster
- fmt-fast-nested: 152 ms +- 5 ms -> 64.6 ms +- 4.6 ms: 2.35x faster
- fmt-fast-list-literal: 78.7 ms +- 2.3 ms -> 33.9 ms +- 1.7 ms: 2.32x faster
- fmt-fast-flit/install: 422 ms +- 12 ms -> 184 ms +- 12 ms: 2.29x faster
- fmt-fast-black/mode: 94.7 ms +- 2.1 ms -> 42.5 ms +- 2.3 ms: 2.23x faster
- fmt-fast-flit/sdist: 188 ms +- 6 ms -> 86.8 ms +- 5.0 ms: 2.17x faster
- fmt-fast-black/strings: 155 ms +- 4 ms -> 72.1 ms +- 5.5 ms: 2.16x faster
- fmt-fast-black/output: 86.4 ms +- 2.2 ms -> 40.5 ms +- 2.5 ms: 2.13x faster
- fmt-fast-comments: 150 ms +- 4 ms -> 71.9 ms +- 4.6 ms: 2.09x faster
- fmt-fast-black/comments: 197 ms +- 5 ms -> 95.1 ms +- 5.8 ms: 2.07x faster
- fmt-fast-black/brackets: 251 ms +- 7 ms -> 122 ms +- 8 ms: 2.05x faster
- fmt-fast-black/nodes: 659 ms +- 19 ms -> 326 ms +- 21 ms: 2.02x faster
- fmt-fast-black/lines: 582 ms +- 15 ms -> 290 ms +- 15 ms: 2.01x faster
- fmt-fast-black/__init__: 804 ms +- 25 ms -> 408 ms +- 27 ms: 1.97x faster
- fmt-fast-flit_core/config: 561 ms +- 16 ms -> 287 ms +- 16 ms: 1.96x faster
- fmt-fast-black/linegen: 787 ms +- 21 ms -> 408 ms +- 19 ms: 1.93x faster
- fmt-fast-strings-list: 24.5 ms +- 0.9 ms -> 13.1 ms +- 0.7 ms: 1.87x faster
Geometric mean: 2.11x faster
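For anyone unsure what the geometric mean line summarizes, here is a small sketch that recomputes it from the rounded per-benchmark speedups listed above. It assumes the summary is simply the geometric mean of the individual ratios, and since it uses the already rounded ratios rather than the raw data, treat it as an approximation.

```python
from statistics import geometric_mean  # available since Python 3.8

# Rounded fmt-fast speedups copied from the list above.
fmt_fast_speedups = [
    2.38, 2.35, 2.32, 2.29, 2.23, 2.17, 2.16, 2.13, 2.09,
    2.07, 2.05, 2.02, 2.01, 1.97, 1.96, 1.93, 1.87,
]

mean_speedup = geometric_mean(fmt_fast_speedups)
print(f"Geometric mean: {mean_speedup:.2f}x faster")
```

A geometric mean is the usual choice for averaging ratios because a 2x speedup and a 2x slowdown cancel out to 1x, which an arithmetic mean would not give you.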
blib2to3 parsing (parse)
Faster (17):
- parse-flit_core/config: 275 ms -> 126 ms: 2.18x faster
- parse-black/lines: 286 ms -> 137 ms: 2.08x faster
- parse-flit/sdist: 105 ms -> 53.9 ms: 1.95x faster
- parse-flit/install: 215 ms -> 111 ms: 1.93x faster
- parse-nested: 59.1 ms -> 30.6 ms: 1.93x faster
- parse-black/brackets: 116 ms -> 60.7 ms: 1.91x faster
- parse-dict-literal: 47.5 ms -> 24.9 ms: 1.91x faster
- parse-black/comments: 109 ms -> 57.2 ms: 1.90x faster
- parse-comments: 68.6 ms -> 36.5 ms: 1.88x faster
- parse-list-literal: 26.3 ms -> 14.5 ms: 1.82x faster
- parse-strings-list: 6.45 ms -> 3.66 ms: 1.76x faster
- parse-black/nodes: 318 ms -> 190 ms: 1.68x faster
- parse-black/output: 42.5 ms -> 25.5 ms: 1.67x faster
- parse-black/mode: 39.1 ms -> 23.8 ms: 1.64x faster
- parse-black/strings: 70.3 ms -> 43.7 ms: 1.61x faster
- parse-black/linegen: 377 ms -> 237 ms: 1.59x faster
- parse-black/__init__: 375 ms -> 245 ms: 1.53x faster
Geometric mean: 1.81x faster
Noteworthy things
Sadly, these wins won't guarantee Black feels twice as fast since Python startup and Black's import time add a costly overhead. When formatting one small to somewhat large file, startup can easily account for over 50% of the time between the command being executed and it finishing. On the bright side, mypyc does in fact bring a small reduction to import time. It's not as impressive as the rest of the improvements at only 1.16x but it will be noticeable. On my machine, 1.16x translates to 40ms which is awesome!
Also, blackd is the actual solution to avoiding the startup penalty, not mypyc.
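To make the startup point concrete, here is a rough back-of-the-envelope sketch. It assumes startup consists only of `import black` (real startup also includes the interpreter itself), derives the baseline import time from the "1.16x translates to 40ms" figure, and picks fmt-black/mode (206 ms interpreted, 1.93x average formatting speedup) as the example file. The function names are made up for illustration.

```python
def implied_baseline_ms(saved_ms, speedup):
    """Baseline duration implied by an absolute saving and a speedup ratio.

    saved = baseline - baseline / speedup
    =>  baseline = saved * speedup / (speedup - 1)
    """
    return saved_ms * speedup / (speedup - 1)


def end_to_end_speedup(startup_ms, work_ms, startup_speedup, work_speedup):
    """Overall speedup when startup and the actual work speed up by different factors."""
    before = startup_ms + work_ms
    after = startup_ms / startup_speedup + work_ms / work_speedup
    return before / after


# Assumption: the 40 ms saving and the 1.16x ratio both describe `import black`.
import_ms = implied_baseline_ms(saved_ms=40, speedup=1.16)

# Assumption: one run = import + formatting a file that takes 206 ms interpreted.
overall = end_to_end_speedup(
    startup_ms=import_ms, work_ms=206, startup_speedup=1.16, work_speedup=1.93
)

print(f"implied import time: about {import_ms:.0f} ms")
print(f"end-to-end speedup: about {overall:.2f}x")
```

Under those assumptions the import takes roughly 290 ms and the whole command only gets about 1.4x faster, even though the formatting itself is nearly twice as fast.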
Additionally, the optimizations I made don't only help the compiled build, surprisingly enough: interpreted Black gets a bit faster too! Now before you get excited, it ranges from zero to two percent ._. Yeah ... not great, but better than nothing I suppose ¯\_(ツ)_/¯
Oh and I can confirm that experimental string processing does in fact have a pretty substantial performance cost. Turning on ESP costs you on average 13.5% in fmt-fast. It can even get worse if your code has more string literals. Strings-list microbenchmarks saw slowdowns over 3x. Though this is a known issue.
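As a quick illustration of how a ratio like "3.30x" relates to a percentage cost, here is a small sketch using three rows from the "Compiled formatting with ESP on and off" table in the appendix. The helper name is made up, and the timings are the rounded means from that table.

```python
def overhead_percent(with_feature_ms, without_feature_ms):
    """Extra time spent with the feature on, as a percentage of the time with it off."""
    return (with_feature_ms / without_feature_ms - 1) * 100


# (ESP on, ESP off) in milliseconds, copied from the appendix table.
esp_timings = {
    "fmt-fast-strings-list": (43.2, 13.1),
    "fmt-fast-nested": (71.9, 64.6),
    "fmt-fast-black/lines": (301, 290),
}

for name, (esp_on, esp_off) in esp_timings.items():
    print(f"{name}: ESP costs about {overhead_percent(esp_on, esp_off):.1f}% extra time")
```

The strings-list case comes out at well over 200% extra time, while the more ordinary files sit in the low single digits to low teens.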
On top of everything else, these numbers were collected using wheels compiled via a GitHub Actions workflow. This is important to note because the performance gains are actually lower with these wheels compared to wheels built locally. This makes sense since an older C compiler wouldn't be able to optimize as well as newer ones. The manylinux docker images (used to build highly compatible wheels) come with a pretty old version of gcc, and the most up to date clang they supported was also pretty ancient. Thankfully it wasn't too bad here, at most it cost me ~7% as far as I can tell.
Finally, if you were wondering if those optimizations I made actually made a difference, I can say that yes they did! Locally I saw improvements of ~16%, but the GitHub Actions wheels only saw 10%. Goes to show you the difference compiler versions can make.
Boring "and so my data looks credible" stuff
To collect my numbers I used blackbench (except for import time, for that I just used pyperf) - which is a benchmarking tool I made specifically for Black. I won't go into detail on what it is since that's not the point here, but think of it as like pyperformance but for Black :)
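If you just want a rough feel for import cost without installing anything, the sketch below times a fresh interpreter with and without an import using only the standard library. To be clear, this is not how pyperf measures things (there is no calibration, warmup, or statistical checking here) and it is not what produced the numbers in this post. The function names are made up, and `json` is only a stand-in module so the example runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_command(args, runs=5):
    """Median wall-clock time in seconds of running a command `runs` times."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(args, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def import_cost(module, runs=5):
    """Rough extra startup time caused by importing `module` in a fresh interpreter."""
    baseline = time_command([sys.executable, "-c", "pass"], runs)
    with_import = time_command([sys.executable, "-c", f"import {module}"], runs)
    return with_import - baseline


if __name__ == "__main__":
    # Swap "json" for "black" in an environment where Black is installed.
    print(f"import cost: {import_cost('json') * 1000:.1f} ms")
```

On a noisy system the result can jump around a lot (it can even come out slightly negative for cheap modules), which is exactly why a proper benchmarking tool is worth using for real measurements.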
Furthermore, I made sure to tune my system to avoid as much system noise in the data as possible. I've actually documented my steps but the TL;DR is CPU core isolation, tweaks to the Linux scheduler, frequency pinning, and other stuff were used.
What follows is some metadata just in case that matters:
Some software:
- Blackbench: 21.8a2
- pyperf: 2.2.0
- Python: CPython 3.8.5
- OS: Ubuntu 20.04.2 LTS
- Compiler: Clang 3.4.2-4.el6 with `-g0` and optimization level two
Tested revisions:
- Interpreted main: https://github.com/psf/black/commit/b92ec348439edb8641204a102849bfab51f4dda0
- Interpreted mypyc (i.e. `git checkout mypyc-support-pt2`): https://github.com/psf/black/commit/e9834e0c4375803af29879a55be121d7b27241aa
- Compiled mypyc: https://github.com/psf/black/commit/e9834e0c4375803af29879a55be121d7b27241aa and the wheels came from: https://github.com/ichard26/black-mypyc-wheels/actions/runs/1133056612
- Compiled mypyc preopt (ie. the above but before optimization work): https://github.com/psf/black/commit/8f42f286eadf381304a92e81330cbc691fc2e1e3 and the wheels came from: https://github.com/ichard26/black-mypyc-wheels/actions/runs/1018677481
System metadata:
Metadata:
- aslr: Full randomization
- boot_time: 2021-08-16 17:58:09
- cpu_config: 0-1=driver:acpi-cpufreq, governor:ondemand; idle:acpi_idle
- cpu_count: 2
- cpu_freq: 0=1300 MHz; 1=1585 MHz
- cpu_model_name: AMD A6-9220 RADEON R4, 5 COMPUTE CORES 2C+3G
- date: 2021-08-16 18:59:58.400971
- hostname: acer-ubuntu
- load_avg_1min: 2.30
- mem_max_rss: 13.5 MB
- perf_version: 2.2.0
- platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.29
- python_cflags: -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall
- python_compiler: GCC 9.3.0
- python_executable: /home/ichard26/programming/oss/black/bm-venv/bin/python
- python_implementation: cpython
- python_version: 3.8.5 (64-bit)
- runnable_threads: 1
- timer: clock_gettime(CLOCK_MONOTONIC), resolution: 1.00 ns
- uptime: 1 hour 1 min 49.5 sec
(bm-venv) ichard26@acer-ubuntu:~/programming/oss/black$ neofetch --ascii
ichard26@acer-ubuntu
--------------------
OS: Ubuntu 20.04.2 LTS x86_64
Host: Aspire A315-21 V1.03
Kernel: 5.11.0-25-generic
Uptime: 1 hour, 3 mins
Packages: 2449 (dpkg), 10 (snap)
Shell: fish 3.1.0
Resolution: 1366x768
DE: GNOME
WM: Mutter
WM Theme: Adwaita
Theme: Yaru-light [GTK2/3]
Icons: Yaru [GTK2/3]
Terminal: gnome-terminal
CPU: AMD A6-9220 RADEON R4 2C+3G (2) @ 2.500GHz
GPU: AMD ATI Radeon R2/R3/R4/R5 Graphics
Memory: 1867MiB / 11432MiB
Finally, here's a ZIP file containing all of the data I collected: https://github.com/psf/black/files/6995838/mypyc-data.zip. Go have fun exploring it yourself with pyperf!
Appendix: some cool tables
Parsing:
Benchmark | interpreted-main | interpreted-mypyc | compiled-mypyc-preopt | compiled-mypyc |
---|---|---|---|---|
parse-black/__init__ | 375 ms | 364 ms: 1.03x faster | 262 ms: 1.44x faster | 245 ms: 1.53x faster |
parse-black/brackets | 116 ms | not significant | 69.4 ms: 1.67x faster | 60.7 ms: 1.91x faster |
parse-black/comments | 109 ms | not significant | 53.4 ms: 2.04x faster | 57.2 ms: 1.90x faster |
parse-black/linegen | 377 ms | 369 ms: 1.02x faster | 265 ms: 1.43x faster | 237 ms: 1.59x faster |
parse-black/lines | 286 ms | 278 ms: 1.03x faster | 144 ms: 1.98x faster | 137 ms: 2.08x faster |
parse-black/mode | 39.1 ms | 38.5 ms: 1.02x faster | 25.1 ms: 1.56x faster | 23.8 ms: 1.64x faster |
parse-black/nodes | 318 ms | not significant | 202 ms: 1.57x faster | 190 ms: 1.68x faster |
parse-black/output | 42.5 ms | 41.6 ms: 1.02x faster | 25.1 ms: 1.69x faster | 25.5 ms: 1.67x faster |
parse-black/strings | 70.3 ms | not significant | 42.3 ms: 1.66x faster | 43.7 ms: 1.61x faster |
parse-comments | 68.6 ms | not significant | 39.3 ms: 1.75x faster | 36.5 ms: 1.88x faster |
parse-dict-literal | 47.5 ms | 46.8 ms: 1.01x faster | 27.4 ms: 1.73x faster | 24.9 ms: 1.91x faster |
parse-flit/install | 215 ms | not significant | 106 ms: 2.03x faster | 111 ms: 1.93x faster |
parse-flit/sdist | 105 ms | not significant | 58.2 ms: 1.80x faster | 53.9 ms: 1.95x faster |
parse-flit_core/config | 275 ms | not significant | 145 ms: 1.89x faster | 126 ms: 2.18x faster |
parse-list-literal | 26.3 ms | not significant | 15.5 ms: 1.69x faster | 14.5 ms: 1.82x faster |
parse-nested | 59.1 ms | not significant | 34.2 ms: 1.73x faster | 30.6 ms: 1.93x faster |
parse-strings-list | 6.45 ms | 6.22 ms: 1.04x faster | 3.94 ms: 1.64x faster | 3.66 ms: 1.76x faster |
Geometric mean | (ref) | 1.02x faster | 1.71x faster | 1.81x faster |
Formatting without safety checks:
Benchmark | interpreted-main | compiled-mypyc |
---|---|---|
fmt-fast-black/__init__ | 804 ms | 408 ms: 1.97x faster |
fmt-fast-black/brackets | 251 ms | 122 ms: 2.05x faster |
fmt-fast-black/comments | 197 ms | 95.1 ms: 2.07x faster |
fmt-fast-black/linegen | 787 ms | 408 ms: 1.93x faster |
fmt-fast-black/lines | 582 ms | 290 ms: 2.01x faster |
fmt-fast-black/mode | 94.7 ms | 42.5 ms: 2.23x faster |
fmt-fast-black/nodes | 659 ms | 326 ms: 2.02x faster |
fmt-fast-black/output | 86.4 ms | 40.5 ms: 2.13x faster |
fmt-fast-black/strings | 155 ms | 72.1 ms: 2.16x faster |
fmt-fast-comments | 150 ms | 71.9 ms: 2.09x faster |
fmt-fast-dict-literal | 133 ms | 55.6 ms: 2.38x faster |
fmt-fast-flit/install | 422 ms | 184 ms: 2.29x faster |
fmt-fast-flit/sdist | 188 ms | 86.8 ms: 2.17x faster |
fmt-fast-flit_core/config | 561 ms | 287 ms: 1.96x faster |
fmt-fast-list-literal | 78.7 ms | 33.9 ms: 2.32x faster |
fmt-fast-nested | 152 ms | 64.6 ms: 2.35x faster |
fmt-fast-strings-list | 24.5 ms | 13.1 ms: 1.87x faster |
Geometric mean | (ref) | 2.11x faster |
Formatting with safety checks:
Benchmark | interpreted-main | compiled-mypyc-preopt | compiled-mypyc |
---|---|---|---|
fmt-black/__init__ | 1.90 sec | 1.08 sec: 1.75x faster | 993 ms: 1.91x faster |
fmt-black/brackets | 596 ms | 344 ms: 1.73x faster | 323 ms: 1.84x faster |
fmt-black/comments | 475 ms | 283 ms: 1.68x faster | 255 ms: 1.86x faster |
fmt-black/linegen | 1.83 sec | 1.04 sec: 1.75x faster | 994 ms: 1.84x faster |
fmt-black/lines | 1.29 sec | 673 ms: 1.91x faster | 639 ms: 2.01x faster |
fmt-black/mode | 206 ms | 109 ms: 1.90x faster | 99.5 ms: 2.07x faster |
fmt-black/nodes | 1.46 sec | 859 ms: 1.70x faster | 793 ms: 1.84x faster |
fmt-black/output | 192 ms | 109 ms: 1.76x faster | 103 ms: 1.87x faster |
fmt-black/strings | 371 ms | 194 ms: 1.91x faster | 183 ms: 2.03x faster |
fmt-comments | 296 ms | 165 ms: 1.79x faster | 154 ms: 1.92x faster |
fmt-dict-literal | 281 ms | 139 ms: 2.02x faster | 131 ms: 2.14x faster |
fmt-flit/install | 851 ms | 476 ms: 1.79x faster | 447 ms: 1.90x faster |
fmt-flit/sdist | 448 ms | 236 ms: 1.90x faster | 247 ms: 1.81x faster |
fmt-flit_core/config | 1.16 sec | 666 ms: 1.74x faster | 608 ms: 1.90x faster |
fmt-list-literal | 162 ms | 80.4 ms: 2.02x faster | 76.6 ms: 2.12x faster |
fmt-nested | 332 ms | 171 ms: 1.94x faster | 164 ms: 2.03x faster |
fmt-strings-list | 50.7 ms | 30.2 ms: 1.68x faster | 28.4 ms: 1.78x faster |
Geometric mean | (ref) | 1.82x faster | 1.93x faster |
Compiled formatting with ESP on and off:
Benchmark | fmt-fast-esp/compiled-mypyc.json | fmt-fast/compiled-mypyc.json |
---|---|---|
fmt-fast-strings-list | 43.2 ms | 13.1 ms: 3.30x faster |
fmt-fast-nested | 71.9 ms | 64.6 ms: 1.11x faster |
fmt-fast-dict-literal | 60.8 ms | 55.6 ms: 1.09x faster |
fmt-fast-flit_core/config | 313 ms | 287 ms: 1.09x faster |
fmt-fast-flit/sdist | 94.5 ms | 86.8 ms: 1.09x faster |
fmt-fast-list-literal | 36.9 ms | 33.9 ms: 1.09x faster |
fmt-fast-black/strings | 76.5 ms | 72.1 ms: 1.06x faster |
fmt-fast-flit/install | 195 ms | 184 ms: 1.06x faster |
fmt-fast-black/output | 42.8 ms | 40.5 ms: 1.06x faster |
fmt-fast-black/nodes | 342 ms | 326 ms: 1.05x faster |
fmt-fast-black/linegen | 427 ms | 408 ms: 1.05x faster |
fmt-fast-comments | 75.1 ms | 71.9 ms: 1.04x faster |
fmt-fast-black/lines | 301 ms | 290 ms: 1.04x faster |
Geometric mean | (ref) | 1.13x faster |
Benchmark hidden because not significant (4): fmt-fast-black/__init__, fmt-fast-black/mode, fmt-fast-black/brackets, fmt-fast-black/comments