@ichard26
Last active February 5, 2022 20:07
Benchmark results from the mypyc integration work for psf/black.

Performance results compiling Black w/ mypyc

Summary

After a lot of work, I can confidently say that compiling Black with mypyc will bring excellent performance wins. On average, the time taken to format a file is halved, excluding startup time. Not all files get the same boost though: quite a few saw even better improvements (up to 2.38x!), and obviously some didn't improve as much (down to 1.78x for formatting).

Breaking down the improvements:

  • Formatting with safety checks: 1.93x faster
  • Formatting without safety checks: 2.11x faster
  • Blib2to3 parsing: 1.81x faster
  • Import (import black): 1.16x faster

And here are the results further broken down:

note: *-dict-literal, *-list-literal, *-comments, *-nested, and *-strings-list are microbenchmarks and therefore don't represent normal real-world performance. Each one aims to measure performance in a certain domain. For example, *-strings-list measures how fast Black can format a long list literal containing lots and lots of strings. Even though they aren't truly representative of real-world code, files like them do exist IRL, and Black has fixed performance issues in these domains before, so they are still relevant.

other notices: these numbers come only from Linux (since it's what I daily drive), so it's totally possible macOS and/or Windows will see quite different results. Hopefully they aren't in the wrong direction, but who knows ... well, actually I do plan on collecting numbers on Windows, but I'm impatient to get my results released to the world haha!

formatting with safety checks (fmt)

Faster (17):
- fmt-dict-literal: 281 ms +- 9 ms -> 131 ms +- 6 ms: 2.14x faster
- fmt-list-literal: 162 ms +- 4 ms -> 76.6 ms +- 4.0 ms: 2.12x faster
- fmt-black/mode: 206 ms +- 5 ms -> 99.5 ms +- 6.2 ms: 2.07x faster
- fmt-black/strings: 371 ms +- 8 ms -> 183 ms +- 10 ms: 2.03x faster
- fmt-nested: 332 ms +- 9 ms -> 164 ms +- 9 ms: 2.03x faster
- fmt-black/lines: 1.29 sec +- 0.03 sec -> 639 ms +- 39 ms: 2.01x faster
- fmt-comments: 296 ms +- 6 ms -> 154 ms +- 7 ms: 1.92x faster
- fmt-black/__init__: 1.90 sec +- 0.05 sec -> 993 ms +- 60 ms: 1.91x faster
- fmt-flit/install: 851 ms +- 22 ms -> 447 ms +- 26 ms: 1.90x faster
- fmt-flit_core/config: 1.16 sec +- 0.03 sec -> 608 ms +- 33 ms: 1.90x faster
- fmt-black/output: 192 ms +- 6 ms -> 103 ms +- 6 ms: 1.87x faster
- fmt-black/comments: 475 ms +- 12 ms -> 255 ms +- 14 ms: 1.86x faster
- fmt-black/brackets: 596 ms +- 17 ms -> 323 ms +- 17 ms: 1.84x faster
- fmt-black/nodes: 1.46 sec +- 0.03 sec -> 793 ms +- 48 ms: 1.84x faster
- fmt-black/linegen: 1.83 sec +- 0.05 sec -> 994 ms +- 56 ms: 1.84x faster
- fmt-flit/sdist: 448 ms +- 11 ms -> 247 ms +- 13 ms: 1.81x faster
- fmt-strings-list: 50.7 ms +- 1.7 ms -> 28.4 ms +- 1.5 ms: 1.78x faster

Geometric mean: 1.93x faster
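
For anyone unfamiliar with the line format above: each entry reads as "mean +- standard deviation" before compiling, then the same after compiling, then the ratio between the two means. Here is a minimal sketch that parses such a line and recomputes the ratio. The `parse_line` helper and its regex are hypothetical (they are not part of pyperf or blackbench), and the recomputed ratio can differ slightly from the reported one because the printed timings are rounded.

```python
import re

# Hypothetical helper, not part of pyperf or blackbench.
# Matches lines such as:
#   - fmt-dict-literal: 281 ms +- 9 ms -> 131 ms +- 6 ms: 2.14x faster
LINE_RE = re.compile(
    r"^- (?P<name>\S+): "
    r"(?P<old>[\d.]+) (?P<old_unit>ms|sec) \+- [\d.]+ (?:ms|sec) -> "
    r"(?P<new>[\d.]+) (?P<new_unit>ms|sec) \+- [\d.]+ (?:ms|sec): "
    r"(?P<ratio>[\d.]+)x faster$"
)

# Only the two units that appear in the results above are handled.
UNIT_TO_MS = {"ms": 1.0, "sec": 1000.0}


def parse_line(line):
    """Return the benchmark name, both means in ms, and both ratios."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    old_ms = float(match["old"]) * UNIT_TO_MS[match["old_unit"]]
    new_ms = float(match["new"]) * UNIT_TO_MS[match["new_unit"]]
    return {
        "name": match["name"],
        "old_ms": old_ms,
        "new_ms": new_ms,
        "reported": float(match["ratio"]),
        # Recomputed from the rounded timings shown in the line.
        "recomputed": old_ms / new_ms,
    }


if __name__ == "__main__":
    result = parse_line(
        "- fmt-black/lines: 1.29 sec +- 0.03 sec -> 639 ms +- 39 ms: 2.01x faster"
    )
    print(result["name"], f"{result['recomputed']:.2f}x vs {result['reported']}x")
```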

formatting without safety checks (fmt-fast)

Faster (17):
- fmt-fast-dict-literal: 133 ms +- 5 ms -> 55.6 ms +- 3.1 ms: 2.38x faster
- fmt-fast-nested: 152 ms +- 5 ms -> 64.6 ms +- 4.6 ms: 2.35x faster
- fmt-fast-list-literal: 78.7 ms +- 2.3 ms -> 33.9 ms +- 1.7 ms: 2.32x faster
- fmt-fast-flit/install: 422 ms +- 12 ms -> 184 ms +- 12 ms: 2.29x faster
- fmt-fast-black/mode: 94.7 ms +- 2.1 ms -> 42.5 ms +- 2.3 ms: 2.23x faster
- fmt-fast-flit/sdist: 188 ms +- 6 ms -> 86.8 ms +- 5.0 ms: 2.17x faster
- fmt-fast-black/strings: 155 ms +- 4 ms -> 72.1 ms +- 5.5 ms: 2.16x faster
- fmt-fast-black/output: 86.4 ms +- 2.2 ms -> 40.5 ms +- 2.5 ms: 2.13x faster
- fmt-fast-comments: 150 ms +- 4 ms -> 71.9 ms +- 4.6 ms: 2.09x faster
- fmt-fast-black/comments: 197 ms +- 5 ms -> 95.1 ms +- 5.8 ms: 2.07x faster
- fmt-fast-black/brackets: 251 ms +- 7 ms -> 122 ms +- 8 ms: 2.05x faster
- fmt-fast-black/nodes: 659 ms +- 19 ms -> 326 ms +- 21 ms: 2.02x faster
- fmt-fast-black/lines: 582 ms +- 15 ms -> 290 ms +- 15 ms: 2.01x faster
- fmt-fast-black/__init__: 804 ms +- 25 ms -> 408 ms +- 27 ms: 1.97x faster
- fmt-fast-flit_core/config: 561 ms +- 16 ms -> 287 ms +- 16 ms: 1.96x faster
- fmt-fast-black/linegen: 787 ms +- 21 ms -> 408 ms +- 19 ms: 1.93x faster
- fmt-fast-strings-list: 24.5 ms +- 0.9 ms -> 13.1 ms +- 0.7 ms: 1.87x faster

Geometric mean: 2.11x faster
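
As a sanity check, the geometric mean can be recomputed from the per-benchmark speedups listed above. This is only a sketch: it uses the rounded ratios as printed, whereas pyperf presumably works from the unrounded means, so the last digit could differ in other cases.

```python
from statistics import geometric_mean  # available since Python 3.8

# Speedups copied from the fmt-fast list above (rounded to two decimals).
FMT_FAST_SPEEDUPS = [
    2.38, 2.35, 2.32, 2.29, 2.23, 2.17, 2.16, 2.13, 2.09,
    2.07, 2.05, 2.02, 2.01, 1.97, 1.96, 1.93, 1.87,
]

# The geometric mean is the n-th root of the product of the n ratios.
# It is the usual way to average ratios, because a 2x speedup and a
# 2x slowdown then cancel out to exactly 1x.
fmt_fast_mean = geometric_mean(FMT_FAST_SPEEDUPS)

if __name__ == "__main__":
    print(f"Geometric mean: {fmt_fast_mean:.2f}x faster")
```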

blib2to3 parsing (parse)

Faster (17):
- parse-flit_core/config: 275 ms -> 126 ms: 2.18x faster
- parse-black/lines: 286 ms -> 137 ms: 2.08x faster
- parse-flit/sdist: 105 ms -> 53.9 ms: 1.95x faster
- parse-flit/install: 215 ms -> 111 ms: 1.93x faster
- parse-nested: 59.1 ms -> 30.6 ms: 1.93x faster
- parse-black/brackets: 116 ms -> 60.7 ms: 1.91x faster
- parse-dict-literal: 47.5 ms -> 24.9 ms: 1.91x faster
- parse-black/comments: 109 ms -> 57.2 ms: 1.90x faster
- parse-comments: 68.6 ms -> 36.5 ms: 1.88x faster
- parse-list-literal: 26.3 ms -> 14.5 ms: 1.82x faster
- parse-strings-list: 6.45 ms -> 3.66 ms: 1.76x faster
- parse-black/nodes: 318 ms -> 190 ms: 1.68x faster
- parse-black/output: 42.5 ms -> 25.5 ms: 1.67x faster
- parse-black/mode: 39.1 ms -> 23.8 ms: 1.64x faster
- parse-black/strings: 70.3 ms -> 43.7 ms: 1.61x faster
- parse-black/linegen: 377 ms -> 237 ms: 1.59x faster
- parse-black/__init__: 375 ms -> 245 ms: 1.53x faster

Geometric mean: 1.81x faster

Noteworthy things

Sadly, these wins don't guarantee that Black will feel twice as fast, since Python startup and Black's import time add costly overhead. When formatting a single small to moderately large file, startup can easily account for over 50% of the time between launching the command and it finishing. On the bright side, mypyc does in fact bring a small reduction in import time. At only 1.16x it's not as impressive as the rest of the improvements, but it will be noticeable. On my machine, 1.16x translates to 40 ms, which is awesome! Also, blackd is the actual solution to avoiding the startup penalty, not mypyc 😉
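
To put a rough number on that, here is a back-of-the-envelope sketch using Amdahl's law. It assumes startup takes exactly 50% of the wall-clock time and is not sped up at all, which is a simplification (it ignores the 1.16x import win), and `end_to_end_speedup` is a hypothetical helper, not something from Black or blackbench.

```python
def end_to_end_speedup(startup_fraction, formatting_speedup):
    """Overall speedup when only the non-startup part gets faster.

    startup_fraction: share of the original wall-clock time spent on
        startup (0.0 to 1.0), assumed here to stay unchanged.
    formatting_speedup: how many times faster the remaining work runs.
    """
    if not 0.0 <= startup_fraction <= 1.0:
        raise ValueError("startup_fraction must be between 0 and 1")
    if formatting_speedup <= 0:
        raise ValueError("formatting_speedup must be positive")
    # New total time, as a fraction of the old total time.
    new_time = startup_fraction + (1.0 - startup_fraction) / formatting_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Assumed scenario: 50% startup, formatting 1.93x faster (fmt mean).
    print(f"{end_to_end_speedup(0.5, 1.93):.2f}x faster end to end")
```

Under those assumptions, a 1.93x formatting speedup shrinks to roughly 1.3x end to end, which is why the command won't feel twice as fast on a single file.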

Additionally, and surprisingly enough, the optimizations I made don't only benefit the mypyc-compiled build: interpreted Black gets faster too! Now before you get excited, the gain ranges from zero to two percent ._. Yeah ... not great, but better than nothing I suppose ¯\_(ツ)_/¯

Oh, and I can confirm that experimental string processing (ESP) does in fact have a pretty substantial performance cost. Turning on ESP costs you 13.5% on average in fmt-fast. It gets even worse if your code has more string literals: the strings-list microbenchmark saw a slowdown of over 3x. This is a known issue, though.

On top of everything else, these numbers were collected using wheels compiled via a GitHub Actions workflow. This is important to note because the performance gains are actually lower with these wheels compared to wheels built locally. This makes sense, since an older C compiler wouldn't be able to optimize as well as newer ones. The manylinux Docker images (used to build highly compatible wheels) come with a pretty old version of GCC, and the most up-to-date Clang they supported was also pretty ancient. Thankfully it wasn't too bad here: at most it cost me ~7% as far as I can tell.

Finally, if you were wondering if those optimizations I made actually made a difference, I can say that yes they did! Locally I saw improvements of ~16%, but the GitHub Actions wheels only saw 10%. Goes to show you the difference compiler versions can make.

Boring "and so my data looks credible" stuff

To collect my numbers I used blackbench, a benchmarking tool I made specifically for Black (except for import time, where I just used pyperf). I won't go into detail on what it is since that's not the point here, but think of it as pyperformance but for Black :)

Furthermore, I made sure to tune my system to avoid as much system noise in the data as possible. I've actually documented my steps, but the TL;DR is that CPU core isolation, tweaks to the Linux scheduler, frequency pinning, and other stuff were used.

What follows is some metadata just in case that matters:

Some software:

Tested revisions:

System metadata:

Metadata:
- aslr: Full randomization
- boot_time: 2021-08-16 17:58:09
- cpu_config: 0-1=driver:acpi-cpufreq, governor:ondemand; idle:acpi_idle
- cpu_count: 2
- cpu_freq: 0=1300 MHz; 1=1585 MHz
- cpu_model_name: AMD A6-9220 RADEON R4, 5 COMPUTE CORES 2C+3G
- date: 2021-08-16 18:59:58.400971
- hostname: acer-ubuntu
- load_avg_1min: 2.30
- mem_max_rss: 13.5 MB
- perf_version: 2.2.0
- platform: Linux-5.11.0-25-generic-x86_64-with-glibc2.29
- python_cflags: -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall
- python_compiler: GCC 9.3.0
- python_executable: /home/ichard26/programming/oss/black/bm-venv/bin/python
- python_implementation: cpython
- python_version: 3.8.5 (64-bit)
- runnable_threads: 1
- timer: clock_gettime(CLOCK_MONOTONIC), resolution: 1.00 ns
- uptime: 1 hour 1 min 49.5 sec
(bm-venv) ichard26@acer-ubuntu:~/programming/oss/black$ neofetch --ascii
            .-/+oossssoo+/-.               ichard26@acer-ubuntu 
        `:+ssssssssssssssssss+:`           -------------------- 
      -+ssssssssssssssssssyyssss+-         OS: Ubuntu 20.04.2 LTS x86_64 
    .ossssssssssssssssssdMMMNysssso.       Host: Aspire A315-21 V1.03 
   /ssssssssssshdmmNNmmyNMMMMhssssss/      Kernel: 5.11.0-25-generic 
  +ssssssssshmydMMMMMMMNddddyssssssss+     Uptime: 1 hour, 3 mins 
 /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/    Packages: 2449 (dpkg), 10 (snap) 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Shell: fish 3.1.0 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   Resolution: 1366x768 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   DE: GNOME 
ossyNMMMNyMMhsssssssssssssshmmmhssssssso   WM: Mutter 
+sssshhhyNMMNyssssssssssssyNMMMysssssss+   WM Theme: Adwaita 
.ssssssssdMMMNhsssssssssshNMMMdssssssss.   Theme: Yaru-light [GTK2/3] 
 /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/    Icons: Yaru [GTK2/3] 
  +sssssssssdmydMMMMMMMMddddyssssssss+     Terminal: gnome-terminal 
   /ssssssssssshdmNNNNmyNMMMMhssssss/      CPU: AMD A6-9220 RADEON R4 2C+3G (2) @ 2.500GHz 
    .ossssssssssssssssssdMMMNysssso.       GPU: AMD ATI Radeon R2/R3/R4/R5 Graphics 
      -+sssssssssssssssssyyyssss+-         Memory: 1867MiB / 11432MiB 
        `:+ssssssssssssssssss+:`
            .-/+oossssoo+/-.

Finally, here's a ZIP file containing all of the data I collected: https://github.com/psf/black/files/6995838/mypyc-data.zip. Go have fun exploring it yourself with pyperf!


Appendix: some cool tables

Parsing:

| Benchmark | interpreted-main | interpreted-mypyc | compiled-mypyc-preopt | compiled-mypyc |
|---|---|---|---|---|
| parse-black/__init__ | 375 ms | 364 ms: 1.03x faster | 262 ms: 1.44x faster | 245 ms: 1.53x faster |
| parse-black/brackets | 116 ms | not significant | 69.4 ms: 1.67x faster | 60.7 ms: 1.91x faster |
| parse-black/comments | 109 ms | not significant | 53.4 ms: 2.04x faster | 57.2 ms: 1.90x faster |
| parse-black/linegen | 377 ms | 369 ms: 1.02x faster | 265 ms: 1.43x faster | 237 ms: 1.59x faster |
| parse-black/lines | 286 ms | 278 ms: 1.03x faster | 144 ms: 1.98x faster | 137 ms: 2.08x faster |
| parse-black/mode | 39.1 ms | 38.5 ms: 1.02x faster | 25.1 ms: 1.56x faster | 23.8 ms: 1.64x faster |
| parse-black/nodes | 318 ms | not significant | 202 ms: 1.57x faster | 190 ms: 1.68x faster |
| parse-black/output | 42.5 ms | 41.6 ms: 1.02x faster | 25.1 ms: 1.69x faster | 25.5 ms: 1.67x faster |
| parse-black/strings | 70.3 ms | not significant | 42.3 ms: 1.66x faster | 43.7 ms: 1.61x faster |
| parse-comments | 68.6 ms | not significant | 39.3 ms: 1.75x faster | 36.5 ms: 1.88x faster |
| parse-dict-literal | 47.5 ms | 46.8 ms: 1.01x faster | 27.4 ms: 1.73x faster | 24.9 ms: 1.91x faster |
| parse-flit/install | 215 ms | not significant | 106 ms: 2.03x faster | 111 ms: 1.93x faster |
| parse-flit/sdist | 105 ms | not significant | 58.2 ms: 1.80x faster | 53.9 ms: 1.95x faster |
| parse-flit_core/config | 275 ms | not significant | 145 ms: 1.89x faster | 126 ms: 2.18x faster |
| parse-list-literal | 26.3 ms | not significant | 15.5 ms: 1.69x faster | 14.5 ms: 1.82x faster |
| parse-nested | 59.1 ms | not significant | 34.2 ms: 1.73x faster | 30.6 ms: 1.93x faster |
| parse-strings-list | 6.45 ms | 6.22 ms: 1.04x faster | 3.94 ms: 1.64x faster | 3.66 ms: 1.76x faster |
| Geometric mean | (ref) | 1.02x faster | 1.71x faster | 1.81x faster |

Formatting without safety checks:

| Benchmark | interpreted-main | compiled-mypyc |
|---|---|---|
| fmt-fast-black/__init__ | 804 ms | 408 ms: 1.97x faster |
| fmt-fast-black/brackets | 251 ms | 122 ms: 2.05x faster |
| fmt-fast-black/comments | 197 ms | 95.1 ms: 2.07x faster |
| fmt-fast-black/linegen | 787 ms | 408 ms: 1.93x faster |
| fmt-fast-black/lines | 582 ms | 290 ms: 2.01x faster |
| fmt-fast-black/mode | 94.7 ms | 42.5 ms: 2.23x faster |
| fmt-fast-black/nodes | 659 ms | 326 ms: 2.02x faster |
| fmt-fast-black/output | 86.4 ms | 40.5 ms: 2.13x faster |
| fmt-fast-black/strings | 155 ms | 72.1 ms: 2.16x faster |
| fmt-fast-comments | 150 ms | 71.9 ms: 2.09x faster |
| fmt-fast-dict-literal | 133 ms | 55.6 ms: 2.38x faster |
| fmt-fast-flit/install | 422 ms | 184 ms: 2.29x faster |
| fmt-fast-flit/sdist | 188 ms | 86.8 ms: 2.17x faster |
| fmt-fast-flit_core/config | 561 ms | 287 ms: 1.96x faster |
| fmt-fast-list-literal | 78.7 ms | 33.9 ms: 2.32x faster |
| fmt-fast-nested | 152 ms | 64.6 ms: 2.35x faster |
| fmt-fast-strings-list | 24.5 ms | 13.1 ms: 1.87x faster |
| Geometric mean | (ref) | 2.11x faster |

Formatting with safety checks:

| Benchmark | interpreted-main | compiled-mypyc-preopt | compiled-mypyc |
|---|---|---|---|
| fmt-black/__init__ | 1.90 sec | 1.08 sec: 1.75x faster | 993 ms: 1.91x faster |
| fmt-black/brackets | 596 ms | 344 ms: 1.73x faster | 323 ms: 1.84x faster |
| fmt-black/comments | 475 ms | 283 ms: 1.68x faster | 255 ms: 1.86x faster |
| fmt-black/linegen | 1.83 sec | 1.04 sec: 1.75x faster | 994 ms: 1.84x faster |
| fmt-black/lines | 1.29 sec | 673 ms: 1.91x faster | 639 ms: 2.01x faster |
| fmt-black/mode | 206 ms | 109 ms: 1.90x faster | 99.5 ms: 2.07x faster |
| fmt-black/nodes | 1.46 sec | 859 ms: 1.70x faster | 793 ms: 1.84x faster |
| fmt-black/output | 192 ms | 109 ms: 1.76x faster | 103 ms: 1.87x faster |
| fmt-black/strings | 371 ms | 194 ms: 1.91x faster | 183 ms: 2.03x faster |
| fmt-comments | 296 ms | 165 ms: 1.79x faster | 154 ms: 1.92x faster |
| fmt-dict-literal | 281 ms | 139 ms: 2.02x faster | 131 ms: 2.14x faster |
| fmt-flit/install | 851 ms | 476 ms: 1.79x faster | 447 ms: 1.90x faster |
| fmt-flit/sdist | 448 ms | 236 ms: 1.90x faster | 247 ms: 1.81x faster |
| fmt-flit_core/config | 1.16 sec | 666 ms: 1.74x faster | 608 ms: 1.90x faster |
| fmt-list-literal | 162 ms | 80.4 ms: 2.02x faster | 76.6 ms: 2.12x faster |
| fmt-nested | 332 ms | 171 ms: 1.94x faster | 164 ms: 2.03x faster |
| fmt-strings-list | 50.7 ms | 30.2 ms: 1.68x faster | 28.4 ms: 1.78x faster |
| Geometric mean | (ref) | 1.82x faster | 1.93x faster |

Compiled formatting with ESP on and off:

| Benchmark | fmt-fast-esp/compiled-mypyc.json | fmt-fast/compiled-mypyc.json |
|---|---|---|
| fmt-fast-strings-list | 43.2 ms | 13.1 ms: 3.30x faster |
| fmt-fast-nested | 71.9 ms | 64.6 ms: 1.11x faster |
| fmt-fast-dict-literal | 60.8 ms | 55.6 ms: 1.09x faster |
| fmt-fast-flit_core/config | 313 ms | 287 ms: 1.09x faster |
| fmt-fast-flit/sdist | 94.5 ms | 86.8 ms: 1.09x faster |
| fmt-fast-list-literal | 36.9 ms | 33.9 ms: 1.09x faster |
| fmt-fast-black/strings | 76.5 ms | 72.1 ms: 1.06x faster |
| fmt-fast-flit/install | 195 ms | 184 ms: 1.06x faster |
| fmt-fast-black/output | 42.8 ms | 40.5 ms: 1.06x faster |
| fmt-fast-black/nodes | 342 ms | 326 ms: 1.05x faster |
| fmt-fast-black/linegen | 427 ms | 408 ms: 1.05x faster |
| fmt-fast-comments | 75.1 ms | 71.9 ms: 1.04x faster |
| fmt-fast-black/lines | 301 ms | 290 ms: 1.04x faster |
| Geometric mean | (ref) | 1.13x faster |

Benchmark hidden because not significant (4): fmt-fast-black/__init__, fmt-fast-black/mode, fmt-fast-black/brackets, fmt-fast-black/comments
