Skip to content

Instantly share code, notes, and snippets.

@charles-l
Last active January 11, 2021 00:40
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save charles-l/d764ba4b4da9868cc92c400d9f701944 to your computer and use it in GitHub Desktop.
Save charles-l/d764ba4b4da9868cc92c400d9f701944 to your computer and use it in GitHub Desktop.
LLVM GSoC Writeup

GSoC 2017 Project

This summer, I worked on adding more attributes to ThinLTO's index to allow for more optimizations.

Background

LTO (Link-time optimization) combines the bitcode of every module in a project into a monolithic bitcode file during linking, so IPO (interprocedural optimizations) can be applied throughout a whole program. This improves the quality and the speed of the executables generated. However, LTO consumes a large amount of memory, so it isn't viable on many larger C/C++ codebases.

ThinLTO addresses this problem by introducing a light-weight in-memory index, that tracks attributes for relevant global values in a program. The index can be used to propagate information across a whole program, import and internalize functions, and apply whole program optimizations without having to store all the bitcode in memory.

ThinLTO is faster than LTO, consumes less resources, and lends itself to distributed compilation. However, there are still optimizations that ThinLTO can't apply because the required attributes aren't yet in the index. I worked to add more of these attributes for GSoC.

Analysis

First, I had to find which test cases I could use for comparison between ThinLTO and LTO generated executables (using the Benchmarks directory of the test-suite):

(sorted by largest difference between lto and thinlto runtime speeds)

test                                                         lto                    thin
./SciMark2-C/scimark2.test                                   34.173120000000004     37.7289
./ASC_Sequoia/IRSmk/IRSmk.test                               4.29938                6.10558
./ASC_Sequoia/CrystalMk/CrystalMk.test                       4.377740000000001      5.4902
./mafft/pairlocalalign.test                                  17.1773                17.73384
./VersaBench/8b10b/8b10b.test                                3.29366                3.5631600000000008
./ASC_Sequoia/AMGmk/AMGmk.test                               9.06768                9.32828
./Olden/em3d/em3d.test                                       3.8204599999999997     4.03294
./VersaBench/ecbdes/ecbdes.test                              2.10274                2.2632199999999996
./TSVC/LinearDependence-flt/LinearDependence-flt.test        2.1580199999999996     2.25726
./TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl.test      6.81688                6.9129000000000005
./TSVC/StatementReordering-dbl/StatementReordering-dbl.test  2.87866                2.97036
./Fhourstones/fhourstones.test                               0.77992                0.8680399999999999
./TSVC/Expansion-dbl/Expansion-dbl.test                      2.40276                2.48862
./TSVC/Symbolics-flt/Symbolics-flt.test                      0.97872                1.01974
./Bullet/bullet.test                                         4.60032                4.63818
./TSVC/Reductions-flt/Reductions-flt.test                    4.92826                4.963900000000001
./TSVC/LoopRerolling-flt/LoopRerolling-flt.test              2.774                  2.8075200000000002
./Prolangs-C++/life/life.test                                1.08738                1.11896
./TSVC/ControlFlow-dbl/ControlFlow-dbl.test                  4.1200600000000005     4.1431000000000004
./Olden/bh/bh.test                                           1.2872599999999998     1.30836
./Trimaran/netbench-url/netbench-url.test                    2.1750999999999996     2.19426
./MallocBench/espresso/espresso.test                         0.39128                0.40953999999999996
./NPB-serial/is/is.test                                      8.116200000000001      8.132380000000001
./Trimaran/enc-rc4/enc-rc4.test                              0.5858599999999999     0.5984999999999999
./VersaBench/bmm/bmm.test                                    2.35784                2.3693599999999995
./sim/sim.test                                               2.9933                 3.00376
./TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl.test    3.0367                 3.0470800000000002
./Olden/perimeter/perimeter.test                             0.18706                0.19741999999999998
./TSVC/ControlLoops-flt/ControlLoops-flt.test                2.13158                2.1418399999999997
./TSVC/Equivalencing-dbl/Equivalencing-dbl.test              1.80118                1.8109000000000002
./VersaBench/beamformer/beamformer.test                      0.77484                0.7838200000000001
./nbench/nbench.test                                         1.7093400000000003     1.71828
./TSVC/Recurrences-dbl/Recurrences-dbl.test                  3.6061799999999997     3.6148199999999995
./Olden/tsp/tsp.test                                         0.7094                 0.71542
./TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl.test            3.3644399999999997     3.36972
./FreeBench/distray/distray.test                             0.10466                0.10918000000000001
./TSVC/Searching-flt/Searching-flt.test                      2.2469                 2.2511
./Olden/bisort/bisort.test                                   0.47929999999999995    0.48312
./MallocBench/cfrac/cfrac.test                               0.9850200000000001     0.9884600000000001
./MiBench/consumer-typeset/consumer-typeset.test             0.10983999999999998    0.11300000000000002
./Trimaran/enc-3des/enc-3des.test                            1.28424                1.28718
./MiBench/telecomm-CRC32/telecomm-CRC32.test                 0.2762                 0.2786
./llubenchmark/llu.test                                      6.75924                6.76164
./Olden/power/power.test                                     0.71944                0.7218
./mediabench/g721/g721encode/encode.test                     0.02988                0.031979999999999995
./Olden/voronoi/voronoi.test                                 0.23626                0.23801999999999998
./MiBench/network-patricia/network-patricia.test             0.07289999999999999    0.07458000000000001
./FreeBench/fourinarow/fourinarow.test                       0.22271999999999997    0.22432
./TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl.test    3.6113399999999998     3.61276
./MiBench/telecomm-FFT/telecomm-fft.test                     0.03662                0.038040000000000004
./MiBench/consumer-lame/consumer-lame.test                   0.1679                 0.1692
./Olden/health/health.test                                   0.28254                0.28384
./McCat/17-bintr/bintr.test                                  0.0843                 0.0854
./McCat/18-imp/imp.test                                      0.06532000000000002    0.06642
./MallocBench/gs/gs.test                                     0.03152                0.032600000000000004
./Trimaran/netbench-crc/netbench-crc.test                    0.5355599999999999     0.53654
./FreeBench/analyzer/analyzer.test                           0.06734                0.06832
./McCat/04-bisect/bisect.test                                0.09903999999999999    0.09992000000000001
./Olden/mst/mst.test                                         0.07338                0.07405999999999999
./Trimaran/enc-md5/enc-md5.test                              1.26304                1.2636999999999998
./mediabench/mpeg2/mpeg2dec/mpeg2decode.test                 0.0073999999999999995  0.00804
./TSVC/InductionVariable-dbl/InductionVariable-dbl.test      4.00996                4.0106
./Prolangs-C/gnugo/gnugo.test                                0.044199999999999996   0.044700000000000004
./mediabench/adpcm/rawcaudio/rawcaudio.test                  0.00112                0.0015799999999999998
./Prolangs-C/bison/mybison.test                              0.00152                0.00194
./MiBench/consumer-jpeg/consumer-jpeg.test                   0.0031                 0.00346
./Olden/treeadd/treeadd.test                                 0.19097999999999998    0.19129999999999997
./McCat/05-eks/eks.test                                      0.0027999999999999995  0.0030800000000000003
./FreeBench/pifft/pifft.test                                 0.10668                0.10695999999999999
./MiBench/network-dijkstra/network-dijkstra.test             0.03308                0.03332
./Prolangs-C/agrep/agrep.test                                0.00262                0.0028200000000000005
./Prolangs-C++/city/city.test                                0.00494                0.0051400000000000005
./MiBench/security-sha/security-sha.test                     0.012680000000000002   0.012879999999999999
./mediabench/adpcm/rawdaudio/rawdaudio.test                  0.0010999999999999998  0.0012799999999999999
./mediabench/gsm/toast/toast.test                            0.0254                 0.02556
./MiBench/telecomm-gsm/telecomm-gsm.test                     0.21002                0.21017999999999998
./BitBench/drop3/drop3.test                                  0.24165999999999999    0.24180000000000001
./Prolangs-C++/employ/employ.test                            0.00582                0.00592
./BitBench/uuencode/uuencode.test                            0.0157                 0.0157
./TSVC/Packing-flt/Packing-flt.test                          2.59196                2.59192
./McCat/08-main/main.test                                    0.01992                0.019840000000000003
./MiBench/automotive-susan/automotive-susan.test             0.042859999999999995   0.04274000000000001
./BitBench/uudecode/uudecode.test                            0.028559999999999995   0.02838
./Prolangs-C++/simul/simul.test                              0.0061600000000000005  0.005980000000000001
./FreeBench/neural/neural.test                               0.1026                 0.10238
./McCat/03-testtrie/testtrie.test                            0.0077399999999999995  0.007379999999999999
./mediabench/jpeg/jpeg-6a/cjpeg.test                         0.00264                0.00224
./McCat/12-IOtest/iotest.test                                0.12944000000000003    0.12883999999999998
./FreeBench/mason/mason.test                                 0.15511999999999998    0.15431999999999998
./Ptrdist/bc/bc.test                                         0.5263199999999999     0.52546
./McCat/01-qbsort/qbsort.test                                0.053559999999999997   0.0526
./PAQ8p/paq8p.test                                           47.26171999999999      47.260099999999994
./Ptrdist/ft/ft.test                                         1.04444                1.0424399999999998
./Prolangs-C++/ocean/ocean.test                              0.05891999999999999    0.05674
./FreeBench/pcompress2/pcompress2.test                       0.11726                0.11438
./TSVC/Reductions-dbl/Reductions-dbl.test                    2.2299200000000003     2.2268600000000003
./Fhourstones-3.1/fhourstones3.1.test                        1.0269                 1.02342
./McCat/09-vor/vor.test                                      0.09698                0.09332
./TSVC/Expansion-flt/Expansion-flt.test                      1.5393000000000001     1.5355
./Ptrdist/ks/ks.test                                         1.14378                1.1399599999999999
./tramp3d-v4/tramp3d-v4.test                                 0.26248                0.25861999999999996
./TSVC/Packing-dbl/Packing-dbl.test                          2.8963200000000002     2.89234
./Prolangs-C++/primes/primes.test                            0.3233                 0.31919999999999993
./TSVC/ControlLoops-dbl/ControlLoops-dbl.test                2.1750999999999996     2.17072
./MiBench/automotive-bitcount/automotive-bitcount.test       0.06721999999999999    0.062439999999999996
./Ptrdist/anagram/anagram.test                               0.9008800000000001     0.89566
./TSVC/Recurrences-flt/Recurrences-flt.test                  3.6074399999999995     3.6017
./BitBench/five11/five11.test                                2.01152                2.00556
./MiBench/automotive-basicmath/automotive-basicmath.test     0.25332000000000005    0.24714
./TSVC/LoopRestructuring-flt/LoopRestructuring-flt.test      6.46164                6.454420000000001
./TSVC/Searching-dbl/Searching-dbl.test                      2.23234                2.2247999999999997
./TSVC/StatementReordering-flt/StatementReordering-flt.test  2.28326                2.27566
./TSVC/CrossingThresholds-flt/CrossingThresholds-flt.test    2.85818                2.8503200000000004
./Ptrdist/yacr2/yacr2.test                                   0.72272                0.7142000000000002
./TSVC/ControlFlow-flt/ControlFlow-flt.test                  3.46012                3.45094
./7zip/7zip-benchmark.test                                   7.051                  7.03934
./MiBench/security-rijndael/security-rijndael.test           0.0409                 0.029180000000000005
./TSVC/Equivalencing-flt/Equivalencing-flt.test              1.1240999999999999     1.1116
./TSVC/LinearDependence-dbl/LinearDependence-dbl.test        2.8451999999999997     2.82726
./TSVC/IndirectAddressing-flt/IndirectAddressing-flt.test    3.0285                 3.00964
./ASCI_Purple/SMG2000/smg2000.test                           2.45564                2.43576
./TSVC/Symbolics-dbl/Symbolics-dbl.test                      2.04544                2.02226
./TSVC/NodeSplitting-flt/NodeSplitting-flt.test              2.3962200000000005     2.3631800000000003
./TSVC/InductionVariable-flt/InductionVariable-flt.test      3.0465600000000004     3.0042
./VersaBench/dbms/dbms.test                                  1.0751                 1.02504
./DOE-ProxyApps-C/XSBench/XSBench.test                       3.0763399999999996     3.025
./TSVC/LoopRerolling-dbl/LoopRerolling-dbl.test              3.47882                3.39386
./TSVC/NodeSplitting-dbl/NodeSplitting-dbl.test              3.1038                 3.01452
./TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt.test            1.7141000000000002     1.6207
./Trimaran/enc-pc1/enc-pc1.test                              0.5199999999999999     0.21567999999999996

During the first few weeks, I spent most of my time trying to figure out what optimizations were being applied to LTO, but not to ThinLTO. After a few failed attempts (i.e. trying to compare the LTO and ThinLTO generated binaries directly, trying to diff the output of every single optimization pass), I found that the following method worked best:

  • Pick the benchmark with the largest disparity between the LTO and ThinLTO runtime speed (where LTO was faster)
  • Diff the results of -stats (which shows statistics about which passes were able to apply optimizations)
  • Pick an IPO pass to focus on adding to ThinLTO
  • Come up with a simple test case that shows how the pass isn't being fully applied in ThinLTO

Using this method, I found the following potential optimizations that could be added:

Implementation

For the rest of GSoC I worked on implementing the FunctionAttrs pass which propagates information about functions throughout the callgraph (i.e. whether a function recurses, whether it reads or writes memory). This required adding the necessary attributes to the index, so that relevant attributes could be propagated during the thin-link step (which is the point in the process before code is generated, when just the index is available without the bitcode).

This required a bit of preliminary work. Since the index didn't have support for strongly connected component (SCC) navigation, I had to implement GraphTraits for the index. See patch: https://reviews.llvm.org/D36311

The problem with navigating the callgraph directly (without reducing it to a graph of SCCs), is that cycles (that are caused by recursion) are difficult to deal with. An SCC will reduce a cycle into a set of the connected nodes so that the graph can be traversed without worrying about direct or indirect recursion. It simplifies navigation of the whole program's reduced callgraph that's contained in the index to a simple post-order (bottom up) navigation of a DAG (directed acyclic graph).

See below image taken from from http://www.imm.dtu.dk/~hrni/PPA/slides6.pdf

Next, I implemented NoRecurse propagation and finalized another patch for modref information to propagate function attributes.

Future work

I plan to finish adding the FunctionAttrs pass then add the other optimizations listed above. They will require additional attributes to be added to the index including:

  • Function arguments
  • Return types

Other patches

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment