Improve MegreFunctions to incorporate MergeSimilarFunctions patches and ThinLTO Support (GSOC'20)
This project aims to improve the MergeFunctions pass of LLVM to have feature parity with MergeSimilarFunctions, a pass which can merge not just identical functions but functions which are similar. This results in a reduction of the code size. Further incorporate MergeFunctions with support for ThinLTO, a link-time optimization. This work will benefit areas in which code size is of vital importance, say embedded systems. The problem of future divergences of IR which may result in mis-compiles, as the comparators not being updated is also tackled.
Aditya Kumar, JF Bastien
MergeFunctions vs MergeSimilarFunctions
MergeFunctions is a pass which merges exactly similar functions, where as MergeSimilarFunctions can merge functions which are not exact replicas but are similar. Two functions with differences are merged into a single function and augmented with the appropriate control flow to handle the differences . The level of similarity needed to merge can be user defined. The figure below shows how two functions different yet similar, can be merged by attaching an additional argument to account for the function variant being called and adding phi instructions to get the required result.
Current status of MergeFunctions
The current MergeFunctions approach uses a logarithamic search for comparing functions which have the same hash. Currently MergeFunctions uses FunctionComparator to compare functions. This FunctionComparator class compares IR by a method cmpOperations which has equality checks for non-trivial IR types. For any change in the IR, this method needs to be updated too. Failing to do so, can lead to subtle mis-compiles which are hard to detect. If a new instruction is added or a new operand is added to an existing instruction, the comparators if not updated may treat instructions equal, which are actually unequal leading to mis-compiles. https://reviews.llvm.org/D79261 is one such mis-compile caused by not updating the FunctionComparator when ShuffleVector class has been changed.
- The problem of IR divergence is that IR is defined completely elsewhere from where comparators are defined. When there is a change to IR, the comparators need to be updated. We tried solving this problem in two approaches:
TableGen based approach : TableGen is used to maintain records and factor out common information about these records, especially when the number of records is large. There by, reducing the amount of duplication and chance of error. We tried integrating the IR Comparators into tablegen, so that changes in the IR can directly reflect in the comparators. But, integrating all of the IR into tablegen might reduce the amount of flexibility for future IR changes. We tried building a seperate tablgen file which handles the IR comparators and build a new tablegen backend --gen-ir-cmp, which generates the C++ code for the comparators. For an operand comparison of an instruction, the type(numbers, Orderings, Attributes, etc) in which that operand should be compared and the get method of that operand is needed. By creating entries for all instructions and returning false by default, this approach can solve the problem. But, it doesn't provide much advantage as for any changes in the IR, the change has to be reflected in the tablegen file manually for the comparators to stay in sync. https://reviews.llvm.org/D82039 is a patch with this approach.
The Comparator method approach : The ideal thing is that when IR is defined, the author should also say "here's how you can compare two IR instructions of this type". This can be done by defining a method in that subclass of IR saying how total ordering can be introduced between two such instructions and a method in "Instruction" class which makes use of all these comparator methods and compare two instructions of any class.
- A protected method compareInstSpecificProperties() has been defined in each subclass of the IR which says how to compare two instructions of that subclass.
- IgnoreAlignment and IgnoreMetaData have also been added as parameters where they are relavent, which say whether to consider/ignore Alignment and MetaData during comparison.
- The method haveSameSpecialState() is converted to compareSpecialState() which compares two Instructions of any subclass and takes the parameters IgnoreAlignment and IgnoreMetaData as flags. This method introduces total ordering between two instructions which have the same Opcode and same number of Operands.
- When a new instruction is added, but compareSpecialState() is not updated, it asserts to false by default and reports comparison of unknown instructions.
- When an operand is added to an existing instruction, there will be a need to do a lot of changes in the subclass of that IR instruction like changing the constructors, adding get/set methods for the new operand. Along with these changes, an additional check need to be added in the compareInstSpecificProperties() method of that same subclass which will probably be in the same file.
- This minimizes the number of files to touch while maintaining sync of IR with the comparators.
- Testcases for each IR type have been added to test whether the comparison is legitimate.
- A limitation of this approach is that there will not be any error reported when an operand is added, but the compareInstSpecificProperties() of that class has not been updated. In which case, the comparators will not consider the new operand during comparison which may lead to mis-compiles. But, this can be easily verified as for the changes in the IR, the changes to be made are in the same file and the comparators use the centralized method to delegate the comparison to the IR subclass.
In view of the above advantages and disadvantages, we decided to use the Comparator method approach. So, we defined methods in each of the subclass of IR which specifies how to order two IR instructions of that type. We also modified the haveSameSpecialState method in Instruction class to a switch statement over the Opcode from an if-else condition. We then tried making MergeFunctions use the modified method and the result was successful. All the MergeFunc test cases were passing and we ran the llvm-testsuite and there are no mis-compiles. To detect any future divergences, we added test cases for every known IR type. For non-trivial instructions, two test cases were added, one in which the instructions should and the other in which the instructions should not be merged. For trivial instructions, one test case was added in which the instruction should be merged.
- The MergeSimilarFunctions pass in https://reviews.llvm.org/D52896 has been integrated into MergeFunctions pass. The additions/changes are the following:
- The MergeFunctions pass now needs a parameter which decides whether to do only precise merging(merge exact replicates of functions) or partial merging(merge functions which are similar) too.
- A new class has been added in FunctionComparator which is PartialFunctionComparator which compares functions and stores the differing instructions and also records the degree of similarity between functions and the old FunctionComparator class has been renamed as PreciseFunctionComparator.
- The MergeFunctions pass file now contains two classes MergeFunctions and MergeSimilarFunctions. The MergeFunctions class uses the PreciseFunctionComparator which introduces total ordering between functions as it uses a logarithmical search for looking up similar functions. The MergeSimilarFunctions class uses the PartialFunctionComparator which compares functions and records the degree of similarity between them and the pass combines functions exceeding a similarity threshold. All functions which hash to the same bucket are compared.
In the patches, MergeFunctions is being scheduled twice. The first one which does PreciseMerging and in the second PartialMerging is being done. These passes are scheduled right next to each other in the optimization pipeline. This can be improved by changing the order. MergeFunctions can be scheduled to do PreciseMerging in the early stage, to reduce the amount of optimization work which needs to be done and thereby improving the compile time. And towards the end of the pipeline after the optimization work on every function is done, MergeFunctions can do PartialMerging to reduce the code size.
Running MergeFunctions with MergeSimilarFunctions Enabled
Currently, the option to whether or not to merge similar functions can be given as a command-line argument with LLVM opt tool.
To do only precise merging(merging of exact replicas):
opt -mergefunc [options] [filename]
To enable partial merging(merging of similar functions):
opt -mergefunc -mergesimilarfunc [options] [filename]
Links to patches
- https://reviews.llvm.org/D82892 ([NFC] Methods to compare IR added in each IR subclass)
- https://reviews.llvm.org/D85616 (Improved MergeFunctions to merge similar functions)
- https://reviews.llvm.org/D82039 (Integrated cmpOperations() of FunctionComparator into tblgen)
LLVM, MySQL and Qemu are selected as benchmarks. The compilation time and build sizes of all the three benchmarks have been noted with the following apparatus.
Compiling with Apple Clang 10.0.1 vs Clang 11.0.0 https://docs.google.com/document/d/1laE3ylhq3S0XVE3_c3Pehuyw6erUACYsI9wUnsAqAEE/edit?usp=sharing
Compiling with Clang 11.0.0 by switching MergeFunctions pass on/off(Text segment size of the binary is also noted) https://docs.google.com/spreadsheets/d/1FwpzB0n9SlM47b0NMPzq6CEQfkSb7OpXvFX778vBsfo/edit?usp=sharing
To show the extent to which MergeFunctions can be useful, quoting the top 10 results of MySQL and Qemu from the experiment of compiling with Clang 11.0.0 by switching MergeFunctions on and off(baseline). The text segment sizes are being reported in the tables below.
|Filename||Baseline||With MergeFunc||Size reduction%|
|load_data_events.cpp.o (ex libbinlogstandalone.a)||3,459||3,171||8.3%|
|rows_event.cpp.o (ex libbinlogstandalone.a)||18,264||16,756||8.2%|
|control_events.cpp.o (ex libbinlogstandalone.a)||13,820||13,018||5.8%|
|iterator.cpp.o (ex libbinlogevents.a)||3,775||3,587||4.9%|
|binlog_event.cpp.o (ex libbinlogstandalone.a)||2,525||2,430||3.7%|
|rows_event.cpp.o (ex libbinlogevents.a)||15,442||14,888||3.5%|
|Filename||Baseline||With MergeFunc||Size reduction%|
|stream.o (ex ./slirp/libslirp.a)||1,252||1,077||13.9%|
Discussions on mailing lists
Due to various reasons, the patches are still under review. The comments on the patches are being actively addressed. The improved MergeFunctions patch can merge potentially similar functions. Also, the benchmarks need to be recorded while running MergeFunctions pass enabling and disabling the similar functions switch. There is a need to try various other benchmarks like Chrome/Webkit and check there are no mis-compiles being generated when running LLVM with MergeSimilarFunctions switched on.
- Tobias J.K. Edler von Koch, Björn Franke, Pranav Bhandarkar and Anshuman Dasgupta. Exploiting Function Similarity for Code Size Reduction. LCTES '14: Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems June 2014 Pages 85–94