mosser/llvm.md

## llvm.md

      
    Raw
  

              llvm.md
            
          
    Identifying Optimisation Conflicts in LLVM

Supervision

This internship proposal is a joint collaboration between Université du Québec à Montréal (UQAM) and the Laboratoire d'Informatique du Paralélisme (LIP) located in Lyon.
It targets MSc students interested in software engineering (CA) and compilation (FR).

Lyon: Laure Gonnord & Matthieu Moy (CASH Inria team)
Montréal: Sébastien Mosser & Benjamin Benni (ACE group)

Context

The LLVM project is a compiler infrastructure, defining modular and reusable compiler artefacts. In this context, we will particularly focus on the optimisation passes defined in the LLVM implementation.
An optimisation pass can be modelled as a function p that takes as input an LLVM intermediate representation (IR) of the program and produces as output an optimised one: p: IR -> IR. Thus, chaining passes means to rely on classical function composition: p○p'(ir) = p(p'(ir)).
Objective

This internship addresses the identification of conflicts between passes. Proving in an absolute way the absence of overlapping among passes is a difficult problem, that will not scale the number of passes available in LLVM. Related work focus on reducing the search space to compute an Optimized Sequence of Optimization (OSO), considering non-functional metrics such as execution time [1].
However, similarly to our recent work on rewriting rules interactions [2], we would like to provide an empirical analysis of the different passes, from a functional point of view, i.e., measuring the functional impact of permuting two passes. Considering that it is too difficult to prove that the optimization passes o and o' commutes (respectively interacts) in the general case, we want to measure if they commute when applied to a specific program p.
By applying this approach to a reference corpus of programs, we will gather evidences to then refine the initial results and target specific optimisation passes, i.e., the one that have the most impact.
Expected work

The first task is to understand the LLVM toolchain, and how the optimiser works with the different passes, which can be analysis or rewriting ones. Then, a distance metrics will have to be defined to measure the difference between two intermediate representations.
Let o1 and o2 two optimisation passes, and p a given program.

Let p12 = o1(o2(p)), and p21 = o2(o1(p))
if distance(p12, p21) = 0, then o1 and o2 are said to commute on p.
If there is a distance between p12 and p21, then the passes are said to interact on p.

The second task is to setup an empirical benchmark that will apply different optimization sequences on a reference corpus, and carefully gather the commutativity / conflict results with respect to the programs under study and the executed passes.
Then, based on these experimental results, we will measure the impact of an interaction. According to the state of the art, this impact can be measured using non functional metrics, e.g., execution time. However, considering that the source of the passes is available in LLVM, we want to consider a given pass as a white box and instrument its code to gather metrics about its usage, at a fine grained level. For example, an analysis pass will enrich the intermediate representation with meta-data that are reused by a rewriting one. Measuring the number of time a rewriting phase access to theses metadata will help to finely measure the impact of a given pass to the other one.
References


"A Study of Conflicting Pairs of Compiler Optimizations", Yosi Ben Asher, Gadi Haber, Esti Stein. 11th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoc), 2017. Link
"A Delta-oriented Approach to Support the Safe Reuse of Black-box Code Rewriters". Benjamin Benni, Sébastien Mosser, Naouel Moha, Michel Riveill. International Conference on software Reuse (ICSR), 2018. Link
"LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation". Chris Lattner and Vikram Adve. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization (CGO), 2004. Link