xgupta/GSoC23-report.md

## GSoC23-report.md

      
## Project Abstract

The primary aim of this Google Summer of Code project is to provide patch-based test coverage for quick test feedback. Most of the functionality in LLVM is tested by regression tests that are executed by llvm-lit. Each test file contains RUN lines that tell lit which binary to execute and with which command-line options; the output is then typically verified with the FileCheck utility.
For a project as critical as a compiler, it is important that all code is thoroughly tested. Code in patches that are merged into llvm-project should be properly tested, i.e. it must be exercised by at least one of the test cases added in the patch review. This is where code coverage comes to the rescue: it measures the percentage of a program's source code that is executed when a particular test suite runs. In our case, the test suite consists of regression tests written in lit format, which are not executed directly as binaries but are run by llvm-lit.
A high-level overview of the project is available here:
Patch-based test coverage for quick test feedback
Specific details of each contribution are described below.
## What has been done

The first stage of the project was to add an option to llvm-lit to emit the necessary test coverage data, divided per test case (this involves setting a unique value of LLVM_PROFILE_FILE for each RUN line).
We first defined a new variable, LLVM_TEST_COVERAGE, which, when set, passes the newly introduced --per-test-coverage option to
llvm-lit. llvm-lit then sets a unique value of LLVM_PROFILE_FILE for each RUN line of a test case. For example,
coverage data for the test case llvm/test/Analysis/AliasSet/memtransfer.ll will be emitted to
build/test/Analysis/AliasSet/memtransfer0.profraw, which will later be available for processing with our new tool.
Code Review - https://reviews.llvm.org/D154280
Commit - https://github.com/llvm/llvm-project/commit/64d19542e78a43edb7ae26ea6762a2b1c360a916
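The naming scheme in the memtransfer example can be illustrated with a small Python sketch. The helpers below are hypothetical and are not lit's actual code: they assume the suffix is the zero-based index of the RUN line, that the .profraw file is placed next to the test's path in the build directory, and that the variable is injected by prefixing each RUN command with an `export`. The real patch may compute and inject the name differently.

```python
from pathlib import PurePosixPath


def profile_file_for_run(build_test_dir: str, test_rel_path: str, run_index: int) -> str:
    """Hypothetical helper: unique .profraw path for one RUN line of a test.

    The test's extension is dropped and the zero-based RUN-line index is appended,
    e.g. Analysis/AliasSet/memtransfer.ll with RUN 0 -> .../memtransfer0.profraw.
    """
    test_path = PurePosixPath(build_test_dir) / test_rel_path
    return str(test_path.with_name(f"{test_path.stem}{run_index}.profraw"))


def with_profile_env(build_test_dir: str, test_rel_path: str,
                     run_commands: list[str]) -> list[str]:
    """Prefix each RUN command with its own LLVM_PROFILE_FILE (assumed mechanism)."""
    return [
        f"export LLVM_PROFILE_FILE="
        f"{profile_file_for_run(build_test_dir, test_rel_path, i)} && {cmd}"
        for i, cmd in enumerate(run_commands)
    ]


print(profile_file_for_run("build/test", "Analysis/AliasSet/memtransfer.ll", 0))
# build/test/Analysis/AliasSet/memtransfer0.profraw
```

Giving every RUN line its own file matters because a single test may invoke several instrumented binaries, and without distinct names their raw profiles would overwrite each other.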
After generating the coverage data for each test case of the patch, the next step was to process that data. For this, we introduced a
new tool that processes the generated coverage data together with the relevant git patch and presents the results in a user-friendly manner.
It shows the user which source lines from the patch are covered and which are not.
Code Review (In progress) - https://reviews.llvm.org/D158864
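The core idea behind such a tool, intersecting the lines a patch adds with the lines the tests executed, can be sketched in Python as follows. This is not the tool under review: it assumes a unified diff as produced by `git diff`, and it takes coverage as a hypothetical `{file: {line: execution_count}}` mapping rather than reading any real llvm-cov output format. Added lines missing from that mapping are treated as non-executable (comments, blank lines) and skipped.

```python
import re

# Unified-diff hunk header, e.g. "@@ -10,3 +10,5 @@"; counts default to 1 when omitted.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text: str) -> dict[str, set[int]]:
    """Map each file in a unified diff to the (new-file) line numbers the patch adds."""
    added: dict[str, set[int]] = {}
    current_file = None
    new_line = old_left = new_left = 0
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: classify the line by its first character.
            if line.startswith("+"):
                if current_file is not None:
                    added.setdefault(current_file, set()).add(new_line)
                new_line += 1
                new_left -= 1
            elif line.startswith("-"):
                old_left -= 1
            elif not line.startswith("\\"):  # skip "\ No newline at end of file"
                # Context line: present in both the old and the new file.
                new_line += 1
                old_left -= 1
                new_left -= 1
            continue
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0]
            current_file = None if path == "/dev/null" else path.removeprefix("b/")
            continue
        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_line = int(match.group(2))
            new_left = int(match.group(3) or 1)
    return added


def patch_coverage(added: dict[str, set[int]],
                   line_counts: dict[str, dict[int, int]]
                   ) -> dict[str, tuple[list[int], list[int]]]:
    """Split each file's added lines into sorted (covered, uncovered) lists.

    line_counts is a hypothetical {file: {line: execution_count}} mapping; added
    lines absent from it are considered non-executable and appear in neither list.
    """
    report = {}
    for path, lines in added.items():
        counts = line_counts.get(path, {})
        covered = sorted(n for n in lines if counts.get(n, 0) > 0)
        uncovered = sorted(n for n in lines if counts.get(n) == 0)
        report[path] = (covered, uncovered)
    return report
```

For example, for a hunk that adds lines 11, 12 and 14 of a file in which line 12 has an execution count of 0, `patch_coverage` reports `[11, 14]` as covered and `[12]` as uncovered.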
The next step is to integrate this tool into CI so that reviewers can access the coverage results. This requires setting up a separate GitHub Action that generates coverage for the patch. CI integration is what will get the tool actually used by llvm-project to improve its code review process.
Code Review (In Progress) - https://reviews.llvm.org/D159007
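For reviewers, the results need to be presented compactly, for instance as a Markdown table that a CI job could post or attach as an artifact. The sketch below is a hypothetical layout, not the format used in the review above; it assumes a `{file: (covered_lines, uncovered_lines)}` input, and how the GitHub Action would actually publish the table is not shown.

```python
def format_summary(report: dict[str, tuple[list[int], list[int]]]) -> str:
    """Render per-file patch coverage as a Markdown table (hypothetical layout)."""
    rows = ["| File | Covered lines | Uncovered lines |", "| --- | --- | --- |"]
    for path in sorted(report):
        covered, uncovered = report[path]
        total = len(covered) + len(uncovered)
        # Files whose added lines are all non-executable have nothing to measure.
        ratio = f"{len(covered)}/{total} ({100 * len(covered) / total:.0f}%)" if total else "n/a"
        missing = ", ".join(str(n) for n in uncovered) or "none"
        rows.append(f"| {path} | {ratio} | {missing} |")
    return "\n".join(rows)
```

For example, `format_summary({"llvm/lib/Foo.cpp": ([11, 14], [12])})` produces a row reporting `2/3 (67%)` with line 12 listed as uncovered, which points the reviewer directly at the untested line.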
## What to do next

There is still a lot of room for improvement.
We want to lower the overhead of generating and processing coverage for source lines that are not relevant to the patch, which could reduce processing time several-fold. I believe this will require changes to the build system and to the way code coverage is emitted for instrumented files. We will be exploring that next.
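On the processing side, one possible direction is to hand llvm-cov only the source files the patch touches, since `llvm-cov show` accepts a list of source files to restrict its report to. The Python sketch below only builds the argument lists (to be run with `subprocess.run`, for example). The function name, the suffix filter, the single instrumented binary and the `merged.profdata` default are assumptions; the changed-file list could come from `git diff --name-only`, and the paths may need to match how they are recorded in the coverage mapping. It does not address the build-side overhead.

```python
from pathlib import PurePosixPath

# Hypothetical filter: keep C/C++ sources and headers, drop docs, tests, etc.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".h", ".inc"}


def coverage_commands(binary: str, profraw_files: list[str], changed_files: list[str],
                      profdata: str = "merged.profdata") -> tuple[list[str], list[str]]:
    """Build argv lists that merge raw profiles and report only on patched sources."""
    sources = [f for f in changed_files if PurePosixPath(f).suffix in SOURCE_SUFFIXES]
    merge = ["llvm-profdata", "merge", "-sparse", *profraw_files, "-o", profdata]
    # With no source arguments llvm-cov show reports every file, so a caller
    # would normally skip this step when `sources` is empty.
    show = ["llvm-cov", "show", binary, f"-instr-profile={profdata}", *sources]
    return merge, show
```

Restricting the report this way only trims the processing step; the instrumented build still records coverage for every file, which is why the build-system changes mentioned above would still be needed for the larger savings.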
## What I have learned

Stepping into the open-source community, I realized the range of possibilities I can work on and how many projects there are still to discover. I also came to appreciate how useful the code coverage feature is and how much time it saves reviewers and developers.
I am now confident in my skills. By the end of GSoC, I felt I could work on a large project: break its scope down into manageable tasks and map the idea to the project and to the implementation.
On the technical side, I learned how to manage large projects in Python, as it was my first time working on a project of this caliber. At first, even setting up the project was troublesome: figuring out the conditions for building it and working out how to generate and understand code coverage, even for small projects. Then I learned how to use the Python standard library to communicate the data effectively and to store and access it efficiently.
## Acknowledgements

From building the project environment to actually seeing the coverage report of a patch, every step of this project has been pleasant and fulfilling. I believe there is a lot more to explore and improve, both in this GSoC project and in the overall code coverage infrastructure, and I wish to continue contributing to it.
I want to thank my mentor Henrik G Olsson (hnrklssn) for helping me find my way through this project whenever I could have gotten confused or lost. I truly appreciate your help and hope to continue contributing to LLVM in the future.