@nahimilega
Last active August 20, 2021 12:17
GSoC '21@CERN: RooUnfold - Efficient deconvolution using state of the art algorithms


What is RooUnfold?

RooUnfold is a powerful modular analysis tool used for the unfolding (AKA "deconvolution" or "un-smearing") of distributions. Unfolding is the statistical process of estimating the 'true' underlying distribution of a physical quantity from measured data that has been distorted by detector effects, using simulated data to model that distortion. RooUnfold's purpose is to provide a framework for testing and using different binned unfolding methods while presenting each method to the user in as consistent a way as possible.
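To make the idea concrete, here is a minimal pure-Python sketch of smearing a toy truth histogram with a response matrix and then correcting it bin by bin. It does not use RooUnfold or ROOT; the function names `smear` and `bin_by_bin_unfold` and all the numbers are made up for illustration.

```python
def smear(truth, response):
    """Fold a truth histogram: measured[i] = sum_j response[i][j] * truth[j]."""
    return [sum(r_ij * t_j for r_ij, t_j in zip(row, truth)) for row in response]


def bin_by_bin_unfold(measured, mc_truth, mc_reco):
    """Scale each measured bin by the simulated truth/reco ratio of that bin."""
    unfolded = []
    for data, truth, reco in zip(measured, mc_truth, mc_reco):
        # An empty simulated reco bin carries no information, so report 0 there.
        factor = truth / reco if reco > 0 else 0.0
        unfolded.append(data * factor)
    return unfolded


# Toy response: response[i][j] = P(reconstructed in bin i | true value in bin j).
response = [
    [0.8, 0.1, 0.0],
    [0.2, 0.8, 0.2],
    [0.0, 0.1, 0.8],
]

# Simulation: the truth is known, the reco histogram comes from smearing it.
mc_truth = [100.0, 200.0, 100.0]
mc_reco = smear(mc_truth, response)

# Toy "data": same shape as the simulation, but with twice the statistics.
data_truth = [200.0, 400.0, 200.0]
measured = smear(data_truth, response)

unfolded = bin_by_bin_unfold(measured, mc_truth, mc_reco)
print(unfolded)
```

The bin-by-bin correction only recovers the truth this cleanly because the toy data has the same shape as the simulation. The other methods in RooUnfold exist precisely because real data does not behave that nicely.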

About the project

To facilitate these data-intensive analyses, RooUnfold is built on the ROOT Software Framework and is used in the publication of hundreds of LHC analyses every year. RooUnfold was recently upgraded to leverage the power of the ROOT statistical modelling package RooFit in order to provide a statistical comparison of the various algorithms. This project aims to address obsolescence in the underlying RooUnfold implementation so that the RooUnfold package can act as a lightweight member class of ROOT whilst retaining its functionality and the ability for users to define new algorithms, including machine learning based solutions and data-intensive tasks.

Work Done

Community Bonding

  1. Meet the RooFitUnfold team.
  2. Learn the concept and math behind the unfolding process.
  3. Set up the development environment and run my first unfolding :)
  4. Dig into the code: understand the complete architecture of the codebase and map the abstract math concepts to the code.
  5. Go through the complete documentation of the library.
  6. Get a CERN account.

Coding: Phase 1

  1. Start with the identification of issues
  2. Discuss the approach and familiarise myself with the complete development process before starting
  3. Develop a bin-by-bin test
  4. Clean up old tests from the repository
  5. Develop a base test script using RooUnfoldHarness to ease the process of developing new tests with minimal code redundancy. #PR39
  6. Use the base test script to develop the following tests:
    • Test Methods
    • Test Uncertainty
    • Test Overflow
    • Test Fakes
    • Test Closure
    • Test Variable Bin Width
    • Test Python and C++ bindings
    • Test Bin Correlation
    • Test for 2D response matrix
    • Test for 3D response matrix
  7. Compute test coverage to validate the new tests and adjust the tests accordingly to maximise coverage.
  8. Merge the new tests into the main repository
  9. Clean up dead and unnecessary code by removing the local copies of TSVDUnfold and D'Agostini Bayes.
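As an illustration of what one of the tests above checks, here is a minimal sketch of a closure test in plain Python. It is not the actual test script and does not use RooUnfoldHarness; `closure_test`, `bin_by_bin_unfold` and `no_correction` are hypothetical helpers written only for this example. The idea is that unfolding the simulated reco histogram with corrections derived from the same simulation should give back the simulated truth.

```python
import math


def bin_by_bin_unfold(measured, mc_truth, mc_reco):
    """Toy unfolding: scale each bin by the simulated truth/reco ratio."""
    return [d * (t / r) if r > 0 else 0.0
            for d, t, r in zip(measured, mc_truth, mc_reco)]


def no_correction(measured, mc_truth, mc_reco):
    """Deliberately broken 'unfolding' that returns the input unchanged."""
    return list(measured)


def closure_test(unfold, mc_truth, mc_reco, rel_tol=1e-9):
    """Unfold the simulated reco histogram and compare it with the simulated truth."""
    unfolded = unfold(mc_reco, mc_truth, mc_reco)
    return all(math.isclose(u, t, rel_tol=rel_tol, abs_tol=1e-12)
               for u, t in zip(unfolded, mc_truth))


# Made-up simulated histograms (same binning for truth and reco).
mc_truth = [120.0, 340.0, 90.0]
mc_reco = [100.0, 310.0, 140.0]

print("bin-by-bin closes:", closure_test(bin_by_bin_unfold, mc_truth, mc_reco))
print("no correction closes:", closure_test(no_correction, mc_truth, mc_reco))
```

Passing the unfolding function in as an argument is what keeps redundancy low: the same check can be reused for every method, which is the spirit of the base test script.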

Coding: Phase 2

  1. Analyse changes in the development repository.
  2. List the parameters used by the different unfolding algorithms
  3. Design and develop a new class, RooUnfoldAlgorithm. RooUnfoldAlgorithm is a top-level class in RooUnfold that provides a standardised interface with useful features such as plotting and saving the object, making unfolding more intuitive and easier for users. It also decouples the implementation from the interface, which eases the shift to ROOT, and gives users a common interface that is independent of the underlying unfolding method.

A complete description of the UML can be found here
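The design idea of a top-level class that hides the choice of method behind one interface can be sketched in a few lines of Python. This is only an illustration of the pattern under my own assumptions: `UnfoldingFacade`, its methods and the toy backends are hypothetical names and are not the actual RooUnfoldAlgorithm API, which is a C++ class.

```python
from typing import Callable, Dict, List

Histogram = List[float]
# Signature shared by every backend: (measured, mc_truth, mc_reco) -> unfolded
UnfoldFn = Callable[[Histogram, Histogram, Histogram], Histogram]


def _bin_by_bin(measured, mc_truth, mc_reco):
    """Toy backend: scale each bin by the simulated truth/reco ratio."""
    return [d * (t / r) if r > 0 else 0.0
            for d, t, r in zip(measured, mc_truth, mc_reco)]


def _no_correction(measured, mc_truth, mc_reco):
    """Toy backend: return the measured histogram unchanged."""
    return list(measured)


class UnfoldingFacade:
    """Hypothetical top-level wrapper; NOT the real RooUnfoldAlgorithm API."""

    _backends: Dict[str, UnfoldFn] = {
        "bin_by_bin": _bin_by_bin,
        "none": _no_correction,
    }

    def __init__(self, method, mc_truth, mc_reco):
        if method not in self._backends:
            raise ValueError(f"unknown unfolding method: {method!r}")
        self.method = method
        self.mc_truth = list(mc_truth)
        self.mc_reco = list(mc_reco)

    @classmethod
    def register(cls, name, backend):
        """Let users plug in a new algorithm without touching this class."""
        cls._backends[name] = backend

    def unfold(self, measured):
        """Same call for every method; the backend is looked up by name."""
        return self._backends[self.method](measured, self.mc_truth, self.mc_reco)

    def summary(self, measured):
        """Plain-text stand-in for the plotting and saving helpers."""
        pairs = zip(measured, self.unfold(measured))
        return "\n".join(f"bin {i}: measured={m:g} unfolded={u:g}"
                         for i, (m, u) in enumerate(pairs))


facade = UnfoldingFacade("bin_by_bin", mc_truth=[100.0, 200.0], mc_reco=[50.0, 400.0])
result = facade.unfold([10.0, 40.0])
print(facade.summary([10.0, 40.0]))
```

The user-facing calls (`unfold`, `summary`) stay the same whichever backend is selected, and the backends know nothing about the wrapper. That separation is what lets the interface stay stable while the implementation underneath changes.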

Contributions during Google Summer of Code 2021

  1. Develop a framework to define new tests with ease
  2. Develop new tests to make the process of development and merging easy and error-free
  3. Set up the changes needed to compute test code coverage
  4. Design and develop RooUnfoldAlgorithm.

After GSoC: To Do

The software pieces created are far from finished and there are a lot of possibilities for extending this work. After spending this super exciting summer, I wish to continue development in RooUnfold after the tenure of GSoC. I wish to complete the implementation of the RooUnfoldAlgorithm class and extend its functionality and, more importantly, merge the dev branch into the main repository, allowing for a version 3.0 release. Further, I would like to get involved in the process of shifting RooUnfold to ROOT and continue the development of RooUnfold to extend its functionality.

Review

RooUnfold is a powerful tool created to ease the unfolding process and provide a consistent view of the unfolding methods to the user while handling large amounts of data using ROOT. The new class RooUnfoldAlgorithm fulfils the project's objective of addressing obsolescence in the underlying RooUnfold implementation: it is a lightweight class written in modern C++ that can efficiently replicate current usage while providing improved data visualisation tools and performance metrics for the unfolding procedure. The original proposal mentioned changing the base class implementation. After getting a deeper understanding of the codebase, however, a top-level class turned out to be more suitable: it reduces the memory footprint, is user friendly, preserves backward compatibility and avoids a god class. Moreover, new test development was given priority over the initially proposed exploration of deep learning-based and unbinned algorithms, as these algorithms are not in immediate demand, whereas a new test suite was needed right away to make the development cycle much more robust and fast.

In conclusion, the new test suite will ensure backward compatibility while the changes in the development branch are merged, and it provides a standardised method to implement new tests for new algorithms. Furthermore, the new class RooUnfoldAlgorithm will provide a common interface for the users and is the first milestone in making the RooUnfold package act as a lightweight member class of ROOT.




Personal Reflection

Challenges faced

  1. As a newcomer to C++, this was my first experience with C++ development.
  2. Unfolding was a completely new concept for me. However, my mentors eased the learning process and helped me with even the most basic of questions.
  3. It is challenging for a new developer to navigate through the RooUnfold codebase and learn to use ROOT.
  4. Since the library is primarily used for physics analysis, it is tricky to know exactly which features scientists use and how much.

Intangible Results

  1. Learnt about this super interesting statistical process called unfolding.
  2. Got a @cern.ch email address (Dreams do come true, may be temporary but still… completely worth it 🙈)
  3. Got to interact with and learn a lot from some fantastic people (my mentors <3) and be a part of this wonderful community.
  4. Worked on this super exciting library, which is used by 1000s of high energy particle physicists and might one day be used to make a Nobel Prize winning discovery.
  5. Got a unique first-hand experience of C++ development on a big codebase in production.
  6. Most importantly, I got to work on a project in the area of my interest.

Conclusion

It has been an incredible journey and a wonderful experience. I learnt to do things independently and explain the rationale behind them. My favourite part was the independence and responsibility which I had in converting the problem statement to design and ultimately into code. I had tremendous fun working on this project and had excellent support from my mentors throughout. I want to thank my mentors for giving me this fantastic opportunity. Thanks, Vince, for helping me with even my dumbest queries regarding unfolding, code or deliverables, and for the constant guidance that helped me steer the project in the right direction. Thanks, Tim, for your excellent suggestions and feedback on the design as well as the code, and for patiently explaining to me even the basic concepts and the codebase. Thanks, Pim, for guiding me through new developments. Thanks, Lydia, for your kind words of motivation and your very helpful feedback. Thanks, Carsten, for helping solve even the weirdest make issues in just seconds.

Thanks, all, for taking time out of your busy schedules, for your help and guidance in every phase, for the fruitful discussions in our meetings, and for your prompt replies, kind words, suggestions and warm attitude :)

Thanks to the CERN-HSF for organising this event and thanks to Google for hosting this program. It is a great help for Open Source initiatives.
