Hi there! I'm Avni Badiwale, a sophomore at UW-Madison majoring in Computer Science and Mathematics. I stumbled into computational mass spectrometry while looking for research opportunities at UW, and ended up joining a lab in that field. Over roughly the last semester I got a decent amount of exposure to the field through various projects, which is how I'd heard about OpenMS in passing before I even knew about GSoC.
Now, I've spent the summer with OpenMS learning a ton about build systems and CI/CD workflows. This document goes into depth about what that really looks like!
Arrow Integration into CI + Testing: http://github.com/OpenMS/OpenMS/pull/8091
Background:
- This was the main purpose of the GSoC project, which was finalized roughly halfway through the coding period!
- This was going to be the first of two PRs: one to set up the structure needed to build and successfully test Arrow on Ubuntu (installing dependencies with apt) and macOS (installing dependencies with Homebrew), and one to write a CMake script to build Arrow and Parquet on Windows (where the precedent was to build and install dependencies from source with CMake). However, plans changed, and the project pivoted to trying out Conda on Windows for the first time rather than adding to our centralized repository of dependency libraries.
Challenges:
- The workflow for creating the PR looked something like: commit -> see CI failures -> make a small change to the code -> commit -> ..., which quickly became quite taxing.
- With one of the package managers (apt, brew, conda), Arrow ships its own copy of nlohmann/json, a C++ library for working with JSON, and with the others it doesn't. The version of nlohmann/json also differs across operating systems, and OpenMS ships its own copy as well. This change reconciles all of these factors!
Simply put, this PR allows an OpenMS developer to #include Arrow and Parquet headers freely!
Docker update: http://github.com/OpenMS/OpenMS/pull/8091 (a simple change to the Dockerfile that adds Arrow and Parquet via apt)
Bioconda update (closed/unmerged): OpenMS/bioconda-recipes#14. Adding Arrow and Parquet to our Bioconda build turned out to be far more complicated than it seemed, so another OpenMS team member took that part on.
My original proposal centered more on creating a converter between two file types, .mzML and .parquet, than on the build system. That didn't come to fruition because the team decided they weren't ready to settle on a schema, compression, etc. for the conversion. However, I've made a small pip package with a (fairly simplified) converter because I thought it might be useful regardless! You can find it at https://pypi.org/project/PyquetMS/ or install it with pip install PyquetMS.
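To give a feel for what such a conversion involves, here is a minimal sketch in Python using only the standard library. It is not the PyquetMS API and not a schema the OpenMS team has agreed on: the function names, the long-format column layout (spectrum_id, mz, intensity), and the in-memory tuple form are all assumptions made up for illustration. It shows the two core steps: decoding an mzML-style base64 binary array into floats, and flattening spectra into columns that a Parquet writer could consume.

```python
import base64
import struct
import zlib


def decode_binary_array(b64_text, is_64bit=True, zlib_compressed=False):
    """Decode one mzML-style <binary> payload into a list of floats.

    Hypothetical helper: in a real file, the precision and compression
    flags would come from the cvParams next to the <binary> element.
    """
    raw = base64.b64decode(b64_text)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    width, code = (8, "d") if is_64bit else (4, "f")
    count = len(raw) // width
    # mzML stores binary arrays as little-endian floats.
    return list(struct.unpack(f"<{count}{code}", raw))


def spectra_to_columns(spectra):
    """Flatten spectra into long-format columns, one row per peak.

    `spectra` is an assumed in-memory form: a list of
    (spectrum_id, mz_list, intensity_list) tuples.
    """
    columns = {"spectrum_id": [], "mz": [], "intensity": []}
    for spectrum_id, mzs, intensities in spectra:
        if len(mzs) != len(intensities):
            raise ValueError(f"array length mismatch in {spectrum_id}")
        columns["spectrum_id"].extend([spectrum_id] * len(mzs))
        columns["mz"].extend(mzs)
        columns["intensity"].extend(intensities)
    return columns


def encode_binary_array(values):
    """Inverse helper, used here only to build demo input."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(packed).decode("ascii")


# Round-trip a tiny m/z array and flatten one made-up spectrum.
demo_mz = decode_binary_array(encode_binary_array([100.0, 200.5]))
demo_columns = spectra_to_columns([("scan=1", demo_mz, [10.0, 20.0])])
```

If pyarrow is installed, a dictionary like demo_columns could then be turned into a table with pyarrow.Table.from_pydict and written out with pyarrow.parquet.write_table. A long, one-row-per-peak layout is only one option; storing each spectrum's arrays as list columns is another, and that kind of choice is exactly what the team wanted more time to decide.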