@Avni2000
Last active September 23, 2025 04:56
GSoC Final Submission

Background

Hi there! I'm Avni Badiwale, a sophomore at UW-Madison majoring in Computer Science and Mathematics. I stumbled into computational mass spectrometry while looking to get into research at UW, and ended up joining a lab in that field. Over the last semester or so, I gained a decent amount of exposure to the field by working on various projects, which is how I'd tangentially heard about OpenMS before I even knew about GSoC.

Now, I've spent the summer with OpenMS learning a ton about build systems and CI/CD workflows. This document goes into depth about what that really looks like!


Pull Requests (Merged and unmerged):

Arrow Integration into CI + Testing: http://github.com/OpenMS/OpenMS/pull/8091

Background:

  • This was the main purpose of the GSoC project, which was finalized ~ halfway through the coding period!
  • This was going to be the first of two PRs: one to set up the structure needed to build and successfully test Arrow on Ubuntu (installing deps with apt) and macOS (installing deps with Homebrew), and one to write a CMake script to build Arrow+Parquet on Windows (where the precedent was to build and install dependencies from source with CMake). However, plans ended up changing, and the project pivoted towards trying out Conda on Windows for the first time, rather than adding to our centralized repository of dependency libs.

Challenges:

  • The workflow for creating the PR looked something like: commit -> see CI failures -> make a small change to the code -> commit -> ..., which quickly became quite taxing.
  • With one of the package managers (apt, brew, conda), Arrow ships its own copy of nlohmann/json, a C++ library for working with JSON, and with the others it doesn't. The available versions of nlohmann/json also differ across operating systems, and OpenMS ships its own copy as well. This change reconciles all of these factors!

Simply put, this PR allows an OpenMS developer to #include Arrow and Parquet headers freely!

Docker update: http://github.com/OpenMS/OpenMS/pull/8091

A simple change to the Dockerfile to add Arrow and Parquet via apt.

Bioconda update (Closed/Unmerged): OpenMS/bioconda-recipes#14

The seemingly simple act of adding Arrow and Parquet to our bioconda build turned out to be far more complicated than expected, and another OpenMS team member took that part on.

Offshoot project

My original proposal centered more on creating a converter between two file types, .mzML and .parquet, than on the build system. This didn't end up coming to fruition because the team decided they weren't ready to settle on a schema, compression, etc. for the conversion. However, I've made a small pip package with a (fairly simplified) converter because I thought it might be useful regardless! You can find it at https://pypi.org/project/PyquetMS/ or install it with pip install PyquetMS.
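
To give a rough idea of what such a conversion involves, here is a minimal sketch of the step that comes before any Parquet writing: decoding mzML-style binary arrays (base64, little-endian floats, optionally zlib-compressed) and flattening spectra into columns. This is not PyquetMS's actual code. The function names, the `mz_b64` / `intensity_b64` keys, and the one-row-per-peak layout are all assumptions made for illustration, and the real schema question is exactly what was left undecided.

```python
import base64
import struct
import zlib


def decode_binary_array(encoded, double_precision=True, compressed=False):
    """Decode one base64 binary payload into a list of floats.

    Hypothetical helper: assumes little-endian 32- or 64-bit floats,
    optionally zlib-compressed, as mzML binary arrays are stored.
    """
    raw = base64.b64decode(encoded)
    if compressed:
        raw = zlib.decompress(raw)
    width, code = (8, "d") if double_precision else (4, "f")
    count = len(raw) // width
    # "<" forces little-endian decoding regardless of the host machine.
    return list(struct.unpack(f"<{count}{code}", raw[: count * width]))


def to_columns(spectra):
    """Flatten spectra into long-format columns, one row per peak.

    The column names here are an assumed layout, not an agreed schema.
    """
    columns = {"spectrum_id": [], "mz": [], "intensity": []}
    for spectrum in spectra:
        mz = decode_binary_array(spectrum["mz_b64"])
        intensity = decode_binary_array(spectrum["intensity_b64"])
        if len(mz) != len(intensity):
            raise ValueError(f"array length mismatch in {spectrum['id']}")
        for m, i in zip(mz, intensity):
            columns["spectrum_id"].append(spectrum["id"])
            columns["mz"].append(m)
            columns["intensity"].append(i)
    return columns


def encode_doubles(values):
    """Demo-only helper: pack floats the way the decoder expects them."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return base64.b64encode(packed).decode("ascii")


# Two tiny made-up spectra standing in for parsed mzML content.
demo_spectra = [
    {
        "id": "scan=1",
        "mz_b64": encode_doubles([100.5, 200.25]),
        "intensity_b64": encode_doubles([10.0, 20.0]),
    },
    {
        "id": "scan=2",
        "mz_b64": encode_doubles([300.125]),
        "intensity_b64": encode_doubles([5.0]),
    },
]

columns = to_columns(demo_spectra)
print(columns)
```

The resulting dictionary of equal-length columns is the kind of structure a Parquet writer (for example, one from pyarrow) could take as input. Choosing the actual columns, types, and compression is the part that still needs a decision.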
