@24sharkS
Last active August 10, 2021 01:51
GSOC 2020 Final Report

OpenMS R Package

Introduction

OpenMS, an open-source C++ library for the analysis of mass spectrometric data, exposes a large part of its functionality to the Python community via the automatically generated pyOpenMS package. To make the OpenMS algorithms available in R, we can use the reticulate package, an R interface to Python, to call the pyOpenMS methods. This project involved expanding this initial prototype to create an R package that exposes the classes and methods in pyOpenMS.

It is one of the selected projects under Open Bioinformatics Foundation for GSoC 2020. It was executed under the mentorship of Timo Sachsenberg, Hannes Röst and Oliver Alka.

The GitHub project management dashboard for this project can be found here.

I also wrote blog posts describing the implementation details and progress updates, which are available here.

Prototype Version of Package

I started by creating a minimal package containing some of the pyOpenMS classes, to show that these classes can easily be wrapped in R as R6 classes. R6 classes were chosen as wrappers for two reasons. First, they follow reference semantics, which lets us change the state of the encapsulated Python object. Second, they are a faster and more memory-efficient alternative to R's built-in reference classes, which are based on the same principle. This package was also used to test automatic type conversion (R data types to Python and vice versa) for simple but important functions such as get_peaks() and set_peaks(), which work with numpy arrays. The conclusion was that, in our case, reticulate handles the conversion of R data types well. Explicit conversion is necessary only if the argument passed to a function is modified by it, or if the function expects a set, dictionary, or nested data structure containing instances of a wrapped class.
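The point about modified arguments can be illustrated without reticulate. The sketch below is a plain-Python simulation, not reticulate code: `to_python` is a hypothetical stand-in for the copy that is made when a value crosses the language boundary, and `append_peak` is a made-up callee that changes its argument in place. It only shows why a wrapper that relies on implicit conversion cannot see such a change.

```python
import copy


def to_python(r_value):
    """Hypothetical stand-in for the R-to-Python conversion: it returns a copy."""
    return copy.deepcopy(r_value)


def append_peak(peaks):
    """Made-up callee that modifies its argument in place and returns nothing."""
    peaks.append(0.0)


def call_naive(r_value):
    """Wrapper that relies on implicit conversion only."""
    append_peak(to_python(r_value))  # the change lands on a temporary copy
    return r_value                   # the caller still sees the old value


def call_with_explicit_conversion(r_value):
    """Wrapper that keeps the converted object and hands the result back."""
    py_value = to_python(r_value)
    append_peak(py_value)
    return py_value                  # a real wrapper would convert back to R here


original = [100.0, 200.0]
naive_result = call_naive(original)
explicit_result = call_with_explicit_conversion(original)
print(naive_result, explicit_result)
```

In the naive wrapper the appended value is lost with the temporary copy, while the explicit variant keeps a handle on the converted object and can return it.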

The commit history for this work can be found here. The related blog post can be found here.

Wrappers for the autowrap test file

Since pyOpenMS is a vast library with many classes, writing these wrappers by hand would be too tedious, so we decided to automate their generation. We decided to modify autowrap, a Python library that automates Cython code generation from declarations in pxd files, so that it generates R6 wrapper classes. The first step was to test whether such wrappers could be created in a way compatible with autowrap's workflow. For this purpose, I wrote a script containing wrapper code for libcpp_test.py, one of the test files used by autowrap.
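To make the idea of generating R6 wrappers from declarations more concrete, here is a minimal sketch in Python. It is not autowrap's API and not the code in the fork: `ClassDecl`, `MethodDecl` and `render_r6_class` are hypothetical names, the method names `get_value` and `set_value` are invented for illustration, and `Pymod` is assumed to be an R variable holding a module imported through reticulate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MethodDecl:
    """Hypothetical, simplified stand-in for a parsed method declaration."""
    name: str
    args: List[str] = field(default_factory=list)


@dataclass
class ClassDecl:
    """Hypothetical, simplified stand-in for a parsed class declaration."""
    name: str
    python_module: str  # name of the R variable holding the imported module
    methods: List[MethodDecl] = field(default_factory=list)


def render_r6_class(decl: ClassDecl) -> str:
    """Return R source text for an R6 class that wraps a Python object."""
    lines = [
        f'{decl.name} <- R6::R6Class("{decl.name}",',
        "  public = list(",
        "    py_obj = NULL,",
        "    initialize = function(...) {",
        f"      self$py_obj <- {decl.python_module}${decl.name}(...)",
        "      invisible(self)",
        "    }" + ("," if decl.methods else ""),
    ]
    for i, method in enumerate(decl.methods):
        arg_list = ", ".join(method.args)
        sep = "," if i < len(decl.methods) - 1 else ""
        # Each R method simply forwards the call to the wrapped Python object.
        lines += [
            f"    {method.name} = function({arg_list}) {{",
            f"      self$py_obj${method.name}({arg_list})",
            f"    }}{sep}",
        ]
    lines += ["  )", ")"]
    return "\n".join(lines)


decl = ClassDecl(
    "Int",
    "Pymod",
    [MethodDecl("get_value"), MethodDecl("set_value", ["v"])],
)
r_source = render_r6_class(decl)
print(r_source)
```

The generated text keeps the Python object in a public field and forwards every call to it, which is the pattern the reference semantics of R6 make possible.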

The commit history for the above is included here.

Important Commits:

  • [1896b4]: Wrapper for Int Class and some LibCppTest functions.
  • [f401264]: Wrapper for all classes completed
  • [2ef49a]: Bug fixes in wrappers and created testing_libcpp.R to test libcpp_test.R
  • [02124a]: Final updated wrappers.

Modifying autowrap to automatically generate wrappers for libcpp_test

All the modifications aimed at automated generation of wrappers from pxd declarations have been made to my forked copy of autowrap here.

The commit history for the above is available here.

Important Commits:

  • [259d87]: Started modifying the existing conversion providers (specifically input_conversion, output_conversion and type_check_expression)
  • [53a60c]: Modified Converters except for StdMapConverter & StdVectorConverter involving nested data structures.
  • [c23210a]: Modified CodeGenerator.py and patch for converters in ConversionProvider
  • [ace9b3]: Some Refactoring and automated generation of py_libcpp_test.R containing wrapper code.
  • [9e6b3c]: Tests for py_libcpp_test.R (test_code_generator_libcpp.R)
  • [6b6a6d]: Added support for overloaded methods
  • [08fd12]: README instructions for testing reticulate bindings.
  • [708c2b]: Code and Test Refactoring.
  • [fbdc87]: Added dictionary support in R using collections::dict()
  • [c6d053]: Adding support for getter and setter functionality along with overloaded comparison functions in R.
  • [75e3fa]: fix missing initializer function
  • [59af9]: Refactored ConversionProvider
  • [25ebb6]: Updated libcpp_test R file.
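The commits above mention converters with type_check_expression and input_conversion methods as well as support for overloaded methods. The following sketch shows one way these pieces could fit together when emitting R code. It is an illustration under assumptions, not the code in the fork: the method signatures are simplified, the `CONVERTERS` table and `render_overloaded_method` are hypothetical, and the emitted R checks are only one plausible choice.

```python
class IntConverter:
    """Simplified converter; method names follow the report, signatures are assumed."""

    def type_check_expression(self, var: str) -> str:
        # R expression that is TRUE for a single whole number.
        return f"is.numeric({var}) && length({var}) == 1 && {var} == as.integer({var})"

    def input_conversion(self, var: str) -> str:
        # Pass an R integer so that Python receives an int rather than a float.
        return f"as.integer({var})"


class StringConverter:
    """Simplified converter for a single character string."""

    def type_check_expression(self, var: str) -> str:
        return f"is.character({var}) && length({var}) == 1"

    def input_conversion(self, var: str) -> str:
        return var


# Hypothetical lookup table from C++ type name to converter.
CONVERTERS = {"int": IntConverter(), "libcpp_string": StringConverter()}


def render_overloaded_method(name: str, overload_types: list) -> str:
    """Emit an R method that selects a one-argument overload by type check."""
    if not overload_types:
        raise ValueError("at least one overload is required")
    lines = [f"{name} = function(arg) {{"]
    for i, cpp_type in enumerate(overload_types):
        conv = CONVERTERS[cpp_type]
        keyword = "if" if i == 0 else "} else if"
        lines.append(f"  {keyword} ({conv.type_check_expression('arg')}) {{")
        lines.append(f"    self$py_obj${name}({conv.input_conversion('arg')})")
    lines += [
        "  } else {",
        '    stop("wrong arguments provided")',
        "  }",
        "}",
    ]
    return "\n".join(lines)


r_method = render_overloaded_method("process", ["int", "libcpp_string"])
print(r_method)
```

Each branch applies the input conversion of the overload whose type check matched, and the final branch reports a mismatch instead of passing an unconverted value on to Python.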

Generating reticulate bindings for pyopenms

All the modifications introduced to generate pyOpenMS wrapper code have been made to my forked copy of OpenMS here.

The commit history for the above is available here.

Important Commits:

  • [2c58e7]: Modified create_cpp_extension.py and added support for special converters.
  • [40b3e4]: collections::dict() support
  • [172382]: Generate pyOpenMS R bindings (imports.R and pyopenms_1.R)
  • [7fc20f]: Started Adding R specific addon code in R-addons
  • [9ac644]: R specific addon code complete
  • [7b7389]: Refactoring of R specific addon code
  • [971e57]: Added missing MzMLFile.R addon
  • [b12c63]: Update function names and docs improvement for addons/ConvexHull2D.pyx.
  • [faea67]: Changed function name to align_4 in addons/MapAlignmentAlgorithmIdentification.pyx

Other Contributions

I opened a pull request to fix an issue where certain function definitions in addons are currently inaccessible in pyOpenMS. This PR is yet to be merged and can be found here.

What's Left and Future Prospects

  • The current modifications to autowrap generate reticulate wrappers exclusively. With little effort, this functionality could be integrated into the main autowrap repository so that it generates both Cython and R code.

  • Currently, only the source code for the R package is generated via autowrap, so the DESCRIPTION file that accompanies the source files in the R/ folder has to be created manually. Generating it from autowrap as well should be easy to add.

  • Set up continuous integration with Travis or GitHub Actions to test building the R package.

  • The R bindings currently do not handle 64-bit integers, because base R only supports 32-bit integers. Accessing a 64-bit integer from R simply returns -1 for now. Adding support through the bit64 or gmp package might help.
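The 64-bit limitation in the last point comes down to simple range arithmetic, sketched below in Python. The classification policy is only an assumption for illustration, not something the bindings implement: `r_representation` is a hypothetical helper that reports which R representation could hold a given Python integer without loss.

```python
R_INT_MAX = 2**31 - 1      # largest value of R's 32-bit integer type
MAX_EXACT_DOUBLE = 2**53   # integers up to this magnitude are exact in a double


def r_representation(n: int) -> str:
    """Name the smallest R representation that holds n without loss (assumed policy)."""
    if -R_INT_MAX <= n <= R_INT_MAX:
        return "integer"
    if abs(n) <= MAX_EXACT_DOUBLE:
        return "double"
    return "needs-64-bit-type"  # e.g. a type from the bit64 or gmp package


samples = [42, 2**31, 2**53 + 1, 2**63 - 1]
results = {n: r_representation(n) for n in samples}
print(results)

# Precision is lost silently once a double is pushed past 2**53.
rounded = int(float(2**53 + 1))
print(rounded == 2**53 + 1)
```

Values that fall into the last category are the ones that would need a dedicated 64-bit type on the R side.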
