OpenMS, an open-source C++ library for the analysis of mass spectrometric data, exposes a large part of its functionality to the Python community via the automatically generated pyOpenMS package. To make the OpenMS algorithms available in R, we can use the reticulate package, an R interface to Python, to call the pyOpenMS methods. This project expanded that initial prototype into an R package that exposes the classes and methods of pyOpenMS.
It was one of the projects selected under the Open Bioinformatics Foundation for GSoC 2020 and was carried out under the mentorship of Timo Sachsenberg, Hannes Röst and Oliver Alka.
The GitHub project management dashboard for this project can be found here.
I also wrote blog posts describing the implementation details and posting updates, which are available here.
I started by creating a minimal package containing a few pyOpenMS classes to show that they can easily be wrapped in R as R6 classes. R6 classes were chosen as wrappers for two reasons: first, they follow reference semantics, which lets us change the state of the encapsulated Python object; second, they are a faster and more memory-efficient alternative to R's built-in reference classes, which are based on the same principle. This package was also used to test automatic type conversion (R data types to Python and vice versa) for simple but important functions such as `get_peaks()` and `set_peaks()`, which take or return NumPy arrays. The conclusion was that, in our case, reticulate handles the conversion of R data types well; explicit conversion is needed only when an argument passed to a function is modified in place, or when the function expects a set, dictionary or nested data structure containing a class that has to be wrapped.
The commit history for the above can be found here. The related blog post can be found here.
Since pyOpenMS is a vast library with many classes, writing these wrappers by hand would be too tedious, so we decided to automate their generation. We chose to modify autowrap, a Python library that automates Cython code generation from declarations in pxd files, so that it generates R6 wrapper classes. The first step was to test whether such wrappers could be created in a way that is compatible with autowrap's workflow. For this purpose, I wrote a script containing wrapper code for libcpp_test.py, one of the test files used by autowrap.
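To give a rough idea of what generating an R6 wrapper from a declaration involves, here is a minimal Python sketch. It is not autowrap's actual code or output: the function `render_r6_class`, the private field `py_obj` and the layout of the emitted R source are all assumptions made for illustration. It only handles methods that forward their arguments unchanged, which is the simple case where reticulate's automatic conversion suffices.

```python
def render_r6_class(class_name, methods, py_module="pyopenms"):
    """Return R source text for an R6 class that wraps a Python class.

    `methods` maps each method name to a list of its argument names.
    Hypothetical helper: the emitted layout is an assumption, not the
    format produced by the modified autowrap.
    """
    lines = [
        f'{class_name} <- R6::R6Class("{class_name}",',
        "  private = list(py_obj = NULL),",
        "  public = list(",
        "    initialize = function() {",
        # The wrapped Python object is created once and kept by reference.
        f'      private$py_obj <- reticulate::import("{py_module}")${class_name}()',
        "    }",
    ]
    for name, args in methods.items():
        arg_list = ", ".join(args)
        lines[-1] += ","  # separate this method from the previous list entry
        lines += [
            f"    {name} = function({arg_list}) {{",
            # Forward the call to the encapsulated Python object.
            f"      private$py_obj${name}({arg_list})",
            "    }",
        ]
    lines += ["  )", ")"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example declaration: two accessor methods of a spectrum class.
    print(render_r6_class("MSSpectrum", {"getRT": [], "setRT": ["rt"]}))
```

A real generator additionally has to insert conversion code around the forwarded call (for example for nested containers or wrapped classes), which is the job of the conversion providers in autowrap.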
The commit history for the above is included here.
Important Commits:
All the modifications aimed at the automated generation of wrappers from pxd declarations have been made in my fork of autowrap, available here.
The commit history for the above is available here.
Important Commits:
- [259d87]: Started with modification of existing conversion providers (specifically `input_conversion`, `output_conversion` and `type_check_expression`)
- [53a60c]: Modified Converters except for StdMapConverter & StdVectorConverter involving nested data structures.
- [c23210a]: Modified CodeGenerator.py and patched the converters in ConversionProvider
- [ace9b3]: Some Refactoring and automated generation of py_libcpp_test.R containing wrapper code.
- [9e6b3c]: Tests for py_libcpp_test.R (test_code_generator_libcpp.R)
- [6b6a6d]: Added support for overloaded methods
- [08fd12]: README instructions for testing reticulate bindings.
- [708c2b]: Code and Test Refactoring.
- [fbdc87]: Added dictionary support in R using collections::dict()
- [c6d053]: Adding support for getter and setter functionality along with overloaded comparison functions in R.
- [75e3fa]: fix missing initializer function
- [59af9]: Refactored ConversionProvider
- [25ebb6]: Updated libcpp_test R file.
All the modifications introduced to generate the pyOpenMS wrapper code have been made in my fork of OpenMS, available here.
The commit history for the above is available here.
Important Commits:
- [2c58e7]: Modified create_cpp_extension.py and added support for special converters.
- [40b3e4]: collections::dict() support
- [172382]: Generate pyOpenMS R bindings (imports.R and pyopenms_1.R)
- [7fc20f]: Started Adding R specific addon code in R-addons
- [9ac644]: R specific addon code complete
- [7b7389]: Refactoring of R specific addon code
- [971e57]: Added missing MzMLFile.R addon
- [b12c63]: Update function names and docs improvement for addons/ConvexHull2D.pyx.
- [faea67]: Changed function name to align_4 in addons/MapAlignmentAlgorithmIdentification.pyx
I opened a pull request to fix the issue that certain addon function definitions are currently inaccessible in pyOpenMS. This PR is yet to be merged and can be found here.
- The current modifications in autowrap allow automated generation of reticulate wrappers exclusively. With little effort, this functionality can be integrated into the main autowrap repository, to allow generation of both Cython and R code.
- Currently, only the source code for the R package is generated via autowrap, so the DESCRIPTION file that accompanies the source files in the `R/` folder has to be created manually. Integrating this step into autowrap should also be easy to achieve.
- Setting up continuous integration using Travis or GitHub Actions to test the building of the R package.
- The R bindings currently don't deal with 64-bit integers, as base R only supports 32-bit integers. Trying to access a 64-bit integer in R simply returns -1 for now. Adding support for 64-bit integers using the `bit64` or `gmp` package might prove helpful.
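
The 64-bit limitation can be made concrete with a small Python sketch that decides, on the Python side, how an integer could be handed to R without losing information. This is only one possible approach under stated assumptions, not something the current bindings do: the function `r_representation` and its three return labels are hypothetical. The numeric limits are facts about R: its integer type is 32-bit with the most negative value reserved for `NA`, and doubles represent every integer exactly up to a magnitude of 2^53.

```python
R_INT_MAX = 2**31 - 1      # value of .Machine$integer.max in R
R_INT_MIN = -R_INT_MAX     # -2**31 is reserved for NA_integer_
DOUBLE_EXACT_MAX = 2**53   # every integer up to this magnitude is exact in a double


def r_representation(value: int) -> str:
    """Name an R type that can hold `value` without loss (hypothetical helper).

    "integer"   -> fits R's native 32-bit integer
    "double"    -> exactly representable as an R numeric
    "character" -> must be passed as text and parsed on the R side,
                   e.g. by a 64-bit or big-integer package
    """
    if R_INT_MIN <= value <= R_INT_MAX:
        return "integer"
    if abs(value) <= DOUBLE_EXACT_MAX:
        return "double"
    return "character"


if __name__ == "__main__":
    for v in (42, 2**31 - 1, 2**40, 2**63 - 1):
        print(v, "->", r_representation(v))
```

Under this scheme only values beyond 2^53 in magnitude would need a dedicated 64-bit type on the R side; everything smaller could already be returned as a regular R number.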