@aaravm
Last active September 16, 2024 06:11
Google Summer of Code 2024 Final Report

In the summer of 2024, guided by Pavel Nikonorov and Alex Kanitz, I took on a substantial technical challenge. I am proud to present the result: an Extensible GA4GH Client Library/SDK and Command Line Interface implemented in Rust. Over the course of the project, I worked in depth with the Task Execution Service (TES) API, built a library for accessing it safely from Rust, and wrote a Command Line Interface so that researchers can easily use these APIs. This project stands as a testament to collaborative effort and the shared vision of advancing scientific discovery through technology.

Click here for the submitted proposal.

1. Background

The Global Alliance for Genomics and Health, better known as GA4GH, is a "policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework."

ELIXIR coordinates and develops life science resources across Europe so that researchers can more easily find, analyze and share data, exchange expertise, and implement best practices.

ELIXIR Cloud & AAI is a cross-platform initiative of ELIXIR and a Driver Project of the GA4GH that develops services towards establishing a federated cloud computing network that enables the analysis of population-scale genomic and phenotypic data across participating, international nodes.

ELIXIR Cloud components are the Web Components developed and managed by the ELIXIR Cloud & AAI community.

2. Motivation

The motivation behind this project was to bridge the gap between the vast array of tools available in the genomics and health domain and the end-users who could benefit from them.

  • Development of the Generic GA4GH SDK Library: There was a need for a unified library that gives any developer safe and secure access to the various APIs offered by GA4GH, and I set out to design it. I wrote it in Rust because the language is optimized for speed and low memory usage, which is beneficial when interacting with APIs. A secondary benefit of packaging this as a library is that GENXT can add confidential computing to it, so that anyone using it can interact with the APIs in a secure, isolated environment.

  • Creation of a Command Line Interface to access the library: The CLI is aimed mainly at researchers, who can use it to access the APIs without worrying about the details of each API or about how to send and retrieve data securely.

3. Implementation

The first step was to work out the architecture of the library: which structs it would contain and how they would interact. This is the final decided architecture. There are two versions here. One is a low-level view of how the different structs interact, while the other is a high-level view of how the libraries interact.

For the CLI, we decided to use Rust as well, since it is a robust language and a Rust CLI is easier to integrate with a Rust library.
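To make the struct layering more concrete, here is a minimal sketch of how a configuration struct, a transport struct, and an API-specific struct could be composed. It is an illustration only, not the project's actual code: the field names (`base_path`, `bearer_token`) and the methods `url`, `headers`, and `task_url` are assumptions, and no HTTP request is sent. In the real library, the transport hands the request to the reqwest crate.

```rust
use std::collections::BTreeMap;

/// Connection details for one GA4GH service (hypothetical field names).
#[derive(Debug, Clone)]
pub struct Configuration {
    /// Base URL of the service, e.g. "http://localhost:8000/ga4gh/tes/v1".
    pub base_path: String,
    /// Optional bearer token sent with every request.
    pub bearer_token: Option<String>,
}

/// Turns a Configuration plus a relative endpoint into request parts.
/// A real transport would pass these to an HTTP client.
#[derive(Debug, Clone)]
pub struct Transport {
    config: Configuration,
}

impl Transport {
    pub fn new(config: Configuration) -> Self {
        Transport { config }
    }

    /// Joins base path and endpoint with exactly one slash between them.
    pub fn url(&self, endpoint: &str) -> String {
        format!(
            "{}/{}",
            self.config.base_path.trim_end_matches('/'),
            endpoint.trim_start_matches('/')
        )
    }

    /// Headers attached to every request.
    pub fn headers(&self) -> BTreeMap<String, String> {
        let mut headers = BTreeMap::new();
        headers.insert("Accept".to_string(), "application/json".to_string());
        if let Some(token) = &self.config.bearer_token {
            headers.insert("Authorization".to_string(), format!("Bearer {}", token));
        }
        headers
    }
}

/// API-specific client: owns a Transport and knows only its own endpoints.
pub struct Tes {
    transport: Transport,
}

impl Tes {
    pub fn new(config: Configuration) -> Self {
        Tes { transport: Transport::new(config) }
    }

    /// URL for fetching a single task by id (the id is assumed to be URL-safe).
    pub fn task_url(&self, task_id: &str) -> String {
        self.transport.url(&format!("tasks/{}", task_id))
    }
}
```

With this layering, supporting another GA4GH API would mostly mean adding another thin struct next to `Tes` that reuses the same `Configuration` and `Transport`.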

4. What I Did (Pull Requests):

  1. feat: an initial client implementation of GA4GH ServiceInfo and TES APIs as a library
    • Description: This was the first PR of my GSoC period, opened after I had created an initial version of the GA4GH-SDK library. Alex later suggested splitting it into multiple PRs. After making many of the suggested changes, I closed this PR and split it into several smaller ones.
  2. chore: add .gitignore : The .gitignore file was added in this PR.
  3. docs: add README : The README file was added in this PR.
  4. chore: add Rust build system : The Cargo.toml file was added in this PR.
  5. feat: add script to generate OAI models
    • Description: In this PR, I have introduced a script to automate the generation of Rust models from OpenAPI specifications, ensuring the necessary structs are built for the main functions.
  6. feat(serviceinfo,tes): add models
    • Description: In this PR, I added all the autogenerated models created by running the script from #27. This PR contains no manually written code; everything is autogenerated.
  7. feat: add configuration & transport structs
    • Description: This pull request introduces two new structs: Configuration for storing API request details and Transport for making HTTP requests using the reqwest library. It also includes unit tests for the Transport struct.
  8. feat(serviceinfo): add struct
    • Description: This pull request introduces the ServiceInfo struct, which retrieves and returns the service details of any GA4GH API implementation.
  9. feat(tes): add struct
    • Description: This pull request introduces the TES struct with task management methods (create, list, get, delete, and get the status of tasks) built on the TES API, along with corresponding unit tests and integration tests against Funnel.
  10. ci: workflows for local runs and GH actions
    • Description: This pull request introduced initial continuous integration workflows for local runs and GitHub Actions, but was later closed in favor of the cookiecutter templates developed by Javed in "ci: add workflows".
  11. feat(cli): add TES support
    • Description: This pull request adds a new CLI tool for TES task management with commands for creating, listing, retrieving, checking the status of, and canceling tasks, along with usage documentation.
  12. feat(cli): add config
    • Description: This pull request adds the ability to read configuration from a JSON file and updates the documentation with usage examples for the CLI tool.
  13. python bindings
    • Description: This branch adds Python bindings for the entire Rust library created so far, so that users can access the library from Python as well.
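As a rough illustration of how the CLI commands for TES task management could map onto requests, the sketch below pairs each command with an HTTP method and a path relative to the TES base URL. The `TesCommand` and `Method` types and the `request` function are hypothetical names introduced here, not the project's actual types. The paths and the `view` query parameter follow the TES specification, and mapping the status command to the MINIMAL view is an assumption.

```rust
/// CLI-level TES operations (hypothetical enum, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum TesCommand {
    Create,
    List,
    Get { id: String },
    Status { id: String },
    Cancel { id: String },
}

/// The two HTTP methods these TES operations need.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Method {
    Get,
    Post,
}

impl TesCommand {
    /// HTTP method and path (relative to the TES base URL) for this command.
    /// Task ids are assumed to be URL-safe and are not escaped here.
    pub fn request(&self) -> (Method, String) {
        match self {
            // Creating a task posts the task document to the collection.
            TesCommand::Create => (Method::Post, "/tasks".to_string()),
            // Listing reads the same collection.
            TesCommand::List => (Method::Get, "/tasks".to_string()),
            // FULL returns every field of the task.
            TesCommand::Get { id } => (Method::Get, format!("/tasks/{}?view=FULL", id)),
            // MINIMAL returns only the task id and state.
            TesCommand::Status { id } => (Method::Get, format!("/tasks/{}?view=MINIMAL", id)),
            // Cancellation is a POST to the ":cancel" custom method.
            TesCommand::Cancel { id } => (Method::Post, format!("/tasks/{}:cancel", id)),
        }
    }
}
```

Keeping this mapping in one place means the CLI layer only has to parse arguments into a command value, while the library decides how the command is sent.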

5. Outlook

I achieved all the primary objectives outlined in the proposal. While the project's scope and timeframe did not allow for the integration of APIs other than TES, the basic structure for adding other APIs is in place, and they can follow the same architecture as TES (with the exception of AAI). All the other goals were met, including unit tests, integration tests, and documentation. The bindings stretch goal is also almost complete. Overall, I am very happy with the work done, and I am excited to see how this project evolves, how GENXT integrates confidential computing into it, and how researchers use it.

Building on the project's foundation, I think the following are the most important items to address next:

  • Creating more general integration tests, so that TES implementations other than Funnel are covered.
  • Adding the remaining GA4GH APIs to the project.

6. Acknowledgment

I would like to express my sincere gratitude to my mentors, Pavel Nikonorov and Alex Kanitz, who gave me the opportunity to work on the project "Extensible GA4GH Client Library/SDK and Command Line Interface implemented in Rust". Their guidance, feedback, and support throughout this journey have been instrumental and have taught me a lot about how to work in an organization, and about GA4GH and GENXT. Their expertise and insights were pivotal in shaping the direction and outcomes of this project. I also want to extend my appreciation to the GSoC program for providing me with this opportunity to contribute to a project with such profound implications for the future of genomics and health.
