@sarthakgupta072
Last active November 22, 2020 07:32
Final report of Google Summer of Code '20

Google Summer of Code '20 Final Report

This Gist summarises the work done by me during the 2020 Google Summer of Code, working on the ELIXIR Cloud & AAI ecosystem for the Global Alliance for Genomics and Health organization, mainly under the guidance of my mentor Alex Kanitz.

The project "Dynamically adding data to DRS" that I worked on was one of four projects of the Global Alliance for Genomics and Health selected for the 2020 Google Summer of Code.

Background

The Global Alliance for Genomics and Health (GA4GH) is a "policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework."

Ref: GA4GH Website

Sharing any type of data is feasible only when a set of rules and policies specifies how the data is exchanged. Once these rules and policies are in place, sharing data becomes easier and more convenient.

The Global Alliance for Genomics and Health (GA4GH) is helping to achieve this mission in the field of biomedical research. The sharing of data for biomedical research is of key importance in ensuring continued progress in our understanding of human health and wellbeing.

It establishes general frameworks to enable the sharing of genomic and clinical data, and it also catalyzes data sharing projects that support this mission.

The Idea 💡

The GA4GH Cloud Work Stream develops API standards and implementations that make it possible to share data easily and in a consistent way.

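There are currently four API standards: the Tool Registry Service (TRS), the Workflow Execution Service (WES), the Task Execution Service (TES) and the Data Repository Service (DRS).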

The ELIXIR Cloud & AAI ecosystem is a multinational initiative that is committed to implementing the GA4GH Cloud APIs in an attempt to set up a federated cloud-based compute infrastructure for biomedical academic research and beyond. As such, it has already released relatively mature implementations for both WES (cwl-WES) and TES (TESK).

The goal of this project is to develop an open source, generic (i.e., not tied to any specific data provider), distributable and highly reusable DRS microservice implementation with diverse and unique use cases in the operationalization of the GA4GH Cloud Work Stream.

On analysing the microservices already implemented by ELIXIR Cloud & AAI, we found that many of them repeat the same code across services. This led to the idea of developing a microservice "archetype" providing tools and utilities for quickly developing microservices based on a defined tool stack.

Therefore, as the first part of the project, I was to design and implement this archetype, together with my mentor and two other ELIXIR Cloud & AAI interns. Upon completion, I would then use the archetype as a backbone for the development of the DRS implementation.

What did I achieve? 🎉

At the end of 3 months of eat-sleep-code-review-repeat, I contributed significantly to the following milestones:

A lot of emphasis was put on following good coding practices, and so I am proud to say that all of the above projects are well documented, have continuous integration pipelines set up and are complete with unit tests covering basically 100% of the code! 🌟

My Contributions 💻

FOCA

DRS-Filer

DRS-cli

Other Contributions

Here I replaced some code in available ELIXIR Cloud & AAI repositories with generalized code in FOCA.

What is left?

Most of the milestones specified in the original proposal were covered. A few things remain open:

  • A PUT endpoint in DRS-Filer is yet to be implemented. Given that the DRS specification does not define this endpoint, and that it does not add immediate value (the same outcome can be achieved by combining the POST and DELETE endpoints), we decided to first collect real-world usage experience with DRS-Filer in order to better understand how such an endpoint could best be used.
  • Gunicorn support is yet to be added to FOCA, so that applications based on it can easily scale in production settings.
  • Integration with TESK is still pending. As a strong emphasis was put on code quality, and as the exact usage of DRS in connection with WES and TES implementations is still only vaguely defined by the GA4GH, this stretch goal could not be addressed in the given time frame.
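
To illustrate the point about the PUT endpoint, here is a minimal sketch of how an update can be emulated by a DELETE followed by a POST. It does not use DRS-Filer code or its real API: `InMemoryRegistry` and `replace_object` are hypothetical names, the registry is a plain in-memory dictionary, and the assumption that a POST always assigns a fresh identifier is mine.

```python
import uuid
from typing import Dict


class InMemoryRegistry:
    """Toy stand-in for an object registry (hypothetical, not DRS-Filer code)."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def post(self, metadata: dict) -> str:
        """Register a new object and return its generated identifier."""
        object_id = uuid.uuid4().hex  # assumption: the service picks the ID
        self._objects[object_id] = dict(metadata)
        return object_id

    def delete(self, object_id: str) -> None:
        """Remove an object; raises KeyError if the identifier is unknown."""
        del self._objects[object_id]

    def get(self, object_id: str) -> dict:
        """Return the stored metadata; raises KeyError if unknown."""
        return self._objects[object_id]


def replace_object(registry: InMemoryRegistry, object_id: str, metadata: dict) -> str:
    """Emulate an update with DELETE + POST and return the new identifier."""
    registry.get(object_id)  # fail early, before anything is changed
    registry.delete(object_id)
    return registry.post(metadata)


registry = InMemoryRegistry()
old_id = registry.post({"name": "sample.bam", "size": 1024})
new_id = replace_object(registry, old_id, {"name": "sample.bam", "size": 2048})
print(registry.get(new_id)["size"])
```

In this toy model the replacement object receives a new identifier, so clients holding the old one would need to be told about the change. Whether that matters in practice is the kind of question that real-world usage experience could help answer.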

Outlook

One of the major goals of the ELIXIR Cloud & AAI ecosystem is the integration of all GA4GH Cloud Work Stream API components, TRS, WES, TES and DRS, such that data analysis workflows can be executed on any data in a decentralized, federated cloud infrastructure. The addition of DRS-Filer to the ELIXIR Cloud & AAI service stack represents another step towards this ambitious goal.

My Journey 🚴

In my freshman year of university, I got introduced to open-source and fell in love with it. It became a pleasant way for me to code, learn and collaborate with people from all around the world. Since that time, I have tried to remain connected with it.

In late 2019, I was searching for a new organisation where I could contribute and learn more about the technologies that I was interested in. I started my search on the Google Summer of Code archive page, where I came across the Global Alliance for Genomics and Health.

In our initial conversation, I was welcomed by my present mentor Alex Kanitz, and at that moment I decided that this would be a community I would be happy to contribute to. After some discussions with my mentor, I started setting up FOCA. During these contributions, I learned to write high-quality code and picked up best practices that any developer should keep in mind.

Later I got selected for the project "Dynamically adding data to DRS". 🥳

GSoC'20 has been one of the best experiences of my life, and I will remember it for a long time to come. Over the last few months, apart from writing quality code, I have learned to take ownership of a project. I enjoyed the time working with my mentor and the community in general, and I thank them for giving me an amazing experience. Unlike for most students who take part in GSoC, my journey with my organisation is not over yet and I feel there is still a long way to go...
