
@LakiG
Last active September 12, 2022 12:39
Google Summer of Code '22 Final Report - Compliance testing framework for the Task Execution Service API

Google Summer of Code '22 Final Report

This report summarizes the work I did in the Google Summer of Code 2022 program as a contributor to the Global Alliance for Genomics and Health organization, under the guidance of my mentors Alexander Kanitz, Ania Niewielska, Kyle Ellrot and Alvaro Gonzalez.

Background

Formed in 2013, the Global Alliance for Genomics and Health (GA4GH) is an international non-profit alliance and a policy-framing and technical standards-setting organization that seeks to enable responsible genomic data sharing within a human rights framework.

ELIXIR is a multinational Europe-based initiative that unites life science laboratories and organizations to establish a common infrastructure that supports and integrates scalable, sustainable bioinformatics and data analysis services for member states and beyond.

ELIXIR Cloud & AAI is a subgroup of ELIXIR and a driver project of GA4GH. It develops services towards a federated cloud computing network that enables the analysis of population-scale genomic and phenotypic data across participating international nodes.

Motivation

GA4GH and ELIXIR Cloud & AAI have developed multiple API standards and services that are used by people around the world. These standards follow the FAIR principles (Findable, Accessible, Interoperable, and Reusable) to enable the portable execution of computational workflows and the sharing of data and tools.

Thorough compliance testing is needed to ensure that services are implemented as the standards intend. A compliance testing suite checks both feature functionality and data models, verifying that a service complies with the GA4GH standards.

Implementation

The existing compliance testing suites are tightly coupled to the API specifications, so they need continuous maintenance and changes whenever a specification evolves. An automated, independent compliance suite avoids this problem and can be kept up to date with minimal changes.

The compliance suite is modular: users can customize the YAML test files according to their requirements. Each YAML test file defines a sequence of steps that run in order and does not depend on any other test file. Because the tests are defined in these files rather than hard-coded in the codebase, they remain loosely coupled, and any number of job combinations and tests can be executed without modifying the codebase.
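
As an illustration of this loose coupling, here is a minimal sketch of what one test file might contain once a YAML loader (for example PyYAML's `yaml.safe_load`) has turned it into a Python dictionary, plus a structural check run on it. The field names (`description`, `versions`, `jobs`, `operation`, `endpoint`, `response`) and the helper `validate_test_definition` are hypothetical, not the suite's actual schema; only the endpoint paths and response model names come from the TES v1 specification.

```python
from __future__ import annotations

# Hypothetical test definition, written as the dict a YAML loader would
# return for one test file. Field names are illustrative assumptions.
CREATE_AND_GET_TASK = {
    "description": "Create a task, then retrieve it by its id",
    "versions": ["1.0.0"],
    "jobs": [
        {
            "name": "create_task",
            "operation": "create_task",
            "endpoint": "/tasks",
            "response": {"200": "tesCreateTaskResponse"},
        },
        {
            "name": "get_task",
            "operation": "get_task",
            "endpoint": "/tasks/{id}",
            "response": {"200": "tesTask"},
        },
    ],
}

REQUIRED_TEST_KEYS = {"description", "versions", "jobs"}
REQUIRED_JOB_KEYS = {"name", "operation", "endpoint", "response"}


def validate_test_definition(test: dict) -> list[str]:
    """Return structural problems in one test file; an empty list means usable."""
    problems = []
    missing = REQUIRED_TEST_KEYS - set(test)
    if missing:
        problems.append(f"test is missing keys: {sorted(missing)}")
    # Each job is checked on its own; nothing refers to other test files.
    for index, job in enumerate(test.get("jobs", [])):
        missing = REQUIRED_JOB_KEYS - set(job)
        if missing:
            problems.append(f"job {index} is missing keys: {sorted(missing)}")
    return problems


print(validate_test_definition(CREATE_AND_GET_TASK))
print(validate_test_definition({"jobs": [{"name": "list_tasks"}]}))
```

In this scheme, a new job combination is just another file with the same shape; no Python code has to change.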

The compliance suite has three components:

  1. YAML Job Parser - parses the YAML files and extracts the job information
  2. Test Runner - executes the tests and validates the results
  3. Report Generator - stores the results for further use

Architecture
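
The three components listed above can be chained into a single pipeline. The following is a minimal, dependency-free sketch, not the project's code: `Job`, `JobResult`, `parse_jobs`, `run_jobs`, `generate_report` and the fake handlers are hypothetical names. Stopping at the first failing job is also an assumption, made because later jobs (such as fetching a task) usually need the output of earlier ones (such as the created task's id).

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Callable

# A handler performs one job against the server and reports pass/fail.
# It receives a context dict shared by all jobs of the same test file.
Handler = Callable[[dict], bool]


@dataclass
class Job:
    name: str
    operation: str


@dataclass
class JobResult:
    name: str
    passed: bool
    message: str = ""


def parse_jobs(test: dict) -> list[Job]:
    """Job parser: turn one parsed test file into an ordered list of jobs."""
    return [Job(name=job["name"], operation=job["operation"]) for job in test["jobs"]]


def run_jobs(jobs: list[Job], handlers: dict[str, Handler]) -> list[JobResult]:
    """Test runner: execute jobs in file order, sharing one context dict."""
    context: dict = {}
    results: list[JobResult] = []
    for job in jobs:
        handler = handlers.get(job.operation)
        if handler is None:
            results.append(JobResult(job.name, False, f"no handler for {job.operation!r}"))
            break
        passed = handler(context)
        results.append(JobResult(job.name, passed))
        if not passed:
            break  # later jobs usually need earlier outputs, so stop here
    return results


def generate_report(results: list[JobResult]) -> str:
    """Report generator: serialise the results as a JSON document."""
    summary = {
        "passed": bool(results) and all(r.passed for r in results),
        "results": [asdict(r) for r in results],
    }
    return json.dumps(summary, indent=2)


# Stand-in handlers so the sketch runs without a TES server; real handlers
# would send POST /tasks and GET /tasks/{id} requests.
def fake_create_task(context: dict) -> bool:
    context["task_id"] = "task-123"
    return True


def fake_get_task(context: dict) -> bool:
    return "task_id" in context


HANDLERS: dict[str, Handler] = {"create_task": fake_create_task, "get_task": fake_get_task}
SAMPLE_TEST = {"jobs": [{"name": "create", "operation": "create_task"},
                        {"name": "get", "operation": "get_task"}]}

report = generate_report(run_jobs(parse_jobs(SAMPLE_TEST), HANDLERS))
print(report)
```

Keeping the handlers in a dictionary keyed by operation name means the runner itself never changes when new test files are added; only unknown operations need new handlers.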

What did I achieve?

During the development of this project, the following milestones were achieved:

  • Developed a new compliance suite for the Task Execution Service (TES) API from scratch
  • Built an automated, independent test runner based on a YAML file parser and Pydantic models
  • Explored S3 storage connectivity via MinIO and boto3, the AWS SDK for Python (this feature is still in progress)
  • Generated a JSON report containing the test results
  • Developed a web report view using HTML, CSS and JS that lets users view the report as text or as a table and download it
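
Pydantic models are typically used to check that response bodies match a schema. To keep the example free of third-party dependencies, the sketch below uses a standard-library `dataclass` and `Enum` instead; `MinimalTask` and `check_response` are hypothetical names. It validates the two fields that a TES v1.0 `MINIMAL` task view returns, `id` and `state`, with `state` restricted to the state values listed in TES v1.0.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TesState(str, Enum):
    """Task states listed in TES v1.0."""
    UNKNOWN = "UNKNOWN"
    QUEUED = "QUEUED"
    INITIALIZING = "INITIALIZING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    COMPLETE = "COMPLETE"
    EXECUTOR_ERROR = "EXECUTOR_ERROR"
    SYSTEM_ERROR = "SYSTEM_ERROR"
    CANCELED = "CANCELED"


@dataclass
class MinimalTask:
    """Tiny subset of the TES tesTask model: only id and state."""
    id: str
    state: TesState

    @classmethod
    def from_json(cls, data: dict) -> MinimalTask:
        if not isinstance(data.get("id"), str):
            raise ValueError("'id' must be a string")
        # TesState(...) raises ValueError for missing or unknown states.
        return cls(id=data["id"], state=TesState(data.get("state")))


def check_response(data: dict) -> tuple[bool, str]:
    """Return (passed, message) for one GET /tasks/{id} response body."""
    try:
        task = MinimalTask.from_json(data)
    except ValueError as error:
        return False, str(error)
    return True, f"task {task.id} is {task.state.value}"


print(check_response({"id": "abc", "state": "COMPLETE"}))
print(check_response({"id": "abc", "state": "DONE"}))
```

With Pydantic, the equivalent would be a `BaseModel` subclass declaring `id: str` and `state: TesState`, where invalid data raises `pydantic.ValidationError`; the `(passed, message)` pairs could then feed the JSON report.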

The project documentation, covering installation, usage and architecture, can be read here.

Outlook

Some features that could be integrated in the future are:

  • Separating the YAML tests from the test runner - This would allow the test runner to cover multiple API specifications, with the YAML tests forming a consolidated list of tests for those specifications.

  • Cloud server testing - An S3 storage feature for testing the server implementation is in progress. It can be extended to handle complex scenarios and to support more storage systems such as FTP, GS, etc.
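
To decide which storage systems a task exercises, a runner could inspect the `url` fields of the task's `inputs` and `outputs`, which in TES are URLs such as `s3://...`. This is a hypothetical sketch of that idea: `required_backends`, `unsupported_backends` and `SUPPORTED_BACKENDS` are not part of the project, and the bucket and host names are placeholders.

```python
from __future__ import annotations

from urllib.parse import urlparse

# Hypothetical: URL schemes for which storage checks exist. Only S3 is in
# progress; FTP and GS support would be added here later.
SUPPORTED_BACKENDS = frozenset({"s3"})


def required_backends(task: dict) -> set[str]:
    """Collect the URL schemes used by a TES task's inputs and outputs."""
    items = task.get("inputs", []) + task.get("outputs", [])
    # Inputs may carry inline "content" instead of a "url"; skip those.
    return {urlparse(item["url"]).scheme for item in items if "url" in item}


def unsupported_backends(task: dict, supported: frozenset[str] = SUPPORTED_BACKENDS) -> set[str]:
    """Schemes the task needs that the suite cannot check yet."""
    return required_backends(task) - supported


task = {
    "inputs": [
        {"url": "s3://bucket/input.txt", "path": "/data/input.txt"},
        {"content": "hello", "path": "/data/hello.txt"},
    ],
    "outputs": [{"url": "ftp://example.org/out.txt", "path": "/data/out.txt"}],
}

print(sorted(required_backends(task)))
print(sorted(unsupported_backends(task)))
```

Under this design, supporting another storage system means adding its scheme once a checker for it exists; tests whose tasks need an unsupported scheme could be skipped rather than failed.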

Acknowledgement

I would like to thank my mentors Alexander Kanitz, Ania Niewielska, Kyle Ellrot and Alvaro Gonzalez for giving me the opportunity to participate in Google Summer of Code 2022 and to contribute to the Global Alliance for Genomics and Health with my project, Compliance testing framework for the Task Execution Service API.

Through their constant support, motivation and guidance, I have learnt to follow coding best practices, research auxiliary components and develop a high-quality project.

This has been a wonderful experience, one I had long been waiting for. I am thankful to everyone who has helped me on this journey and hope to continue living up to my motto: Make this world a better place for others to live in.

Lakshya Garg

Project Repository - Link
Project Presentation - Link
