
@JaeAeich
Created August 24, 2024 11:18
GA4GH | GSoC'24 | TESK - A GA4GH-TES based Kubernetes batch execution service

Final report GSoC'24 project ELIXIR Cloud Components

This is the final report for the project I worked on during the summer of 2024 under the guidance of my mentors.

I am pleased to share that I have completed the project TESK - A GA4GH-TES based Kubernetes batch execution service in the given time frame and effectively accomplished my tasks and objectives related to it.

Click here for proposal submitted.

Background

The Global Alliance for Genomics and Health, better known as GA4GH, is a "policy-framing and technical standards-setting organization, seeking to enable responsible genomic data sharing within a human rights framework."

ELIXIR coordinates and develops life science resources across Europe so that researchers can more easily find, analyse and share data, exchange expertise, and implement best practices.

ELIXIR Cloud & AAI is a cross platform initiative of ELIXIR and a Driver Project of the GA4GH that develops services towards establishing a federated cloud computing network that enables the analysis of population-scale genomic and phenotypic data across participating, international nodes.

The Task Execution Service (TES) API is an effort to define a standardized schema and API for describing batch execution tasks. A task represents a set of input files, a set of (Docker) containers and commands to run, output files, and some other logging and metadata. It provides a standard mechanism for managing tasks' deployment, scheduling, running, and clean-up across different computing environments, including high-performance computing (HPC) systems and cloud environments. This standardization allows researchers to move their computational workloads between different environments without needing to rewrite their code for each specific infrastructure.
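To make the task description above concrete, here is a minimal sketch of a TES task document built as a plain Python dict. The field names (`name`, `inputs`, `outputs`, `executors`, `url`, `path`, `type`, `image`, `command`) follow the TES schema; the helper `build_tes_task`, the bucket URLs, and the container image are hypothetical values chosen only for illustration.

```python
import json


def build_tes_task(name, image, command, inputs=(), outputs=()):
    """Assemble a minimal TES task document as a plain dict.

    `inputs` and `outputs` are iterables of (url, path) pairs.
    This helper is illustrative and not part of TESK.
    """
    task = {
        "name": name,
        # A task runs one or more executors; each needs an image and a command.
        "executors": [{"image": image, "command": list(command)}],
    }
    if inputs:
        task["inputs"] = [
            {"url": url, "path": path, "type": "FILE"} for url, path in inputs
        ]
    if outputs:
        task["outputs"] = [
            {"url": url, "path": path, "type": "FILE"} for url, path in outputs
        ]
    return task


# Hypothetical example values: the bucket and image are assumptions.
task = build_tes_task(
    name="md5sum-example",
    image="alpine:3.19",
    command=["md5sum", "/data/input.txt"],
    inputs=[("s3://example-bucket/input.txt", "/data/input.txt")],
    outputs=[("s3://example-bucket/output.txt", "/data/output.txt")],
)

print(json.dumps(task, indent=2))
```

A document like this is what a client would submit to a TES server as JSON; the server then takes care of staging the inputs, running the container, and uploading the outputs.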

TESK is an implementation of a task execution engine based on the TES standard, running on Kubernetes. It can be deployed using the Helm charts provided in the repository.

Motivation

My objective was a full-scale redesign of TESK: revamping the Kubernetes batch execution service to adhere to the Task Execution Service (TES) standard established by the Global Alliance for Genomics and Health (GA4GH). The specific goals were:

  • Rewrite the existing Java codebase in Python, using the latest versions of the Python dependencies. This addresses deprecated code and improves maintainability, given that the existing tesk-core is already written in Python.
  • Merge all TESK repositories into a unified repository to streamline management and reduce redundancy. This simplifies dependency management, keeps updates synchronized across components, and facilitates collaborative development and code sharing among developers.
  • Implement TES v1.1.0 to incorporate the latest GA4GH TES features and comprehensive support for client-side GUI components, while ensuring compatibility and interoperability with existing systems.
  • Update the Helm charts and Docker images for TES v1.1.0, merge the deployment charts, and add tests for continuous integration (CI) pipelines.

Achievements

In addition to completing my project, I contributed to various other aspects of it, such as setting up linters, configuring pre-commit hooks, addressing monorepo-related challenges, and templating boilerplate code. Given below are the issues and PRs that I have worked on so far:

  • Merge Repositories: Successfully merged tesk-api and tesk-core into the TESK repository while preserving commit history.

  • Standardize Project Structure: Updated the project with standardized boilerplate templates and added essential documentation, including a Code of Conduct and linter configuration files.

  • CI Pipeline Updates: Developed CI pipelines using the latest standards and a diverse tech stack for Python.

  • Migration to pyproject.toml: Migrated the project to pyproject.toml for improved dependency management and to ensure it remains up-to-date.

  • API Server Setup (FOCA): Leveraged FOCA and the latest TES specifications to bootstrap the API server.

  • Implement GET /service-info: Developed the GET /service-info endpoint for the API server.

  • Auto-Generated Documentation: Implemented auto-generated documentation and set up RTD to automatically host the latest documentation with each code change.

  • Kubernetes Client and Abstraction: Developed a Kubernetes client and abstraction layer for the project.

  • Implement POST /tasks: Developed the POST /tasks endpoint, enabling clients to submit tasks, for which the service generates the Kubernetes Job and ConfigMap manifests needed to execute them.

  • Implement GET /tasks/{id}: Developed the GET /tasks/{id} endpoint, allowing clients to retrieve specific tasks running in the Kubernetes cluster.

  • Implement POST /tasks/{id}:cancel: Developed the POST /tasks/{id}:cancel endpoint, enabling clients to cancel specific tasks running in the Kubernetes cluster.

  • Helm Chart Updates: Updated the Helm chart to the latest version and added necessary configurations for the project.
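As a rough illustration of how a client might address some of the endpoints listed above, the sketch below builds (but does not send) HTTP requests with the Python standard library. The paths `/service-info`, `/tasks/{id}` and `/tasks/{id}:cancel` come from the list above; the base URL, the task ID, and the helper function names are assumptions, since the actual host and path prefix depend on how TESK is deployed.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: a locally reachable deployment; adjust to your own setup.
BASE_URL = "http://localhost:8080/ga4gh/tes/v1"


def service_info_request(base_url=BASE_URL):
    """Request object for GET /service-info."""
    return Request(f"{base_url}/service-info", method="GET")


def get_task_request(task_id, base_url=BASE_URL):
    """Request object for GET /tasks/{id}."""
    # Percent-encode the ID so it cannot alter the path structure.
    return Request(f"{base_url}/tasks/{quote(task_id, safe='')}", method="GET")


def cancel_task_request(task_id, base_url=BASE_URL):
    """Request object for POST /tasks/{id}:cancel."""
    return Request(
        f"{base_url}/tasks/{quote(task_id, safe='')}:cancel", method="POST"
    )


# "task-123" is a hypothetical task ID used only for illustration.
for req in (
    service_info_request(),
    get_task_request("task-123"),
    cancel_task_request("task-123"),
):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send the request to a running server; the sketch stops short of that so it can be run without a deployment.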

Outlook

While I aimed to achieve even more and cover all my stretch goals, I'm satisfied with the contributions I made.

Future Focus:

  • Incorporate compliance testing into the CI/CD pipeline.
  • Implement data encryption and decryption to ensure anonymity.

Acknowledgment

I would like to express my special thanks to my mentors, who gave me the golden opportunity to work on such an amazing project.

Over the last few months, apart from writing quality code, I have learned to take ownership of a project. I would also like to thank the GSoC program and the GA4GH organization for providing me with this wonderful experience over the last 2-3 months.

Javed Habib

I am immensely grateful for the experience and growth this opportunity has brought me; it was like none other.
