Implementing the DOS Server. A standard API created by GA4GH.
Developed as part of Google Summer of Code 2018.
The Global Alliance for Genomics and Health (GA4GH) is an international, nonprofit alliance formed to accelerate the potential of research and medicine to advance human health. It has developed the Data Object Service (DOS), an emerging standard for specifying the location of data across different cloud environments. The goal of DOS is to create a generic API on top of existing object storage systems so that workflow systems can access data in a single, standard way regardless of where it is stored. The standard API is split into two parts: data object management and data object querying. The former is handled by a DOS Server, while the latter is handled by a DOS Registry (service registry).
View the DOS Registry schemas in Swagger UI
View the DOS Server schemas in Swagger UI
As part of Google Summer of Code 2018, I developed three projects from scratch: an implementation of a DOS Server, a wrapper that loads data from PGP Canada into a DOS Server database, and a wrapper that loads data from a public GCP Bucket into a DOS Server database. These projects can be found at the following links.
In case these repositories are updated in the future, the commits intended for the GSoC 2018 final evaluation are labeled "Final GSoC Commit". Documentation on how to use each project can be found in the README.md of the respective GitHub repositories.
The DOS Server uses the Spring Boot JPA framework connected to a MySQL database, with Keycloak authentication. My implementation has the following functionality (unless otherwise specified, anything implemented for a Data Object is also implemented for a Data Bundle):
- GET all Data Objects
- GET Data Object by id
- GET all Data Objects by alias
- Versioning of Data Objects
- GET all versions of a Data Object
- GET previous version of a Data Object by id
- POST, PUT, DELETE Data Objects
- Custom Pagination
- Data Object endpoints require admin authentication
- Data Bundle endpoints require user or admin authentication
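
To make the versioning, alias lookup, and pagination items above more concrete, here is a minimal in-memory sketch in Python. It is not taken from the server's Spring Boot code, which persists to MySQL through JPA. The `DataObjectStore` class, its method names, and the `page_size` / `page_token` parameters are hypothetical and only illustrate one way these features could behave.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataObject:
    """Illustrative subset of a Data Object's fields (hypothetical shape)."""
    id: str
    version: str
    aliases: Tuple[str, ...] = ()


class DataObjectStore:
    """In-memory stand-in for the database that keeps every version of each id."""

    def __init__(self) -> None:
        # Maps id -> list of stored versions, oldest first.
        self._versions: Dict[str, List[DataObject]] = {}

    def put(self, obj: DataObject) -> None:
        """Append a new version instead of overwriting the previous one."""
        self._versions.setdefault(obj.id, []).append(obj)

    def get(self, object_id: str, version: Optional[str] = None) -> DataObject:
        """Return the latest version, or a specific earlier one if requested."""
        history = self._versions[object_id]
        if version is None:
            return history[-1]
        for obj in history:
            if obj.version == version:
                return obj
        raise KeyError(f"{object_id} has no version {version}")

    def versions(self, object_id: str) -> List[DataObject]:
        """Return all stored versions of one Data Object, oldest first."""
        return list(self._versions[object_id])

    def list_page(
        self,
        alias: Optional[str] = None,
        page_size: int = 2,
        page_token: int = 0,
    ) -> Tuple[List[DataObject], Optional[int]]:
        """Return one page of latest versions and the next token (None at the end)."""
        latest = [self._versions[key][-1] for key in sorted(self._versions)]
        if alias is not None:
            latest = [obj for obj in latest if alias in obj.aliases]
        page = latest[page_token:page_token + page_size]
        next_token = page_token + page_size
        return page, (next_token if next_token < len(latest) else None)
```

In this sketch a caller keeps passing the returned token back to `list_page` until it comes back as `None`, and an older version stays retrievable by passing its version string to `get`.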
The PGP Wrapper and GCP Wrapper are both functional and both successfully load data from their respective cloud environments into a DOS Server.
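
As a rough illustration of what a bucket wrapper has to do, the sketch below maps one Cloud Storage object resource (as returned by the Cloud Storage JSON API object listing) to a Data Object-shaped dictionary. This is not the GCP Wrapper's actual code. The function name `gcs_object_to_data_object`, the choice of id, and the output field names are assumptions for illustration; the Swagger schemas linked above are the authoritative definition of a Data Object.

```python
import base64
from typing import Any, Dict


def gcs_object_to_data_object(item: Dict[str, Any]) -> Dict[str, Any]:
    """Map one Cloud Storage object resource to a Data Object-shaped dict."""
    # Cloud Storage reports md5Hash base64-encoded; convert it to hex here.
    md5_hex = base64.b64decode(item["md5Hash"]).hex()
    return {
        # Assumption: bucket plus object name as the id; a real wrapper may differ.
        "id": f"{item['bucket']}/{item['name']}",
        "name": item["name"],
        "size": str(item["size"]),
        "created": item.get("timeCreated"),
        "updated": item.get("updated"),
        "mime_type": item.get("contentType"),
        "checksums": [{"checksum": md5_hex, "type": "md5"}],
        # The gs:// URL records where the bytes live in the bucket.
        "urls": [{"url": f"gs://{item['bucket']}/{item['name']}"}],
    }
```

A wrapper along these lines would page through the bucket listing, convert each entry, and POST the result to the DOS Server.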
TODO | Current State |
---|---|
Keycloak authorization using access tokens | Not supported |
Create a Docker image that automatically configures Keycloak and MySQL and deploys the DOS Server | A develop branch attempts this but contains bugs |
Support other versioning schemas | Version number of a Data Object and Data Bundle must take the form x.x.x |
Let system_metadata and user_metadata hold key-value pairs with any arbitrary string as the key and any arbitrary object as the value | The key must be a string, and the value is serialized to a string regardless of its type |
GET Data Object by checksum | Not supported |
GET Data Bundle by checksum | Not supported |
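
The two current constraints in the table (versions of the form x.x.x, and metadata values stored as strings) can be illustrated with a short Python sketch. It assumes that each "x" is a run of decimal digits and that values are converted with `str`; the server's actual validation and serialization may differ. The names `is_valid_version` and `normalize_metadata` are hypothetical.

```python
import re
from typing import Any, Dict

# Assumption: each "x" in x.x.x is one or more ASCII digits.
_VERSION_RE = re.compile(r"[0-9]+\.[0-9]+\.[0-9]+")


def is_valid_version(version: str) -> bool:
    """Return True only if the whole string has the form x.x.x."""
    return _VERSION_RE.fullmatch(version) is not None


def normalize_metadata(metadata: Dict[Any, Any]) -> Dict[str, str]:
    """Reject non-string keys and convert every value to its string form."""
    result: Dict[str, str] = {}
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key must be a string, got {type(key).__name__}")
        result[key] = str(value)
    return result
```

Under these assumptions, a nested value such as a list survives only as its string representation, which is the limitation the TODO item aims to lift.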
Working on this project was an amazing experience. It was a great introduction to the tools currently used in the tech industry and taught me a lot about the inner workings of a startup company. I would like to thank the members of GA4GH and GSoC for giving me this opportunity. I would also like to thank Miro Cupak, Marc Fiume, and the rest of the DNAstack team for being so welcoming and so helpful in completing this project.