@Dilschat
Last active August 13, 2018 16:18
GSOC 2018 GA4GH project submission description

This document is the final report for my GSOC 2018 project and covers all the work I did for GA4GH. A detailed description of the project is available on the project page.

Biosamples

One goal of the project was to create an API for discovering BioSamples samples using the GA4GH metadata schema and for streaming sequencing data back from ENA via the htsget protocol. An additional objective was to provide Phenopackets export.

Phenopackets serialisation

The Phenopackets serialisation is completely done and merged into Biosamples. You can see all the code I've produced in the pull request. More details on the specific additions are available in the links provided below.

What I did:

How to use it:

  1. Fetch the code from the phenopacket_integration branch.
  2. Run it using docker-webapp.sh and docker-agents.sh.
  3. If you have no samples in your local Biosamples instance, submit some according to the instructions.
  4. Request a sample as a phenopacket, for example: http://localhost:8081/biosamples/samples/SAMEA100000.phenopacket
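
The request in step 4 can also be made from a script. The following is a minimal sketch, not part of the project code: the function names are hypothetical, the base URL is the local one from the example above, and it assumes the endpoint returns the phenopacket as JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of the local Biosamples instance, taken from the example above.
DEFAULT_BASE = "http://localhost:8081/biosamples"


def phenopacket_url(accession, base=DEFAULT_BASE):
    """Build the phenopacket export URL for a sample accession."""
    return f"{base.rstrip('/')}/samples/{quote(accession, safe='')}.phenopacket"


def fetch_phenopacket(accession, base=DEFAULT_BASE, timeout=10):
    """Download and decode a phenopacket (assumes a JSON response body)."""
    with urlopen(phenopacket_url(accession, base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_phenopacket() once the containers are up.
    print(phenopacket_url("SAMEA100000"))
```

Calling `fetch_phenopacket("SAMEA100000")` requires the local instance started in step 2 to be running and the sample to exist.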

GA4GH searching

I completed the task of building an API to query BioSamples using GA4GH metadata. However, this API relies on the ENA htsget service, which is not deployed in production yet. For this reason, it is not currently possible to merge my code into the BioSamples repository. You can see all the code I've produced in the pull request. More details on the specific additions are available in the links provided below.

What I did:

What remains to do:

  • Merge the pull request into Biosamples
  • Deploy the htsget service and replace the dummy links, which currently point to a localhost test instance, with links to the real host (marked by TODO comments)

Link to the original repository

Link to my fork

Htsget service (EGA-dataedge)

This part of the project is the implementation of the htsget protocol for ENA. The protocol specifications are available here. I've completed the ENA htsget service, but, as mentioned above, it is not yet merged into the EGA-data project or deployed in production. You can see all the code I've produced in the pull request. More details on the specific additions are available in the links provided below.
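
For readers unfamiliar with htsget, the client side of the protocol can be sketched as follows. This is an illustrative sketch based on the public htsget specification, not code from the ENA service: the function names, the base URL and the read identifier are hypothetical. A client requests a "ticket" from the reads endpoint, then downloads every URL listed in the ticket (plain URLs, optionally with headers, or inline `data:` URIs) and concatenates the blocks in order.

```python
import base64
from urllib.parse import unquote_to_bytes, urlencode
from urllib.request import Request, urlopen


def reads_url(base, read_id, reference_name=None, start=None, end=None, fmt="BAM"):
    """Build an htsget reads request URL (base and read_id are placeholders)."""
    params = {"format": fmt}
    if reference_name is not None:
        params["referenceName"] = reference_name
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    return f"{base.rstrip('/')}/reads/{read_id}?{urlencode(params)}"


def fetch_block(entry):
    """Return the bytes for one entry of the ticket's "urls" list."""
    url = entry["url"]
    if url.startswith("data:"):
        # Inline block: decode the payload after the first comma.
        header, _, payload = url.partition(",")
        if header.endswith(";base64"):
            return base64.b64decode(payload)
        return unquote_to_bytes(payload)
    # Remote block: forward any headers the ticket asks for (e.g. Range).
    request = Request(url, headers=entry.get("headers", {}))
    with urlopen(request) as resp:
        return resp.read()


def assemble(ticket):
    """Concatenate all blocks of a ticket, in order, into one byte string."""
    return b"".join(fetch_block(entry) for entry in ticket["htsget"]["urls"])


if __name__ == "__main__":
    # A hand-made ticket using only inline data URIs, so no server is needed.
    demo_ticket = {
        "htsget": {
            "format": "BAM",
            "urls": [
                {"url": "data:application/vnd.ga4gh.bam;base64,YWJj"},
                {"url": "data:application/vnd.ga4gh.bam;base64,ZGVm"},
            ],
        }
    }
    print(assemble(demo_ticket))
```

Against a real deployment, the ticket would be obtained by requesting the URL built by `reads_url` and parsing the JSON response. The actual host and paths of the ENA service are not stated in this report.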

What I did:

What remains to do:

  • Merge my pull request into the EGA-dataedge repository
  • Deploy the service in production
    • Update all hosts in the ENA htsget service to the real service hosts (marked by TODO comments)

Link to the original repository

Link to my fork
