@mhpopescu
Last active October 12, 2021 10:27

Google Summer of Code 2021 Work Product Submission

Project:
Data streaming in scientific workflows, implementation for Toil

At the start of GSoC, I published a blog post in the CWL community describing my proposed design for the project, along with a short introduction to Toil. It was cross-posted on the Open Bioinformatics Foundation (OBF) blog:
Working on a CWL-Toil project with the Open Bioinformatics Foundation


I accomplished most of my goals. The main software artifact is the implementation of input data streaming in toil-cwl-runner. I also achieved a bonus goal of the project: data streaming from both AWS and Google Cloud buckets, which reuses the existing cloud connectors in Toil. The feature was merged.
Pull Request: DataBiosphere/toil#3694
Issue: DataBiosphere/toil#3469
I also created tests for this feature:
CWL workflow for testing input streaming
Test the workflow in a Toil way

The feature required support for named pipes in cwltool, a CWL project used by toil-cwl-runner. I implemented this support, and it was merged.
Pull Request: common-workflow-language/cwltool#1469
Issue: common-workflow-language/cwltool#1468
Test for this feature:
Test streaming in cwltool
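
To illustrate why named pipes matter for streaming, here is a minimal sketch of the general idea; it is not the code from either pull request. A producer thread stands in for a download from a bucket, and the consumer stands in for a tool that only knows how to open a file path. The function name `stream_through_fifo` and the file name `input.fifo` are made up for this example, and it assumes a POSIX system where `os.mkfifo` is available.

```python
import os
import tempfile
import threading


def stream_through_fifo(chunks):
    """Feed byte chunks to a reader through a named pipe; return what the reader saw.

    Hypothetical helper for illustration only, not part of Toil or cwltool.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "input.fifo")
    os.mkfifo(fifo_path)  # POSIX only

    def producer():
        # Opening a FIFO for writing blocks until a reader opens the other end.
        with open(fifo_path, "wb") as fifo:
            for chunk in chunks:
                fifo.write(chunk)

    writer = threading.Thread(target=producer)
    writer.start()
    try:
        # The consumer opens the pipe like an ordinary file path,
        # but the data never lands on disk as a complete file.
        with open(fifo_path, "rb") as fifo:
            received = fifo.read()
    finally:
        writer.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return received


if __name__ == "__main__":
    data = stream_through_fifo([b"line 1\n", b"line 2\n"])
    print(data.decode(), end="")
```

Because the consumer only sees a path, a tool can read streamed input without being modified, which is presumably why the workflow runner has to accept named pipes where it would otherwise expect regular files.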


Other contributions

I discovered a bug in toil-cwl-runner that caused files to be downloaded twice. I fixed it, and the fix was merged:
Pull Request: DataBiosphere/toil#3670
Issue: DataBiosphere/toil#3665

I discovered a bug in a Toil tutorial and proposed a solution:
Tutorial issue

To familiarise myself with CWL and Toil, I tested various streaming examples in Toil with different setups and tools:
https://github.com/mhpopescu/toil-gsoc-tests


Unfinished extra work

As additional work, I started implementing streaming of outputs. This is not finished yet and has not been merged:
https://github.com/mhpopescu/cwltool/tree/stream-outputs
https://github.com/mhpopescu/toil/tree/stream-outputs
