@carlthewebmaster
Created May 2, 2019 18:50

On April 23 and 24, a group of NCBI computer scientists, subject matter experts, and customer service representatives held an in-house tool-building exercise to create a prototype “what’s in my tube” workflow. The event aimed to turn the lessons learned from the January virus hackathon into a prototype tool: a single, modular pipeline that strings together the disparate computational steps used in the hackathon and sequentially analyzes the contents of user-provided next-generation sequencing data.

The pipeline was built into a stand-alone Jupyter notebook interface that lets users view and edit analysis parameters and see graphical displays of the results.

The April event produced the basic code for the pipeline and an initial minimum viable product (MVP) approach to running it on Google Cloud Platform (GCP). Work remains, however, particularly in developing effective ways to host and run Jupyter notebooks on GCP (including access to BLAST, a scalable architecture, and visualizations). In other words, by the end of the two days we had working code, but we were still far from an easy-to-use, modular package that an outside user could pick up and run.

A second track of the hackathon focused on developing baseline learning materials: using GitHub (including code updates and documentation), sharing and collaborating on Jupyter notebooks, and accessing cloud resources generally. We identified many potential challenges for new users and began an initial set of documentation; however, as with the pipeline and notebook work, we had only limited success in completing tutorial materials suitable for end users.

We believe that, once complete, the pipelines, notebooks, and related tutorial material will provide a solid basis for getting future hackathon participants up and running quickly, alleviating pain points observed in the January virus hackathon. They will also provide a strong curriculum for workshop events and on-demand educational tools that demonstrate the potential of cloud platforms for sequence analysis.
