carlthewebmaster/summary.md

## summary.md

      
On April 23 and 24, a group of NCBI computer scientists, subject matter experts, and customer service representatives conducted an in-house tool-building exercise focused on a prototype “what’s in my tube” workflow. The event was an attempt to convert the lessons learned from the January virus hackathon into a prototype tool that strings together the disparate computational steps used in that hackathon into a single, modular pipeline that sequentially analyzes the content of user-provided next-generation sequencing data.
The pipeline was built into a stand-alone Jupyter notebook interface that lets users see and edit analysis parameters and view graphical displays of the results.
While the April event produced the basic code for the pipeline and an initial minimum viable product (MVP) approach to running it on GCP, work remains, particularly in fleshing out effective ways to host and run Jupyter notebooks on GCP (including access to BLAST, scaling architecture, and visualizations). In other words, by the end of the two days we had working code, but we were still far from a modular package that an outside user could pick up and run easily.
Another track in the hackathon was to develop baseline learning materials covering how to use GitHub (including code updates and documentation), how to share and collaborate with Jupyter notebooks, and how to access cloud resources generally. We discovered many potential challenges for new users and began an initial set of documentation; however, as with the pipeline and notebook work, we had only limited success in completing tutorial materials suitable for an end user.
We believe that, once complete, the pipelines, notebooks, and related tutorial material will provide a solid basis for getting future hackathon participants up and running quickly, alleviating pain points observed in the January virus hackathon. They should also provide a solid curriculum for workshop events and on-demand educational tools that demonstrate the potential of cloud platforms in sequence analysis.