
@DrYak
Created March 31, 2024 06:30
Google Season of Docs - Better V-pipe tutorial for virus variant surveillance in wastewater

Better V-pipe tutorial for virus variant surveillance in wastewater

About V-pipe

The Computational Biology Group of ETH Zürich (CBG-ETHZ), member of the Swiss Institute of Bioinformatics (SIB), has developed the SIB Software Resource V-pipe: an Apache-licensed computational pipeline for the analysis of virus next-generation sequencing (NGS) data, specializing in samples of mixed viral populations.

This bioinformatics workflow has important applications in several different settings:

  • In clinical virology, information about the detection and quantification of viral quasispecies in a single patient sample can assist clinicians in the optimization of the treatment for this patient.
  • In clinical epidemiology, sequencing test samples during a viral outbreak provides information about the viral variants found in the population, which helps to better understand the epidemic dynamics and to guide the decision-making process of public health authorities in a timely fashion.
  • In environmental epidemiology, the sequencing of environmental samples enables the assessment of the circulation of viral variants independently of any large-scale public testing effort, but it requires specialized methods for deconvolution as a single sample provides information covering a large population of hosts living in that environment.
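
To illustrate what deconvolution means here, the following is a minimal sketch and not V-pipe's actual algorithm. It assumes each variant is described by a hypothetical signature (1 if the variant carries a mutation, 0 otherwise) and that the mutation frequencies observed in one wastewater sample are a mixture of those signatures. The function name `deconvolve`, the fixed step size and the toy numbers are all assumptions made for this example.

```python
def deconvolve(signatures, observed, iterations=2000, step=0.1):
    """Estimate non-negative variant proportions from mutation frequencies.

    signatures[i][j] is 1 if variant j carries mutation i, else 0.
    observed[i] is the frequency of mutation i in the mixed sample.
    Uses projected gradient descent on the squared error; the fixed
    step size is an assumption that suits small toy matrices only.
    """
    n_mut = len(signatures)
    n_var = len(signatures[0])
    x = [1.0 / n_var] * n_var  # start from equal proportions
    for _ in range(iterations):
        # residual = predicted frequency minus observed frequency
        residual = [
            sum(signatures[i][j] * x[j] for j in range(n_var)) - observed[i]
            for i in range(n_mut)
        ]
        # gradient of 0.5 * ||A x - b||^2 with respect to x
        grad = [
            sum(signatures[i][j] * residual[i] for i in range(n_mut))
            for j in range(n_var)
        ]
        # gradient step, then projection onto x >= 0
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_var)]
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


# Toy data (hypothetical): 4 mutations, 2 variants mixed 70% / 30%.
signatures = [
    [1, 0],  # mutation found only in variant A
    [1, 0],  # mutation found only in variant A
    [0, 1],  # mutation found only in variant B
    [1, 1],  # mutation shared by both variants
]
observed = [0.7, 0.7, 0.3, 1.0]

proportions = deconvolve(signatures, observed)
print([round(p, 3) for p in proportions])
```

Real wastewater data is noisy, has uneven coverage and involves many more variants and mutations, which is why specialized methods are needed rather than a toy solver like this one.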

Role during the pandemic

In addition to its earlier uses in clinical research settings such as HIV, our pipeline has seen increased use during the COVID-19 pandemic, which started spreading into Europe in 2020. V-pipe's automatic workflow played an important role in the genomic surveillance of SARS-CoV-2 in Switzerland: for example, the 75’000 sequences published on ENA and GISAID by the Swiss SARS-CoV-2 sequencing consortium (S3C) were analyzed using our workflow.

Application to wastewater

In particular, the pipeline has been successfully adapted to the analysis of environmental wastewater samples, building on its focus on analyzing mixtures of viral variants in a single sample. Such environmental surveillance of the SARS-CoV-2 pandemic has become an increasingly important source of information on the spread of variants, since clinical testing has declined and is currently close to disappearing. Our bioinformatics workflow V-pipe, including its components specialized in the analysis of wastewater, is at the core of the ongoing wastewater-based SARS-CoV-2 variant monitoring commissioned by the Swiss Federal Office of Public Health, a cornerstone of the current pandemic surveillance in Switzerland. These surveillance efforts enable early warning of the introduction of new variants, provide estimates of their spread, and evaluate epidemiological characteristics, earlier than traditional clinical surveillance and at a fraction of the cost (10.1038/s41564-022-01185-x, 10.1101/2022.11.02.22281825, 10.4414/SMW.2022.w30202).

V-pipe uses outside of Switzerland

Abroad, in late 2021, a software component developed as part of the V-pipe bioinformatic workflow (the GPL-licensed COJAC) was used by the UK Health Security Agency to monitor the spread of the Omicron variant across 450 wastewater sampling sites in the UK (Omicron, VOC-21NOV-01 (B.1.1.529) Technical briefing 30), a critical step in understanding the dynamics of the SARS-CoV-2 pandemic. More recently, since autumn 2023, V-pipe has been applied in a surveillance program in Northern Italy at Arpa Piemonte to help reduce the tedious work of searching for the SARS-CoV-2 variant BA.2.86 "Pirola" in wastewater.

The Problem

Given how critical wastewater-based epidemiology (WBE) is becoming in the face of declining clinical testing and sequencing, we would like to make it easier for other groups to replicate our bioinformatics data analysis for viral variant surveillance. We are considering several strategies to enhance discoverability and ease of onboarding, including improved documentation. We are currently modernizing the website (old version, draft of new design) and would like to take the opportunity to upgrade the documentation through the Google Season of Docs project.

The current state of the documentation:

The SARS-CoV-2 tutorial needs to be rewritten, incorporating the draft HOWTO on wastewater analysis. In addition, information is spread across multiple places, which confuses potential users and hampers discoverability.

As a consequence of the current state of the documentation, onboarding new users has so far required booking a video-conferencing call to brief them and walk them through the steps, which does not scale in the long term.

The Scope

Main scope

The main scope of this project consists of the following deliverables:

  • Updated tutorial for SARS-CoV-2 covering the analysis of sequencing data from wastewater samples. This is the minimum viable deliverable for the project.
  • Expanded tutorial about the installation of V-pipe, with an added "Reusing an existing conda installation" section.
  • Organise these tutorials in the docs/ folder, and write an additional introduction, to make them available on the Read the Docs platform.
  • Review the main website and the README files to ensure that information is properly linked, easily discoverable and reachable.
    • Reduce duplication, while still fulfilling the mandatory requirements (e.g., the presence of the file config/README.md is required by the Snakemake Workflow Catalog).

Additional goals

Additional goals fitting within the scope if time permits:

  • Integrate the content of config/config.html into Read the Docs
  • Have a passing end-to-end CI test that uses the updated SARS-CoV-2 tutorial

Out-of-scope for this project

  • Recording an updated video presentation introducing this application of V-pipe will be dealt with at a later point in time, outside this Google Season of Docs project.

Measuring the results

Internal measures

  • Test the tutorial by sending it to users with no prior experience in analyzing this type of data:
    • ask members of the SIB to follow the tutorial and answer a short survey
    • ask new users at external partner centers replicating our methods (the current person working in Davos is defending his PhD soon and will then need to train a replacement)
  • Test the discoverability of information by giving an exercise (again to members of the SIB) in the form of: "Use V-pipe to analyse this dataset..."

External measures

  • the next 5 new centers interested in replicating our method should be able to do so without booking a video-conference call with the developer of V-pipe
  • students at the next course using V-pipe should be able to complete tasks without opening issues on V-pipe

Timeline

The project itself will take approximately four months to complete. The period after that will be used to track the performance of the updated documentation (see External measures above). Once the tech writer is hired, we will spend two weeks on tech writer orientation, including familiarization with the technologies specific to our needs and pipeline (conda, jupytext, etc.). We will then spend two more weeks on the inventory of existing documentation and tutorials and on defining the content required to set up proper Read the Docs documentation. The last three months will be spent creating the defined content and adding it to the documentation, through an iterative process in collaboration with the pipeline developers.

| Dates | Action Items |
|---|---|
| May | Orientation and familiarization |
| | Inventory of existing tutorials and documentation |
| | Review proposed content to produce |
| | Inventory of existing training material and proposed content to produce |
| June - July | Create all identified content |
| | Set up Read the Docs |
| | Populate branch of repository |
| August | Review of contributions made to the documentation |
| | Final changes requested by project members |
| | Merge the updated docs/ and README into master |
| | Link the documentation into the website |
| | Start to advertise the updated documentation |
| November | Review the Measures |
| | Project report |

Project Budget

| Budget item | Amount | Running Total | Notes/justifications |
|---|---|---|---|
| Technical writer | $ 5,000 | $ 5,000 | |
| Project swag | $ 200 | $ 5,200 | |
| TOTAL | | $ 5,200 | |
@rpsmaini

rpsmaini commented May 2, 2024

Hi there, my name is Ravpreet Singh Maini. I am eager to work on this opportunity. By contributing to this project I will expand my expertise and acquire a foundational skill in the industry.
Moreover, I have the advantage of contributing to this project full-time.
For reference here is my LinkedIn profile:
https://www.linkedin.com/in/ravpreet-singh-maini-0346b21b6
For further communication here is the email and contact number:
8871712525
rpsmaini@yahoo.com
@DrYak

@g-anush

g-anush commented May 4, 2024

Hi @DrYak,

I am very interested in Project - "Better V-pipe tutorial for virus variant surveillance in wastewaters" and would like to take the next steps. I have thoroughly reviewed the project documents and am confident that I have the relevant skills and experience to accomplish the project. I would love to discuss the possibilities further.

For further communication here is the email and contact number:
+91-8319566004
anushgupta2001@gmail.com

Thanks

@Rafiea-Ashraf

Dear HR,

I'm excited about the opportunity to contribute to your project. With two years of experience in writing reports for the IEEE Society and a year working on the IEEE student branch newsletter, I believe I could make a valuable addition to your team. Here is my LinkedIn to get to know me more: https://www.linkedin.com/in/rafiea-ashraf-16445b221/
I'm also attaching an email and am looking forward to the possibility of working together: rafiea.ashraf@gmail.com
Hope to hear from you soon.
Best regards,
Rafiea Ashraf

@kapelnick

Hello,
I am interested in volunteering a few hours to this project, to gain experience with documenting open-source projects and GSoD in general!

Many thanks,
Nikos | kapelnick.mud@gmail.com

@TheRaj71

Hello, DrYak at GSOD

My name is Raj, and I would like to contribute to the V-pipe bioinformatics workflow documentation enhancement project. My email address is theraj714@gmail.com. I think that as a seasoned technical writer, I can improve this crucial tool's discoverability and onboarding process.

The V-pipe pipeline has been critical for SARS-CoV-2 genomic surveillance, especially in wastewater analysis. Improving the documentation is crucial as wastewater-based epidemiology becomes more important. I'm excited to update the SARS-CoV-2 tutorial, expand the installation guide, and reorganize the docs for better discoverability.

I'm sure I can meet the project's success metrics, which include having new users test the tutorials and making sure the documentation makes it possible for others to replicate your techniques without further assistance. I'm excited to help with this important project and enhance viral variant surveillance through wastewater analysis.

Please consider my application. I look forward to discussing my qualifications and approach in more detail.

Best regards,
Raj

@elabongaatuo

Hello @DrYak,

Congratulations on being selected as one of the GSOD '24 participants.

As an electrical engineer with a strong interest in technical writing, I'm excited to apply for the V-pipe documentation project in Google Season of Docs. V-pipe's use of genomics, indexing, and mapping for wastewater monitoring aligns perfectly with my skill set.

I'm passionate about V-pipe's potential to revolutionize pandemic preparedness through wastewater analysis. My technical background and dedication to clear communication make me a valuable asset to your team.

Please find my Statement of Interest and a snippet of code from when I happened to tinker with V-pipe.

Thank you for your time and consideration. I look forward to working with you.

Regards,

Yvonne Elabonga.
