
John Huddleston huddlej

huddlej / calculate_coverage_per_position.py.ipynb
Created March 25, 2024 20:46
Simple example of how to calculate coverage per position of a multiple sequence alignment
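The notebook itself does not render in this listing. The following is a rough, stdlib-only sketch of the idea, not the notebook's code. It assumes that "coverage" at an alignment column means the number of sequences with an unambiguous base (A, C, G, or T) at that column, so gaps (-) and ambiguous calls such as N do not count. The helpers read_fasta and coverage_per_position and the toy alignment are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def coverage_per_position(sequences, valid_bases="ACGT"):
    """Count sequences with a valid (non-gap, non-ambiguous) base at each column."""
    lengths = {len(sequence) for sequence in sequences}
    if len(lengths) != 1:
        raise ValueError("expected at least one sequence, all of equal (aligned) length")
    valid = set(valid_bases.upper())
    alignment_length = lengths.pop()
    return [
        sum(sequence[position].upper() in valid for sequence in sequences)
        for position in range(alignment_length)
    ]


# Hypothetical three-sequence alignment.
alignment = io.StringIO(">a\nACGT-\n>b\nAC-TN\n>c\nNCGTA\n")
sequences = [sequence for _, sequence in read_fasta(alignment)]
counts = coverage_per_position(sequences)
fractions = [count / len(sequences) for count in counts]
print(counts)  # [2, 3, 2, 3, 1]
```

Dividing by the number of sequences turns counts into the fraction of sequences covering each position; for large alignments a vectorized approach would be faster than this per-column loop.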
huddlej / builds_by_host.yaml
Created March 7, 2024 18:23
Example subsampling logic for multi-species seasonal flu build
custom_rules:
- profiles/gisaid/prepare_data.smk
metadata_fields:
- Isolate_Name
- Isolate_Id
- Passage_History
- Location
- Authors
- Originating_Lab
huddlej / 01-current-implementation.yaml
Last active February 27, 2024 18:34
Example subsampling configurations for public Nextstrain seasonal flu builds with different implementations. See the original configuration file for more context: https://github.com/nextstrain/seasonal-flu/blob/45bf4336d9485c1c9bfc44b09b384595d7685032/profiles/nextstrain-public.yaml#L78-L86
# Current implementation approach, where "subsamples" is defined in the build configuration YAML file.
# Build configuration parameters get passed to the optionally-templated subsample "filters" strings
# such that the same subsampling scheme can be shared across multiple builds by passing build-specific
# variables. For the full context of this subsampling scheme, see the original build configuration file:
# https://github.com/nextstrain/seasonal-flu/blob/45bf4336d9485c1c9bfc44b09b384595d7685032/profiles/nextstrain-public.yaml#L78-L86
subsamples: &subsampling-scheme
  regions_except_europe:
    filters: --query "(passage_category != 'egg') & (region != 'Europe') & (ha == True) & (na == True)" --group-by region year month --subsample-max-sequences 2700 --min-date {min_date} --exclude {exclude} --exclude-where passage=egg
    # Note that a priority of "titers" has a special meaning in the flu
    # workflow which is not portable to other pathogen workflows.
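The comments above describe build parameters being passed into the templated "filters" strings. A minimal sketch of that step, assuming the placeholders are filled with Python's str.format (which the {min_date} and {exclude} syntax suggests) and that the filled string is split into command-line arguments with shlex.split. The render_filters helper and the parameter values are hypothetical, not taken from the seasonal-flu workflow.

```python
import shlex

# The "filters" string from the regions_except_europe subsample above.
FILTERS_TEMPLATE = (
    "--query \"(passage_category != 'egg') & (region != 'Europe') & (ha == True) & (na == True)\" "
    "--group-by region year month --subsample-max-sequences 2700 "
    "--min-date {min_date} --exclude {exclude} --exclude-where passage=egg"
)


def render_filters(template, build_params):
    """Fill build-specific placeholders, then split into an argument list.

    str.format raises KeyError if the template uses a placeholder that
    build_params does not provide, which surfaces config typos early.
    """
    return shlex.split(template.format(**build_params))


# Hypothetical parameters for one build; real values come from each build's config.
args = render_filters(
    FILTERS_TEMPLATE,
    {"min_date": "2020-01-01", "exclude": "config/outliers.txt"},
)
print(args[:2])
```

Because shlex.split honors shell quoting, the whole --query expression stays a single argument. One caveat of str.format templating: any literal braces in a filter string would need to be doubled ({{ and }}).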
huddlej / tree.nwk
Created June 22, 2023 17:57
Simple ncov tree
(Japan/PG-240490/2022:0.00016908,((((((((((((((Ireland/C-Enfer-COV170122075_H10/2022:0.00006838,England/MILK-33EEE7B/2022:0.00006689):0.00000100,((Argentina/INEI118661/2022:0.00013528,(USA/GA-CDC-STM-QN5JEP37F/2022:0.00003344,Slovakia/BA_22_00007824/2022:0.00017280):0.00003344):0.00000100,(Peru/PIU-INS-16129/2022:0.00010259,Brazil/SP-NVBS14337GENOV828793225901/2022:0.00003344):0.00003344):0.00000100):0.00000100,(USA/CA-CDC-ASC210857596/2022:0.00013530,Brazil/GO-2-LACEN-1527/2022:0.00017074):0.00013524):0.00000203,(((((Canada/BC-BCCDC-334228/2022:0.00003344,Peru/ARE-INS-16519/2022:0.00006854):0.00000100,Brazil/RS-FIOCRUZ-10545/2022:0.00023678):0.00003344,Japan/PG-188383/2022:0.00010147):0.00000100,Poland/22SNR578_wsserze/2022:0.00006940):0.00003231,Brazil/CE-FIOCRUZ-79004CE/2022:0.00010111):0.00003344):0.00054049,Wuhan/Hu-1/2019:0.00116797):0.00082379,Belgium/AZDelta-2238-06015/2022:0.00003344):0.00003344,((((((Slovakia/BA_22_00035301/2022:0.00013530,Denmark/DCGC-529344/2022:0.00003344):0.00000100,((Slovenia/5
huddlej / find_polytomies.py
Created June 21, 2023 23:29
Simplified script to find "polytomies" of a minimum size in a given Newick tree
#!/usr/bin/env python3
import argparse

import Bio.Phylo


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Find polytomies in a given tree",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )
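The preview stops after the argument parser. Here is a stdlib-only sketch of the core idea, not the gist's code. It swaps Bio.Phylo for a tiny Newick parser so it runs without dependencies, and it assumes that a "polytomy" is an internal node with at least min_size children once internal branches shorter than min_length are collapsed into their parent. Trees like the ncov example above encode multifurcations with near-zero branches such as 0.00000100. The names parse_newick, effective_children, find_polytomies, and the default min_length are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str = ""
    length: float = 0.0
    children: list = field(default_factory=list)


def parse_newick(text):
    """Parse a simple Newick string (no quoted labels or comments) into Nodes."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_node())
            pos += 1  # consume ")"
        # The label runs up to the next delimiter: "name:length", "name", or ":length".
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        name, _, length = text[start:pos].partition(":")
        node.name = name
        node.length = float(length) if length else 0.0
        return node

    return parse_node()


def effective_children(node, min_length):
    """Children of node after collapsing internal branches shorter than min_length."""
    result = []
    for child in node.children:
        if child.children and child.length < min_length:
            result.extend(effective_children(child, min_length))
        else:
            result.append(child)
    return result


def find_polytomies(root, min_size, min_length=1e-5):
    """Return (node, n_children) for nodes with at least min_size effective children."""
    polytomies = []
    stack = [root]
    while stack:
        node = stack.pop()
        children = effective_children(node, min_length)
        if len(children) >= min_size:
            polytomies.append((node, len(children)))
        # Only uncollapsed internal nodes are visited as nodes in their own right.
        stack.extend(child for child in children if child.children)
    return polytomies


tree = parse_newick("((A:0.1,B:0.1):0.000001,C:0.1,(D:0.1,E:0.1):0.2);")
for node, size in find_polytomies(tree, min_size=3):
    print(node.name or "<unnamed>", size)
```

The recursive parser is fine for small examples, but a large, deep tree is better read with Bio.Phylo.read(path, "newick"), as the original script's import suggests, walking each clade's clades and branch_length instead of the Node fields used here.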
huddlej / 2022-03-10-plot-auspice-tree-from-nextstrain-groups-with-baltic.ipynb
Created March 10, 2022 19:08
Example of how to plot Auspice trees from Nextstrain Groups with BALTIC
huddlej / README.md
Created January 13, 2022 23:09
Example of a composable config for Nextstrain seasonal flu builds using CUE definitions


Background

An example of how one might use a configuration language like CUE to abstract a Nextstrain build configuration across multiple resolutions, segments, or lineages. The attached CUE config defines 20 builds for H3N2 and H1N1pdm lineages, HA and NA segments, and five different resolutions. Resolution- and lineage-specific parameters are defined once as CUE definitions and combined through CUE structs. These 100 lines of CUE (with comments) evaluate to 181 lines of YAML. CUE also allows us to define default values for fields like filter.sequences_per_group that we can override in specific builds. This lets us define builds that are exceptions to the shared rules, such as the h3n2_na_6m build.
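The composition described above can be approximated in Python to show the shape of the result. This sketch assumes one set of parameters per lineage and per resolution, a shared default for filter.sequences_per_group, and a per-build override; every name and value in it is a hypothetical placeholder, not the gist's actual CUE definitions. One real difference: CUE unification rejects conflicting concrete values and only lets marked defaults be overridden, whereas the merge below simply lets later values win.

```python
import copy
import itertools

# Hypothetical placeholders standing in for the CUE definitions.
DEFAULTS = {"filter": {"sequences_per_group": 10}}
LINEAGES = {
    "h3n2": {"reference_name": "h3n2_ref"},
    "h1n1pdm": {"reference_name": "h1n1pdm_ref"},
}
SEGMENTS = ["ha", "na"]
RESOLUTIONS = {
    "6m": {"months": 6},
    "2y": {"months": 24},
    "3y": {"months": 36},
    "6y": {"months": 72},
    "12y": {"months": 144},
}
# A build that is an exception to the shared rules.
OVERRIDES = {"h3n2_na_6m": {"filter": {"sequences_per_group": 20}}}


def merge(base, extra):
    """Return a deep copy of base with extra merged in; extra wins on conflicts."""
    merged = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def compose_builds():
    """Expand lineage x segment x resolution into named build configs."""
    builds = {}
    for (lineage, lineage_params), segment, (resolution, resolution_params) in itertools.product(
        LINEAGES.items(), SEGMENTS, RESOLUTIONS.items()
    ):
        name = f"{lineage}_{segment}_{resolution}"
        build = merge(DEFAULTS, lineage_params)
        build = merge(build, resolution_params)
        build = merge(build, {"lineage": lineage, "segment": segment})
        builds[name] = merge(build, OVERRIDES.get(name, {}))
    return builds


builds = compose_builds()
print(len(builds))  # 20
print(builds["h3n2_na_6m"]["filter"]["sequences_per_group"])  # 20
```

Two lineages times two segments times five resolutions gives the 20 builds; passing the resulting dict to a YAML serializer such as PyYAML's yaml.safe_dump would produce a builds file.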

This example builds on the idea of

huddlej / 2022-01-05-plotting-nextstrain-trees-with-baltic.py.ipynb
Created January 10, 2022 17:20
Examples of how to plot Nextstrain trees
huddlej / seasonal-flu-builds.yaml
Created December 22, 2021 00:14
Example Nextstrain builds definition for seasonal flu
# Define lineages for the analysis.
lineages:
- h3n2
- h1n1pdm
- vic
- yam
# Define genes to translate from nucleotide to amino acid sequences. These names
# must match coding regions defined in the reference.
genes:
huddlej / README.md
Last active November 16, 2021 06:25
Script to convert Augur's node data JSON files to data frame format (TSV, CSV, etc.)

Convert Augur node data JSON to data frame format (TSV, CSV, etc.)

This script shows how to parse values from Augur's node data JSON files into a data frame format that other tools can easily consume.
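Here is a rough, stdlib-only sketch of the conversion, not the script itself. Augur node data JSON files store per-node attributes under a top-level "nodes" key, so merging those dictionaries and writing one row per node yields a table. This version ignores the tree (the real script also takes --tree) and serializes nested values, such as confidence dictionaries, as JSON strings. The helpers node_data_to_rows and write_table and the example data are hypothetical.

```python
import csv
import io
import json


def node_data_to_rows(node_data_documents):
    """Merge per-node attributes from parsed node data JSON documents."""
    rows = {}
    for document in node_data_documents:
        for node_name, attributes in document.get("nodes", {}).items():
            rows.setdefault(node_name, {"node": node_name}).update(attributes)
    return rows


def write_table(rows, handle, delimiter="\t"):
    """Write one row per node; nested values become JSON strings."""
    columns = ["node"] + sorted({key for row in rows.values() for key in row} - {"node"})
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter=delimiter, restval="", lineterminator="\n"
    )
    writer.writeheader()
    for node_name in sorted(rows):
        writer.writerow({
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in rows[node_name].items()
        })


# Hypothetical example in the shape of augur traits output.
traits = {
    "nodes": {
        "NODE_0000001": {"country": "USA", "country_confidence": {"USA": 0.9, "Canada": 0.1}},
        "sample1": {"country": "Canada"},
    }
}
output = io.StringIO()
write_table(node_data_to_rows([traits]), output)
print(output.getvalue())
```

For real files, each document would come from json.load on an open file handle, and passing delimiter="," writes CSV instead of TSV. If pandas is available, pandas.DataFrame.from_dict(rows, orient="index") gives a data frame directly.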

The following example shows how to convert a discrete trait analysis output from augur traits in the Nextstrain ncov workflow to a TSV file.

python3 node_data_to_table.py \
  --tree results/europe/tree.nwk \
  --jsons results/europe/traits.json \