#! /usr/bin/env python3
#
# Computes the total number of reads, total read length, and average read
# length of a set of (maybe gzipped) FASTA / FASTQ files. Requires the pyfastx
# library (https://github.com/lmdu/pyfastx). I designed this in the context of
# computing read statistics, but if you have a set of other sequences (e.g.
# contigs) then I guess this would still work for that.
#
# USAGE:
# ./read_stats.py file1.fa [file2.fa ...]
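The three statistics described in the header above can be sketched with the standard library alone. This is not the script's own code: the original relies on pyfastx, whereas this sketch swaps in a small hand-written parser that handles FASTA only (not FASTQ) and decides on gzip purely from the file suffix. The names `fasta_lengths` and `read_stats` are hypothetical.

```python
import gzip


def fasta_lengths(path):
    """Yield the length of each record in a FASTA file.

    Assumption: a ".gz" suffix means the file is gzipped.
    """
    opener = gzip.open if path.endswith(".gz") else open
    length = None  # None until the first ">" header is seen
    with opener(path, "rt") as f:
        for line in f:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif line and length is not None:
                length += len(line)
    if length is not None:
        yield length


def read_stats(paths):
    """Return (number of reads, total length, average length) over all files."""
    num_reads = 0
    total_length = 0
    for path in paths:
        for n in fasta_lengths(path):
            num_reads += 1
            total_length += n
    average = total_length / num_reads if num_reads else 0.0
    return num_reads, total_length, average
```

Calling `read_stats(["file1.fa", "file2.fa.gz"])` would return one tuple covering both files, which mirrors the multi-file usage line above.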
#! /usr/bin/env python
#
# Shortens edge labels in a DOT file output by LJA to just show the first line
# and then a count of how many other lines are omitted. (If an edge's label
# spans exactly one or two lines, then the entire label is preserved.)
#
# USAGE:
# ./shorten_edge_labels.py in.dot out.dot
import sys
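The shortening rule described in the header above can be sketched on a single label string. This is only an illustration, not the script's code: the preview does not show how LJA separates lines within a label or how the script words the omitted-line count, so the literal `\n` separator and the `[N more lines]` text are both assumptions, and `shorten_label` is a hypothetical name.

```python
def shorten_label(label, sep="\\n"):
    """Keep the first line of a label plus a count of the omitted lines.

    Assumptions: lines inside the label are separated by the two-character
    escape sequence backslash-n, and the "[N more lines]" wording is made up.
    """
    lines = label.split(sep)
    if len(lines) <= 2:
        # One- and two-line labels are preserved in full.
        return label
    return f"{lines[0]}{sep}[{len(lines) - 1} more lines]"
```

A label of four lines would keep its first line and report three omitted lines, while a two-line label would pass through untouched.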
#! /usr/bin/env python
#
# Scans through a jumboDBG / LJA output DOT file; looks for cases where
# the same node is "defined" on multiple lines. This can be caused by the
# same truncated node ID being misused across lines.
#
# USAGE:
# ./check_for_conflicting_node_ids.py graph.dot
#
# Note that this assumes that the input graph was output by jumboDBG / LJA --
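One way to look for a node that is "defined" on more than one line is sketched below. It is not the script's code, and it rests on a guess about the layout: each node definition is assumed to sit on its own line in the form `ID [attributes]`, and any line containing `->` is treated as an edge. The name `find_repeated_node_ids` and the regular expression are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a node definition looks like `ID [ ... ]` on a single line.
NODE_LINE = re.compile(r'^\s*"?([^\s"\[\]]+)"?\s*\[')
# DOT keywords that can precede "[" without naming a node.
DOT_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def find_repeated_node_ids(dot_lines):
    """Map each node ID defined on 2+ lines to its 1-indexed line numbers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(dot_lines, 1):
        if "->" in line:
            continue  # edge line, not a node definition
        match = NODE_LINE.match(line)
        if match and match.group(1) not in DOT_KEYWORDS:
            seen[match.group(1)].append(lineno)
    return {node: nums for node, nums in seen.items() if len(nums) > 1}
```

An empty result would suggest that no node ID is reused across definition lines, at least under the layout assumed here.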
#! /usr/bin/env python3
#
# SUMMARY
# =======
# Outputs a copy of a GFA 1 file with each segment (S) line that contains a
# sequence (not just a "*" character) altered as follows:
#
# - If an LN:i tag does not exist for this sequence:
# - We will add an LN:i tag describing the length of the sequence.
# - We will replace the sequence with a "*" character.
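The one case spelled out in the summary above (a segment with a sequence and no LN:i tag) can be sketched per line as follows. This is an illustration rather than the script's code; `strip_segment_sequence` is a hypothetical name. The preview cuts off before describing segments that already carry an LN:i tag, so this sketch simply returns such lines unchanged as a placeholder and makes no claim about what the real script does with them.

```python
def strip_segment_sequence(line):
    """Replace an S line's sequence with "*" and record its length as LN:i.

    Only the "no LN:i tag yet" case is handled; every other line is
    returned unchanged (a placeholder, not the original script's behavior).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[0] != "S" or fields[2] == "*":
        return line  # not a segment line with a sequence
    if any(tag.startswith("LN:i:") for tag in fields[3:]):
        return line  # case not covered by this sketch
    fields.append(f"LN:i:{len(fields[2])}")
    fields[2] = "*"
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline
```

For example, the line `S<TAB>1<TAB>ACGT` would become `S<TAB>1<TAB>*<TAB>LN:i:4`, while link (L) lines pass through untouched.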
#! /usr/bin/env python3
# NOTE: this is a hack, so it will probably break if you have BBL files that
# don't look like the natbib-generated ones I'm used to. It is also pretty
# unintelligent about *how* it sorts entries (it defers most of the work
# to python), so if you have cases where some of your references are by
# the same person or whatever then that might cause the output to not match
# your expectations.
import sys
#! /usr/bin/env python3
# Converts a GFA assembly graph to a FASTA file of all sequences
# within the graph. Notably, this ignores connections between sequences
# in the graph.
#
# Depends on Python 3.6 or later.
#
# Usage:
# $ ./gfa_to_fasta.py mygraph.gfa contigs.fasta
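The conversion described above can be sketched as a generator over GFA lines. This is not the script's own code: `gfa_to_fasta_records` is a hypothetical name, and skipping segments whose sequence is `*` is an assumption, since the preview does not say how such segments are treated. Link lines and all other non-segment lines are ignored, matching the note that connections are not used.

```python
def gfa_to_fasta_records(gfa_lines):
    """Yield one FASTA record (header line plus sequence line) per S line.

    Assumption: segments with a "*" placeholder sequence are skipped.
    """
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            name, sequence = fields[1], fields[2]
            yield f">{name}\n{sequence}\n"
```

Writing the output file would then be a matter of opening the GFA file, passing the file object to this generator, and calling `writelines` on the output handle. The f-string is consistent with the stated Python 3.6 requirement.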
#! /usr/bin/env python3
import os
from collections import Counter
from math import ceil
import re
from numpy import argmax
import pandas as pd
from qiime2 import Metadata
# "Parameters" of this script
#! /usr/bin/env python3
from dateutil.parser import parse
import pandas as pd
df = pd.read_csv("20191209_metadata.txt", sep="\t", index_col=0)
# Subset to a certain host subject ID, if desired
df = df[df["host_subject_id"] == "M03"]
#! /usr/bin/env python3
"""
This is a small script that looks through the annotated taxonomies of all
features present in a dataset's negative control samples. It's handy for
checking that certain features are (for the most part) absent from these
samples.
This obviously isn't a very formal way of accounting for contamination,
but it is useful for quickly verifying that certain taxa are probably not
the product of contamination. (Better approaches include e.g. the decontam
#! /usr/bin/env python3
from qiime2 import Metadata
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
m = Metadata.load("metadata-with-age.tsv")
m_df = m.to_dataframe()
m_df["ordinal-timestamp"] = 0