Marcus Fedarko fedarko

fedarko / poisson-cat.py
Created April 17, 2019 22:18 — forked from mortonjt/poisson-cat.py
Closed-form categorical Poisson regression
import numpy as np
import patsy
import pandas as pd
from biom import Table
# This is the main function
def poisson_cat(table, metadata, category, ref=None):
    """ Poisson differential abundance.
    Parameters
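When the only covariate is a single categorical variable and there is no per-sample offset, a Poisson regression with a log link has a closed-form maximum-likelihood solution. Each category's fitted rate is just its mean count, so each coefficient relative to a reference level is a log ratio of means. The sketch below illustrates that idea for one feature using only the standard library. `categorical_poisson_log_fc` is a hypothetical name, and this is not the gist's `poisson_cat` implementation (which takes a biom `Table` and metadata).

```python
import math
from collections import defaultdict


def categorical_poisson_log_fc(counts, categories, ref):
    """Closed-form log fold changes for a one-feature, one-factor Poisson model.

    Assumes log(E[count]) = beta_category with no per-sample offset
    (e.g. no sequencing-depth normalization). Under that model the MLE of
    exp(beta_category) is the mean count within the category.
    """
    if len(counts) != len(categories):
        raise ValueError("counts and categories must have the same length")
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for count, cat in zip(counts, categories):
        totals[cat] += count
        sizes[cat] += 1
    if ref not in totals:
        raise ValueError(f"reference level {ref!r} not found in categories")
    log_rates = {}
    for cat, total in totals.items():
        if total == 0:
            # An all-zero category has MLE rate 0, i.e. a log rate of -inf.
            raise ValueError(f"category {cat!r} has only zero counts")
        log_rates[cat] = math.log(total / sizes[cat])
    # Coefficients relative to the reference level (treatment coding).
    return {cat: lr - log_rates[ref] for cat, lr in log_rates.items() if cat != ref}
```

For example, counts `[2, 4, 8, 8]` with labels `["a", "a", "b", "b"]` and `ref="a"` give `{"b": math.log(8 / 3)}`. With a per-sample offset such as log sequencing depth, the closed form becomes each category's total count divided by its total depth.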
fedarko / align_q2_table_and_metadata.py
Last active July 25, 2019 05:51
Filter a QIIME 2 feature table and a metadata TSV file to shared samples
#! /usr/bin/env python3
import biom
from qiime2 import Artifact, Metadata
YOUR_QZA_TABLE_FILEPATH_GOES_HERE = "table.qza"
YOUR_METADATA_FILEPATH_GOES_HERE = "metadata.tsv"
# Load the table
tbl_qza = Artifact.load(YOUR_QZA_TABLE_FILEPATH_GOES_HERE)
t = tbl_qza.view(biom.Table)
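The core of this gist is an intersection of sample IDs. A minimal sketch is below, using plain dicts as hypothetical stand-ins for the `biom.Table` and `qiime2.Metadata` objects; with the real objects, the shared IDs could instead be passed to biom's `Table.filter(..., axis="sample")` and to `Metadata.filter_ids(...)`. `align_samples` is a hypothetical name.

```python
def align_samples(table, metadata):
    """Filter a feature table and metadata down to their shared sample IDs.

    table:    dict of sample ID -> {feature ID: count}  (stand-in for biom.Table)
    metadata: dict of sample ID -> {column: value}      (stand-in for qiime2.Metadata)
    Returns the two filtered dicts plus the IDs dropped from each input.
    """
    shared = set(table) & set(metadata)
    filtered_table = {sid: counts for sid, counts in table.items() if sid in shared}
    filtered_md = {sid: row for sid, row in metadata.items() if sid in shared}
    # Reporting what was dropped makes it easy to spot typos in sample IDs.
    dropped_from_table = sorted(set(table) - shared)
    dropped_from_md = sorted(set(metadata) - shared)
    return filtered_table, filtered_md, dropped_from_table, dropped_from_md
```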
fedarko / validate_collection_date_and_timestamp.py
Last active July 25, 2019 21:05
In a tab-separated metadata file, validate that each row's collection_timestamp value starts with that row's collection_date value
#! /usr/bin/env python3
# note that this is very preliminary + untested code
# We use pd.read_csv() because, unlike QIIME 2's Metadata object, it allows
# duplicate sample IDs.
import pandas as pd
import sys
if len(sys.argv) < 2:
    raise ValueError("You need to specify a metadata file to check.")
df = pd.read_csv(sys.argv[1], sep='\t', index_col=0)
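The check described above is a per-row string-prefix comparison. The sketch below assumes the sample ID is the first column and that both fields use the same date format (e.g. ISO 8601 dates), so a plain `startswith` is meaningful. It uses the standard `csv` module, which, like `pd.read_csv()`, tolerates duplicate sample IDs. `find_date_timestamp_mismatches` is a hypothetical name.

```python
import csv
import io


def find_date_timestamp_mismatches(tsv_text):
    """Return (sample_id, date, timestamp) for every row whose
    collection_timestamp does not start with its collection_date.
    Duplicate sample IDs are allowed, since rows are never keyed by ID."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_col = reader.fieldnames[0]  # assumption: sample IDs are the first column
    mismatches = []
    for row in reader:
        date = row["collection_date"]
        timestamp = row["collection_timestamp"]
        if not timestamp.startswith(date):
            mismatches.append((row[id_col], date, timestamp))
    return mismatches
```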
fedarko / validate_sample_ids_and_timestamps.py
Last active August 21, 2019 22:49
Compares "Qiita-style" sample IDs containing dates with the collection_timestamp dates in a metadata file.
#! /usr/bin/env python3
import re
import pandas as pd
from dateutil.parser import parse
m = pd.read_csv("metadata.tsv", sep='\t', index_col=0)
# Find all sample IDs in the metadata that include a "date"
# We assume that sample IDs that start with a 5-digit Qiita ID, then a period,
# then a two-character host ID string, then another period, will follow this
# convention.
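The convention in the comment above (a 5-digit Qiita ID, a period, a two-character host ID, another period) is easy to express as a regular expression. The sketch below assumes that everything after the second period is the date. The formats `ID_DATE_FORMAT` and `TIMESTAMP_FORMAT` are hypothetical placeholders, since the gist does not show how dates are written in its IDs or timestamps, and should be adjusted to match real data. `id_date_matches_timestamp` is also a hypothetical name.

```python
import re
from datetime import datetime

# <5-digit Qiita ID>.<2-character host ID>.<date portion>
QIITA_STYLE_ID = re.compile(r"^(\d{5})\.([^.]{2})\.(.+)$")

# HYPOTHETICAL formats: adjust these to match how your data writes dates.
ID_DATE_FORMAT = "%m.%d.%y"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def id_date_matches_timestamp(sample_id, timestamp):
    """Return True/False if the ID's date does/doesn't match the timestamp's
    date, or None if the ID doesn't follow the assumed convention."""
    match = QIITA_STYLE_ID.match(sample_id)
    if match is None:
        return None
    try:
        id_date = datetime.strptime(match.group(3), ID_DATE_FORMAT).date()
    except ValueError:
        # The part after the host ID isn't a date in the assumed format.
        return None
    ts_date = datetime.strptime(timestamp, TIMESTAMP_FORMAT).date()
    return id_date == ts_date
```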
fedarko / add_age_column_to_metadata.py
Last active August 30, 2019 06:25
Adds an "age in years" column to a QIIME 2 sample metadata file
#! /usr/bin/env python3
from qiime2 import Metadata
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
subject_id = "HOST SUBJECT ID"
subject_birthday = "HOST BIRTHDAY"
subject_birthday_datetime = parse(subject_birthday)
age_col_name = "subject_age_years"
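Computing an age in completed years only needs the birthday and each sample's collection date. The gist imports `dateutil.relativedelta`; with it, `relativedelta(collected, birthday).years` gives this value. The sketch below does the same with the standard library. The column names `host_subject_id` and `collection_timestamp`, the ISO 8601 date strings, and the function names are illustrative assumptions.

```python
from datetime import date, datetime


def age_in_years(birthday, on_date):
    """Whole years completed between birthday and on_date."""
    years = on_date.year - birthday.year
    # Subtract one if the birthday hasn't happened yet in on_date's year.
    if (on_date.month, on_date.day) < (birthday.month, birthday.day):
        years -= 1
    return years


def add_age_column(rows, subject_id, birthday_iso, age_col="subject_age_years",
                   subject_col="host_subject_id", ts_col="collection_timestamp"):
    """Set age_col (in place) on each metadata row belonging to subject_id."""
    birthday = date.fromisoformat(birthday_iso)
    for row in rows:
        if row.get(subject_col) == subject_id:
            collected = datetime.fromisoformat(row[ts_col]).date()
            row[age_col] = age_in_years(birthday, collected)
    return rows
```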
fedarko / split_metadata_by_run.py
Last active September 26, 2019 22:27
Splits a QIIME 2 metadata file into separate metadata files, one per run listed in a specified "run" column. This is useful if samples from different runs share barcode sequences, which can make QIIME 2 angry.
# NOTE: Assumes that there's a SAMPLE_METADATA environment variable declared pointing to a metadata file
# NOTE: Assumes that this metadata file contains BarcodeSequence and seq_run_ord columns
import pandas as pd
import os
md = pd.read_csv(os.environ["SAMPLE_METADATA"], sep="\t", index_col=0)
print("There are {} unique barcode sequences in this metadata file.".format(len(md["BarcodeSequence"].unique())))
runs = tuple(md["seq_run_ord"].unique())
print("Also, the {} runs listed in this metadata file are {}.".format(len(runs), runs))
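Splitting by run amounts to grouping rows on the run column and writing each group out with the original header. This standard-library sketch also checks that barcodes are unique within each run, since otherwise splitting by run would not resolve the conflict. It assumes the column names from the gist (`BarcodeSequence`, `seq_run_ord`) and a metadata file without a `#q2:types` directive row; `split_by_run` is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def split_by_run(tsv_text, run_col="seq_run_ord", barcode_col="BarcodeSequence"):
    """Return a dict of run value -> TSV text containing only that run's rows.

    Assumes no '#q2:types' (or other comment) rows after the header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[run_col]].append(row)
    outputs = {}
    for run, rows in groups.items():
        barcodes = [r[barcode_col] for r in rows]
        if len(barcodes) != len(set(barcodes)):
            raise ValueError(f"run {run!r} reuses a barcode within itself")
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        outputs[run] = buf.getvalue()
    return outputs
```

Each value in the returned dict can then be written to its own file, e.g. a hypothetical `metadata-run-{run}.tsv`.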
fedarko / gh_url_to_raw_gh_url.py
Created October 2, 2019 22:10
Convert a GitHub file URL to a raw.githubusercontent.com URL (which can be accessed directly by tools like view.qiime2.org or wget)
# your link goes here
link = "https://github.com/knightlab-analyses/qurro-mackerel-analysis/blob/master/AnalysisOutput/qurro-plot.qzv"
# note: this will break if a repo/organization or subfolder is named "blob" -- would be ideal to use a fancy regex
# to be more precise here
print(link.replace("github.com", "raw.githubusercontent.com").replace("/blob/", "/"))
# example output link:
# https://raw.githubusercontent.com/knightlab-analyses/qurro-mackerel-analysis/master/AnalysisOutput/qurro-plot.qzv
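As the comment notes, a regular expression can be more precise than the two `str.replace()` calls. This sketch anchors on the `github.com/<owner>/<repo>/blob/` prefix, so an owner, repository, or subfolder named "blob" no longer breaks the conversion. `gh_url_to_raw` is a hypothetical name.

```python
import re

# github.com/<owner>/<repo>/blob/<ref>/<path> -> raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
_GH_BLOB_URL = re.compile(r"^https?://github\.com/([^/]+)/([^/]+)/blob/(.+)$")


def gh_url_to_raw(link):
    match = _GH_BLOB_URL.match(link)
    if match is None:
        raise ValueError(f"not a github.com file ('blob') URL: {link}")
    owner, repo, rest = match.groups()
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{rest}"
```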
fedarko / convert_timestamp_to_days_elapsed.py
Created October 5, 2019 01:02
For a QIIME 2-formatted sample metadata file, uses the collection_timestamp field to assign samples a "days since first day" field. This field is useful for visualizations like q2-longitudinal's volatility plots.
#! /usr/bin/env python3
from qiime2 import Metadata
from dateutil.parser import parse
m = Metadata.load("metadata-with-age-and-ordinal-timestamp.tsv")
m_df = m.to_dataframe()
# Compute earliest date
min_date = None
for sample_id in m_df.index:
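Assigning "days since first day" is a two-pass computation: find the earliest collection date, then subtract it from every sample's date. The sketch below assumes ISO 8601 timestamps (the gist parses them with `dateutil`) and, as a design choice, drops the time of day so that samples from the same calendar day share a value. `days_since_first_day` is a hypothetical name.

```python
from datetime import datetime


def days_since_first_day(timestamps):
    """Map sample ID -> whole days since the earliest collection date.

    timestamps: dict of sample ID -> ISO 8601 collection_timestamp string.
    """
    dates = {sid: datetime.fromisoformat(ts).date() for sid, ts in timestamps.items()}
    first_day = min(dates.values())
    # Subtracting two dates gives a timedelta; .days is an integer day count.
    return {sid: (d - first_day).days for sid, d in dates.items()}
```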
fedarko / convert_timestamp_to_ordinal_date.py
Created October 8, 2019 22:26
Adds an ordinal date field, based on a timestamp, to a QIIME 2 metadata file
#! /usr/bin/env python3
from qiime2 import Metadata
from dateutil.parser import parse
from dateutil.relativedelta import relativedelta
m = Metadata.load("metadata-with-age.tsv")
m_df = m.to_dataframe()
m_df["ordinal-timestamp"] = 0
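"Ordinal date" can mean more than one thing. One reading is Python's proleptic Gregorian day number from `date.toordinal()` (where 0001-01-01 is day 1). It increases by exactly one per calendar day, so it works as a numeric time axis. Another reading is the ISO 8601 day of the year, available as `timetuple().tm_yday`. The sketch below assumes the first reading and ISO 8601 timestamps; `ordinal_dates` is a hypothetical name.

```python
from datetime import datetime


def ordinal_dates(timestamps):
    """Map sample ID -> proleptic Gregorian ordinal of its collection date.

    timestamps: dict of sample ID -> ISO 8601 collection_timestamp string.
    The time of day is ignored, since toordinal() only looks at the date.
    """
    return {sid: datetime.fromisoformat(ts).toordinal() for sid, ts in timestamps.items()}
```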
fedarko / negative_control_stats.py
Last active October 27, 2019 23:55
Search taxonomies of negative controls
#! /usr/bin/env python3
"""
This is a small script that looks through the annotated taxonomies of all
features present in a dataset's negative control samples. It's handy for
checking that certain features are (for the most part) absent from these
samples.
This obviously isn't a very formal way of accounting for contamination,
but it is useful for quickly verifying that certain taxa are probably not
the product of contamination. (Better approaches include e.g. the decontam