Tony walterst

@walterst
walterst / parse_ipod_to_metadata.py
Last active Mar 15, 2019
Custom script used to parse tab-delimited iPod data, match its dates against tab-delimited QIIME mapping data, and write averages of the data from the sample day and the days before it as new QIIME metadata columns. This script uses a QIIME 1.9X environment for the parse_mapping_file function.
View parse_ipod_to_metadata.py
#!/usr/bin/env python
from __future__ import division
# USAGE: python parse_ipod_to_metadata.py mapping_file days_to_consider ipod_tab_delim_file raw_output_file qiime_compatible_output_file
# where days_to_consider counts the same day as one of the days, and comma-separated columns need to be
# an exact match to the field label in the ipod data file, e.g. Gastrointestinal_issues
# All dates must be in the format of DD/MM/YY in the ipod source tab delimited data.
from sys import argv
from operator import itemgetter
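The preview above is cut off before the averaging logic, so the following is only a minimal Python 3 sketch of the idea the description states: average the values recorded on the sample day and the days before it. The function name `average_over_window` and the dictionary input are hypothetical and are not taken from the script. It assumes dates in DD/MM/YY format and that `days_to_consider` counts the sample day itself, as the usage comment says.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%d/%m/%y"  # DD/MM/YY, as required by the usage comment


def average_over_window(daily_values, sample_date, days_to_consider):
    """Average the values on sample_date and the preceding days.

    daily_values maps DD/MM/YY strings to numbers (hypothetical input shape).
    Days with no recorded value are skipped; returns None if none are found.
    """
    parsed = {
        datetime.strptime(day, DATE_FORMAT).date(): value
        for day, value in daily_values.items()
    }
    end = datetime.strptime(sample_date, DATE_FORMAT).date()
    # Offset 0 is the sample day itself, so it counts as one of the days.
    window = [end - timedelta(days=offset) for offset in range(days_to_consider)]
    values = [parsed[day] for day in window if day in parsed]
    if not values:
        return None
    return sum(values) / len(values)
```

Skipping days without data, rather than treating them as zero, is a design choice made for this sketch; the original script may handle gaps differently.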
@walterst
walterst / random_subsample_fastq.py
Created Dec 17, 2018
Randomly subsamples a directory of fastq.gz files and writes the subsampled fastq files to an output directory.
View random_subsample_fastq.py
#!/usr/bin/env python
from sys import argv
from random import random
#from gzip import open as gz_open
from glob import glob
import gzip
import os
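Only the imports of this script are visible, so the following is a hedged Python 3 sketch of one way to subsample a single gzipped fastq file with `random()`: each four-line record is kept independently with a given probability. The function name `subsample_fastq` and its parameters are hypothetical. The number of records kept varies from run to run, because this is per-record sampling and not a fixed-size draw.

```python
import gzip
from random import random


def subsample_fastq(in_path, out_path, fraction):
    """Write each 4-line record of a gzipped fastq with probability `fraction`.

    Returns the number of records written. Assumes records are exactly
    four lines long (label, sequence, +, quality).
    """
    kept = 0
    with gzip.open(in_path, "rt") as src, open(out_path, "w") as dst:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # empty string means end of file
                break
            if random() < fraction:
                dst.writelines(record)
                kept += 1
    return kept
```

To process a whole directory, this function could be called once per path returned by `glob`, which the script imports.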
@walterst
walterst / find_fastq_errors.py
Last active Apr 17, 2018
Very simple fastq parser/checker that tries to detect errors. Assumes each record is exactly four lines (@Label, sequence, +, quality scores). Checks for the expected characters at the label and optional label lines, and for equal length of seq/qual.
View find_fastq_errors.py
#!/usr/bin/env python
# Used to check fastq seqs in gzipped files and write the first error, if any, to a log file
# Usage: python find_fastq_errors.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it (subdirectories are searched as well)
from sys import argv
from glob import glob
import gzip
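The checks named in the description (a label starting with `@`, a third line starting with `+`, and equal sequence and quality lengths) can be sketched as follows. This is an illustration in Python 3 and not the script's own code; the function name `first_fastq_error` and the `(line_number, message)` return value are hypothetical.

```python
def first_fastq_error(lines):
    """Return (line_number, message) for the first problem found, else None.

    `lines` is any iterable of fastq lines; records are assumed to be
    exactly four lines long. Line numbers are 1-based.
    """
    lines = [line.rstrip("\n") for line in lines]
    complete = len(lines) - len(lines) % 4
    for start in range(0, complete, 4):
        label, seq, plus, qual = lines[start:start + 4]
        line_no = start + 1
        if not label.startswith("@"):
            return (line_no, "label line does not start with '@'")
        if not plus.startswith("+"):
            return (line_no + 2, "third line does not start with '+'")
        if len(seq) != len(qual):
            return (line_no + 1, "sequence and quality lengths differ")
    if len(lines) % 4:
        return (len(lines), "file ends with an incomplete record")
    return None
```

For a gzipped file, the lines could come from `gzip.open(path, "rt")`.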
@walterst
walterst / record_singletons.py
Created Apr 3, 2018
Counts the number of singletons present in a QIIME OTU mapping file and writes these sequence IDs to an output file.
View record_singletons.py
#!/usr/bin/env python
"""Usage: python record_singletons.py X Y
where X is the input OTU mapping file and Y is the output singleton sequence ID file.
"""
from sys import argv
otu_mapping = open(argv[1], "U")
singletons_out = open(argv[2], "w")
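The preview stops before the counting loop. As a rough Python 3 sketch of the idea: a QIIME 1 OTU mapping file is tab-delimited, with the OTU ID in the first field and the member sequence IDs in the remaining fields, so a singleton is a line with exactly one sequence ID. The function name `find_singletons` is hypothetical.

```python
def find_singletons(otu_map_lines):
    """Return the sequence IDs of OTUs that contain exactly one sequence.

    Each line is expected to look like: OTU_ID<TAB>seq_id[<TAB>seq_id...]
    """
    singletons = []
    for line in otu_map_lines:
        fields = line.strip().split("\t")
        # Two fields means one OTU ID plus a single member sequence.
        if len(fields) == 2:
            singletons.append(fields[1])
    return singletons
```

The number of singletons is then `len(find_singletons(lines))`, and the IDs can be written one per line to the output file.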
@walterst
walterst / parse_otu_mapping_from_uc.py
Created Apr 3, 2018
Parses data from .uc files (tested with vsearch, should work with uclust/usearch too) to create a QIIME 1.X OTU mapping file.
View parse_otu_mapping_from_uc.py
#!/usr/bin/env python
""" This is modified from the bfillings usearch app controller
usage: python parse_otu_mapping_from_uc.py X Y
where X is the input .uc file, Y is the output OTU mapping file"""
from sys import argv
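The parsing code itself is not shown in the preview. The sketch below, in Python 3, illustrates one way to group sequences by cluster from .uc records: `S` records start a cluster with the query as its seed, and `H` records add the query to the cluster of its target. Using the seed label as the OTU ID and truncating labels at the first whitespace are assumptions made for this example, and the function name `uc_to_otu_map` is hypothetical.

```python
def uc_to_otu_map(uc_lines):
    """Group sequence IDs by cluster seed from .uc lines.

    Returns a dict mapping seed ID -> list of member sequence IDs.
    Only S (seed) and H (hit) records are used; other record types,
    comments, and short lines are ignored.
    """
    clusters = {}
    for line in uc_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 10:
            continue
        record_type, query, target = fields[0], fields[8], fields[9]
        query_id = query.split(" ")[0]
        if record_type == "S":
            clusters.setdefault(query_id, []).append(query_id)
        elif record_type == "H":
            target_id = target.split(" ")[0]
            clusters.setdefault(target_id, []).append(query_id)
    return clusters
```

Each entry can then be written as one tab-delimited line, for example `"\t".join([otu_id] + members)`.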
@walterst
walterst / get_rank_sorted_data.py
Created Jan 31, 2018
Generate rank/frequency (and log-transformed) data for OTU counts to match approach described in article listed in script text.
View get_rank_sorted_data.py
#!/usr/bin/env python
from sys import argv
from operator import itemgetter
from scipy.stats import rankdata
from numpy import log
from biom import load_table
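The script imports `rankdata`, `log`, and `load_table`, but its body is not shown. The following is a dependency-free Python 3 sketch of rank/frequency data with log-transformed columns. It swaps in plain sorting with ordinal ranks (ties get consecutive ranks) in place of `scipy.stats.rankdata`, and takes a plain dict of total counts per OTU instead of a BIOM table. The function name `rank_frequency` is hypothetical, and the article's exact method may differ.

```python
from math import log


def rank_frequency(otu_counts):
    """Return rows of (otu_id, rank, count, log_rank, log_count).

    OTUs are ranked from most to least abundant, starting at rank 1.
    OTUs with a count of zero or less are dropped, since log(count)
    would be undefined.
    """
    ordered = sorted(otu_counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for rank, (otu_id, count) in enumerate(ordered, start=1):
        if count <= 0:
            continue
        rows.append((otu_id, rank, count, log(rank), log(count)))
    return rows
```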
@walterst
walterst / filter_barcode_header.py
Last active Nov 16, 2017
Filters a barcode header to remove target characters, e.g. "+" character. Splits on target identifiers.
View filter_barcode_header.py
#!/usr/bin/env python
# Usage: python filter_barcode_header.py original_barcode_seqs.fastq new_barcode_seqs.fastq
# WARNING: the second file specified will be overwritten if it exists!
bc_start_indicator = "1:N:0:"
chars_to_strip = ["+"]
from sys import argv
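The two settings above suggest that the script finds the barcode portion of each header after `bc_start_indicator` and strips the characters in `chars_to_strip` from it. The preview ends before that logic, so this is only a Python 3 sketch of that reading; the function name `filter_header` is hypothetical.

```python
def filter_header(header, bc_start_indicator="1:N:0:", chars_to_strip=("+",)):
    """Remove unwanted characters from the barcode part of a fastq header.

    Only the text after the first occurrence of bc_start_indicator is
    changed. Headers without the indicator are returned unmodified.
    """
    prefix, indicator, barcode = header.partition(bc_start_indicator)
    if not indicator:
        return header
    for char in chars_to_strip:
        barcode = barcode.replace(char, "")
    return prefix + indicator + barcode
```

For example, a dual-index header ending in `1:N:0:ACGT+TTGA` would come out ending in `1:N:0:ACGTTTGA`, while the sequence, `+`, and quality lines of the record are left alone.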
View count_zipped_fastq_reads.py
#!/usr/bin/env python
# Used to count fastq seqs in gzipped files, write counts and file name to log file
# Usage: python count_zipped_fastq_reads.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it (doesn't search subdirectories)
from sys import argv
from glob import glob
from cogent.parse.fastq import MinimalFastqParser
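The script relies on `MinimalFastqParser` from PyCogent, which is only available in older environments. As an alternative sketch in Python 3 using just the standard library, reads can be counted by dividing the line count by four. This assumes strictly four-line records and a `*.fastq.gz` file pattern, neither of which is confirmed by the preview; the function name `count_fastq_reads` is hypothetical.

```python
import gzip
import os
from glob import glob


def count_fastq_reads(fastq_folder, pattern="*.fastq.gz"):
    """Return {file name: read count} for gzipped fastq files in a folder.

    Subdirectories are not searched. Counts are line counts divided by
    four, so files with wrapped sequence lines would be miscounted.
    """
    counts = {}
    for path in sorted(glob(os.path.join(fastq_folder, pattern))):
        with gzip.open(path, "rt") as handle:
            n_lines = sum(1 for _ in handle)
        counts[os.path.basename(path)] = n_lines // 4
    return counts
```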
@walterst
walterst / filter_fastqV2.py
Last active Aug 8, 2017
Filter a fastq file to match target fastq labels, e.g. after stitching reads.
View filter_fastqV2.py
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the pear assembled subset fastq
# Usage: python filter_fastqV2.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
from qiime.util import gzip_open
@walterst
walterst / filter_fastq.py
Last active Nov 2, 2017
Filters an input fastq to match labels in target fastq file.
View filter_fastq.py
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the pear assembled subset fastq
# Usage: python filter_fastq.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
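Both filter scripts are cut off after their imports. The label-matching idea they describe can be sketched in Python 3 without PyCogent as follows: collect the read IDs from the target fastq, then keep only the input records whose ID is in that set. Comparing on the text between `@` and the first space is an assumption made here, since index and assembled reads often differ in the rest of the label; the names `iter_fastq`, `read_id`, and `filter_fastq` are hypothetical.

```python
def iter_fastq(lines):
    """Yield (label, seq, plus, qual) tuples from 4-line fastq records."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield tuple(record)
            record = []


def read_id(label):
    """Return the label text between the leading '@' and the first space."""
    return label[1:].split(" ")[0]


def filter_fastq(input_lines, target_lines):
    """Return the input records whose read ID also occurs in the target."""
    target_ids = {read_id(record[0]) for record in iter_fastq(target_lines)}
    return [
        record for record in iter_fastq(input_lines)
        if read_id(record[0]) in target_ids
    ]
```

Holding the target IDs in a set keeps each lookup constant-time, which matters when the input fastq has millions of reads.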