
@walterst
walterst / parse_ipod_to_metadata.py
Last active Mar 15, 2019
Custom script to parse tab-delimited Ipod data, match its dates to the dates in a tab-delimited QIIME mapping file, and write averages of the data from the sample day and the days before it as new metadata columns. This script requires a QIIME 1.9.x environment for the parse_mapping_file function.
#!/usr/bin/env python
from __future__ import division
# USAGE: python parse_ipod_to_metadata.py mapping_file days_to_consider ipod_tab_delim_file raw_output_file qiime_compatible_output_file
# where days_to_consider counts the same day as one of the days, and the comma-separated columns need to be
# an exact match to the field labels in the ipod data file, e.g. Gastrointestinal_issues
# All dates in the ipod source tab-delimited data must be in DD/MM/YY format.
from sys import argv
from operator import itemgetter
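The averaging described above (data from the sample day plus the preceding days, with DD/MM/YY dates) can be sketched in Python 3 as follows. This is a minimal sketch, not the gist's implementation: it assumes one Ipod field has already been parsed into a dict mapping dates to numeric values, and the helper names `parse_ddmmyy` and `average_prior_days` are hypothetical.

```python
from datetime import datetime, timedelta


def parse_ddmmyy(text):
    """Parse a DD/MM/YY date string (the format required for the Ipod data)."""
    return datetime.strptime(text.strip(), "%d/%m/%y").date()


def average_prior_days(values_by_date, sample_date, days_to_consider):
    """Average the values recorded on sample_date and the preceding days.

    days_to_consider counts the sample day itself, so 3 covers the sample
    day plus the two days before it. Days without data are skipped; returns
    None when no day in the window has data. (Hypothetical helper.)
    """
    window = [sample_date - timedelta(days=offset) for offset in range(days_to_consider)]
    found = [values_by_date[day] for day in window if day in values_by_date]
    if not found:
        return None
    return sum(found) / len(found)


if __name__ == "__main__":
    scores = {
        parse_ddmmyy("01/03/18"): 2.0,
        parse_ddmmyy("02/03/18"): 4.0,
        parse_ddmmyy("03/03/18"): 6.0,
    }
    print(average_prior_days(scores, parse_ddmmyy("03/03/18"), 2))  # 5.0
```

Skipping days without data, rather than counting them as zero, is a choice made for this sketch; the original script may treat missing days differently.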
walterst / random_subsample_fastq.py
Created Dec 17, 2018
Randomly subsamples a directory of fastq.gz files and writes the subsampled fastq files to an output directory.
#!/usr/bin/env python
from sys import argv
from random import random
#from gzip import open as gz_open
from glob import glob
import gzip
import os
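The gist preview does not show its sampling rule, but the `random` import suggests per-read random selection. The Python 3 sketch below assumes that approach: each four-line fastq record is kept independently with probability `keep_fraction` (a hypothetical parameter), so the output size varies around the expected fraction instead of hitting an exact count.

```python
import gzip
import io
import random


def subsample_fastq(in_handle, out_handle, keep_fraction, rng=random.random):
    """Copy each four-line fastq record with probability keep_fraction.

    Returns (records_seen, records_kept). Pass a seeded rng, e.g.
    random.Random(42).random, for reproducible subsamples.
    """
    seen = kept = 0
    while True:
        record = [in_handle.readline() for _ in range(4)]
        if not record[0]:
            break  # readline() returns "" at end of file
        seen += 1
        if rng() < keep_fraction:
            out_handle.writelines(record)
            kept += 1
    return seen, kept


def subsample_gz_file(in_path, out_path, keep_fraction):
    """Subsample one fastq.gz file into an uncompressed fastq file."""
    with gzip.open(in_path, "rt") as in_handle, open(out_path, "w") as out_handle:
        return subsample_fastq(in_handle, out_handle, keep_fraction)


if __name__ == "__main__":
    demo = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n"
    out = io.StringIO()
    print(subsample_fastq(io.StringIO(demo), out, 0.5, random.Random(1).random))
```

Streaming record by record keeps memory use constant regardless of file size; running `subsample_gz_file` over each path from `glob` would cover a whole directory.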
walterst / find_fastq_errors.py
Last active Apr 17, 2018
Very simple fastq parser/checker to detect errors. Assumes each record is exactly four lines (@label, sequence, +, quality scores). Checks for the expected first characters of the label and optional-label lines, and that the sequence and quality strings have equal length.
#!/usr/bin/env python
# Used to check fastq seqs in gzipped files and write the first error, if any, to a log file
# Usage: python find_fastq_errors.py fastq_folder log_file
# where fastq_folder contains all of the fastq files; subdirectories are also searched
from sys import argv
from glob import glob
import gzip
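The checks listed in the description can be sketched in Python 3 as a single pass over four-line records that stops at the first problem. This is a minimal sketch, not the gist's code; the helper names `first_fastq_error` and `check_gz_fastq` and the wording of the error messages are hypothetical.

```python
import gzip


def first_fastq_error(lines):
    """Return a message describing the first malformed record, or None.

    Assumes strict four-line records: @label, sequence, +[optional label],
    quality scores.
    """
    it = iter(lines)
    record_num = 0
    while True:
        label = next(it, None)
        if label is None:
            return None  # reached end of input without errors
        record_num += 1
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            return "record %d: truncated record" % record_num
        seq, plus, qual = (line.rstrip("\r\n") for line in rest)
        if not label.startswith("@"):
            return "record %d: label line does not start with '@'" % record_num
        if not plus.startswith("+"):
            return "record %d: optional label line does not start with '+'" % record_num
        if len(seq) != len(qual):
            return "record %d: sequence length %d != quality length %d" % (
                record_num, len(seq), len(qual))


def check_gz_fastq(path):
    """Check one gzipped fastq file and return its first error, or None."""
    with gzip.open(path, "rt") as handle:
        return first_fastq_error(handle)
```

Because quality strings may legitimately begin with "@" or "+", the checks rely on each line's position within the record rather than on its content.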
walterst / record_singletons.py
Created Apr 3, 2018
Counts the number of singletons present in a QIIME OTU mapping file and writes their sequence IDs to an output file.
#!/usr/bin/env python
"""Usage: python record_singletons.py X Y
where X is the input OTU mapping file and Y is the output singleton sequence ID file.
"""
from sys import argv
otu_mapping = open(argv[1], "U")
singletons_out = open(argv[2], "w")
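In a QIIME 1.x OTU mapping file each line is an OTU ID followed by the tab-separated IDs of the sequences assigned to that OTU, so a singleton is a line with exactly one sequence ID. A minimal Python 3 sketch of the counting step under that reading; the helper names are hypothetical, and skipping `#` lines is a precaution rather than something the gist is known to do.

```python
def find_singletons(otu_map_lines):
    """Return the sequence IDs of OTUs that contain exactly one sequence."""
    singletons = []
    for line in otu_map_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        seq_ids = [field for field in fields[1:] if field]  # fields[0] is the OTU ID
        if len(seq_ids) == 1:
            singletons.append(seq_ids[0])
    return singletons


def record_singletons(in_path, out_path):
    """Write one singleton sequence ID per line; return the singleton count."""
    with open(in_path) as otu_mapping, open(out_path, "w") as singletons_out:
        singletons = find_singletons(otu_mapping)
        for seq_id in singletons:
            singletons_out.write(seq_id + "\n")
    return len(singletons)
```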
walterst / parse_otu_mapping_from_uc.py
Created Apr 3, 2018
Parses data from .uc files (tested with vsearch; should also work with uclust/usearch) to create a QIIME 1.x OTU mapping file.
#!/usr/bin/env python
"""Modified from the bfillings usearch app controller.
Usage: python parse_otu_mapping_from_uc.py X Y
where X is the input .uc file and Y is the output OTU mapping file."""
from sys import argv
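A rough Python 3 sketch of the .uc to OTU map conversion, assuming the standard 10-column UC layout (record type in column 1, query label in column 9, target label in column 10), where `S` lines start a cluster and `H` lines add a query to its target's cluster. This is not the bfillings code: using the seed label as the OTU ID, cutting labels at the first whitespace, and the helper names are all assumptions.

```python
def otu_map_from_uc(uc_lines):
    """Build {seed_label: [member labels]} from .uc lines.

    "S" lines start a cluster and "H" lines add a query to its target's
    cluster; "C" (cluster summary) and "N" (no hit) lines are ignored.
    """
    clusters = {}
    for line in uc_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        record_type, query = fields[0], fields[8].split()[0]
        if record_type == "S":
            # Put the seed first even if some of its hits were seen earlier
            clusters.setdefault(query, []).insert(0, query)
        elif record_type == "H":
            target = fields[9].split()[0]
            clusters.setdefault(target, []).append(query)
    return clusters


def write_otu_map(clusters, out_handle):
    """Write one tab-separated line per OTU: OTU ID, then member sequence IDs."""
    for otu_id, seq_ids in clusters.items():
        out_handle.write("\t".join([otu_id] + seq_ids) + "\n")
```

Using `setdefault` for both record types means the result does not depend on whether a seed's `S` line appears before or after its `H` lines.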
walterst / get_rank_sorted_data.py
Created Jan 31, 2018
Generates rank/frequency (and log-transformed) data for OTU counts, following the approach described in the article cited in the script.
#!/usr/bin/env python
from sys import argv
from operator import itemgetter
from scipy.stats import rankdata
from numpy import log
from biom import load_table
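The article the script follows is cited only in the script body, so the sketch below shows a generic rank/frequency transform rather than that article's exact method: nonzero counts are sorted from most to least abundant, ranked (1 = most abundant), converted to relative frequencies, and natural-log transformed. Tied counts receive the average of the ranks they span, mirroring the default `method="average"` of `scipy.stats.rankdata`. It uses only the Python 3 standard library, and the function name is hypothetical.

```python
import math


def rank_frequency(counts):
    """Return (rank, frequency, log_rank, log_frequency) tuples, most abundant first.

    Zero counts are dropped because their log is undefined. Tied counts share
    the average of the ranks they occupy.
    """
    nonzero = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(nonzero)
    rows = []
    i = 0
    while i < len(nonzero):
        j = i
        while j < len(nonzero) and nonzero[j] == nonzero[i]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for count in nonzero[i:j]:
            freq = count / total
            rows.append((avg_rank, freq, math.log(avg_rank), math.log(freq)))
        i = j
    return rows
```

For a BIOM table, each sample's column of counts could be passed in separately.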
walterst / filter_barcode_header.py
Last active Nov 16, 2017
Filters barcode read headers to remove target characters (e.g. the "+" character). Each header is split on a target identifier, and the characters are removed from the barcode that follows it.
#!/usr/bin/env python
# Usage: python filter_barcode_header.py original_barcode_seqs.fastq new_barcode_seqs.fastq
# WARNING: the second file specified will be overwritten if it exists!
bc_start_indicator = "1:N:0:"
chars_to_strip = ["+"]
from sys import argv
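Using the two settings above (`bc_start_indicator = "1:N:0:"` and `chars_to_strip = ["+"]`), the header filtering can be sketched in Python 3 as follows. This is an illustrative sketch, not the gist's code; the function names are hypothetical, and treating only every fourth line as a header is an assumption based on strict four-line fastq records.

```python
BC_START_INDICATOR = "1:N:0:"
CHARS_TO_STRIP = ["+"]


def filter_header(header, start_indicator=BC_START_INDICATOR, chars_to_strip=CHARS_TO_STRIP):
    """Remove chars_to_strip from the barcode that follows start_indicator."""
    if start_indicator not in header:
        return header  # leave headers without the indicator unchanged
    prefix, barcode = header.split(start_indicator, 1)
    for char in chars_to_strip:
        barcode = barcode.replace(char, "")
    return prefix + start_indicator + barcode


def filter_fastq_headers(lines):
    """Yield fastq lines, rewriting only the label line of each record."""
    for index, line in enumerate(lines):
        yield filter_header(line) if index % 4 == 0 else line
```

Rewriting only the first line of each record leaves the "+" separator line and quality strings untouched.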
count_zipped_fastq_reads.py
#!/usr/bin/env python
# Used to count fastq seqs in gzipped files and write the counts and file names to a log file
# Usage: python count_zipped_fastq_reads.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it (doesn't search subdirectories)
from sys import argv
from glob import glob
from cogent.parse.fastq import MinimalFastqParser
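Counting reads only requires streaming each decompressed file and dividing the line count by four, assuming strict four-line records (wrapped multi-line fastq would be miscounted). A minimal Python 3 sketch; it uses the standard `gzip` module rather than the cogent `MinimalFastqParser` imported above, and the `*.gz` file pattern, the log layout, and the helper names are assumptions.

```python
import gzip
import os
from glob import glob


def count_fastq_records(lines):
    """Count fastq records as lines / 4, assuming strict four-line records."""
    line_count = sum(1 for _ in lines)
    if line_count % 4:
        raise ValueError("line count %d is not a multiple of 4" % line_count)
    return line_count // 4


def log_read_counts(fastq_folder, log_path, pattern="*.gz"):
    """Write 'file name<TAB>read count' for each matching file (no subdirectories)."""
    with open(log_path, "w") as log:
        for path in sorted(glob(os.path.join(fastq_folder, pattern))):
            with gzip.open(path, "rt") as handle:
                count = count_fastq_records(handle)
            log.write("%s\t%d\n" % (os.path.basename(path), count))
```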
walterst / filter_fastqV2.py
Last active Aug 8, 2017
Filters a fastq file to match the labels in a target fastq file, e.g. after stitching reads.
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the PEAR-assembled subset fastq
# Usage: python filter_fastqV2.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
from qiime.util import gzip_open
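The filtering can be sketched in Python 3 as two passes: collect the labels of the target (e.g. PEAR-assembled) fastq into a set, then stream the input fastq and keep only records whose label is in that set. This is a sketch, not the gist's code. Matching on the first whitespace-delimited token of each label is an assumption, made because reads from the same cluster can differ after the space (e.g. `1:N:0` vs `2:N:0`). Transparent `.gz` handling mirrors the `gzip_open` import above but is implemented here with the standard `gzip` module, and all helper names are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a fastq as text, decompressing if the file name ends in .gz."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def label_key(label_line):
    """Matching key: the label without '@', up to the first whitespace."""
    parts = label_line[1:].split()
    return parts[0] if parts else ""


def iter_records(lines):
    """Yield four-line fastq records as lists of lines."""
    it = iter(lines)
    while True:
        record = [next(it, None) for _ in range(4)]
        if record[0] is None:
            return
        if None in record:
            raise ValueError("truncated fastq record: %r" % record[0])
        yield record


def filter_fastq(input_lines, target_lines):
    """Yield input records whose label key appears in the target fastq."""
    targets = {label_key(rec[0]) for rec in iter_records(target_lines)}
    for rec in iter_records(input_lines):
        if label_key(rec[0]) in targets:
            yield rec


def main(input_path, target_path, output_path):
    with open_maybe_gzip(target_path) as target, open_maybe_gzip(input_path) as inp, \
            open(output_path, "w") as out:
        for rec in filter_fastq(inp, target):
            out.writelines(rec)
```

Only the target labels are held in memory; the input fastq is streamed, so large index files do not need to fit in RAM.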
walterst / filter_fastq.py
Last active Nov 2, 2017
Filters an input fastq to match the labels in a target fastq file.
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the PEAR-assembled subset fastq
# Usage: python filter_fastq.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser