@walterst
walterst / MME_R_script_growth_modeling.txt
Last active November 15, 2023 14:58
This is an R script for fitting and plotting infants' growth (weight and height) from ages 0-3 with a modified Michaelis-Menten equation.
# This code reads in the STARR height and weight data that accompanied the article:
# "A modified Michaelis-Menten equation estimates growth from birth to 3 years in healthy babies in the US"
# Edit the file paths to match your local setup. The dplyr, ggplot2, gplots, and gridExtra
# libraries are required. To interpolate weight/height at a given age in days,
# call predict() with the fitted model and a dataframe of days.
# Subjects whose fit fails with an nls() error are plotted as raw data instead.
# Increase the default number_of_subjects_to_fit to 100 to see an example.
library(dplyr)
library(ggplot2)
walterst / strip_primers_fastq.py
Created July 16, 2015 11:51
See USAGE text below. The purpose of the script is to find forward/reverse primers in an input fastq file, and remove everything before/after these primers.
#!/usr/bin/env python
# USAGE: strip_primers_fastq.py Mapping_file input_fastq output_fastq log_filename
from sys import argv
from string import upper
from re import compile
from skbio.parse.sequences import parse_fastq
from skbio.sequence import DNA
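The primer-trimming idea in the description above can be sketched roughly as follows. This is a minimal illustration, not the gist's own code: the `primer_regex` and `strip_primers` helpers are hypothetical, the sketch uses only the standard library instead of skbio, and it assumes the primers themselves are removed along with the flanking bases and that the reverse primer is supplied already reverse-complemented.

```python
import re

# IUPAC degenerate base codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def primer_regex(primer):
    """Compile a primer (possibly degenerate) into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def strip_primers(seq, qual, fwd_primer, rev_primer_rc):
    """Trim a read and its quality string to the region between the primers.

    Assumption: the primers are removed too. If a primer is not found,
    that end of the read is left untouched.
    """
    start, end = 0, len(seq)
    fwd_hit = primer_regex(fwd_primer).search(seq)
    if fwd_hit:
        start = fwd_hit.end()
    # Only look for the reverse primer downstream of the forward primer.
    rev_hit = primer_regex(rev_primer_rc).search(seq, start)
    if rev_hit:
        end = rev_hit.start()
    return seq[start:end], qual[start:end]
```

Slicing the sequence and quality strings with the same indices keeps the two lines of each fastq record the same length after trimming.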
#!/usr/bin/env python
from sys import argv
from itertools import izip
from cogent.parse.fastq import MinimalFastqParser
""" Usage
python merge_bcs_reads.py X Y Z
X: barcodes fastq file
walterst / parse_nonstandard_chars.py
Last active February 10, 2022 10:21
Usage: python parse_nonstandard_chars.py X > Y where X is the input file to be parsed, and Y is the output parsed file
#!/usr/bin/env python
"""Somewhat hackish way to eliminate non-ASCII characters in a text file,
such as a taxonomy mapping file, with QIIME. Reads through the file, and
removes all characters above decimal value 127. Additionally, asterisk "*"
characters are removed, as these inhibit the RDP classifier.
Usage:
python parse_nonstandard_chars.py X > Y
where X is the input file to be parsed, and Y is the output parsed file"""
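The filtering rule in the docstring above is simple enough to sketch in a few lines. The `strip_nonstandard` helper below is a hypothetical name used for illustration; the gist's actual implementation may differ.

```python
def strip_nonstandard(text):
    """Drop characters with code points above 127, and asterisks."""
    return "".join(ch for ch in text if ord(ch) <= 127 and ch != "*")


# Example: a taxonomy line with an accented character and an asterisk.
cleaned = strip_nonstandard("OTU_1\tk__Bacteria; g__Caf\u00e9*\n")
```

Tabs and newlines have code points below 128, so the tab-delimited layout of a taxonomy mapping file is preserved.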
walterst / parse_otu_mapping_from_uc.py
Created April 3, 2018 08:02
Parses data from .uc files (tested with vsearch, should work with uclust/usearch too) to create a QIIME 1.X OTU mapping file.
#!/usr/bin/env python
""" This is modified from the bfillings usearch app controller
usage: python parse_otu_mapping_from_uc.py X Y
where X is the input .uc file, Y is the output OTU mapping file"""
from sys import argv
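As a rough sketch of the conversion described above: a .uc file is tab-delimited, with the record type in the first column, the query label in the ninth, and the target label in the tenth. Seed records (S) start a cluster and hit records (H) join the cluster of their target. The function names and the "denovo" OTU ID prefix below are assumptions for illustration and are not taken from the gist.

```python
def clusters_from_uc(lines):
    """Group sequence labels by cluster seed from .uc records."""
    clusters = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "S":
            # A seed starts its own cluster.
            clusters.setdefault(query, []).append(query)
        elif record_type == "H":
            # A hit joins the cluster of the seed it matched.
            clusters.setdefault(target, []).append(query)
        # Other record types (e.g. C cluster summaries) are ignored here.
    return clusters


def format_otu_map(clusters, prefix="denovo"):
    """Render clusters as OTU map lines: OTU ID, then tab-separated sequence IDs.

    The prefix is a hypothetical naming choice.
    """
    return ["\t".join(["%s%d" % (prefix, i)] + members)
            for i, members in enumerate(clusters.values())]
```

This relies on dictionaries preserving insertion order (Python 3.7 and later), so OTUs are numbered in the order their seeds appear in the file.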
walterst / parse_ipod_to_metadata.py
Last active March 15, 2019 10:11
Custom script that parses tab-delimited Ipod data, matches its dates against tab-delimited QIIME mapping data, and writes averages of the data from the sample day and the days before it as QIIME metadata columns. This script uses a QIIME 1.9X environment for the parse_mapping_file function.
#!/usr/bin/env python
from __future__ import division
# USAGE: python parse_ipod_to_metadata.py mapping_file days_to_consider ipod_tab_delim_file raw_output_file qiime_compatible_output_file
# where days_to_consider counts the sample day itself as one of the days, and each comma-separated column needs to be
# an exact match to the field label in the ipod data file, e.g. Gastrointestinal_issues
# All dates must be in the format of DD/MM/YY in the ipod source tab delimited data.
from sys import argv
from operator import itemgetter
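The windowed averaging described in the usage comments might look roughly like this. It is a sketch under assumptions: `parse_date` and `window_average` are hypothetical helpers, the daily values are assumed to be already loaded into a dict keyed by date, and days with no recorded value are simply left out of the average.

```python
from datetime import datetime, timedelta


def parse_date(text):
    """Parse a DD/MM/YY string into a date."""
    return datetime.strptime(text, "%d/%m/%y").date()


def window_average(sample_date, days_to_consider, daily_values):
    """Average the values for the sample day and the days before it.

    days_to_consider counts the sample day itself, so a value of 1 uses
    only the sample day. Returns None when no day in the window has data.
    """
    day = parse_date(sample_date)
    window = [day - timedelta(days=offset) for offset in range(days_to_consider)]
    values = [daily_values[d] for d in window if d in daily_values]
    return sum(values) / len(values) if values else None
```

For example, with values recorded on 02/03/15 and 03/03/15, `window_average("03/03/15", 2, daily_values)` averages those two days, while a window of 1 returns the value for 03/03/15 alone.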
walterst / random_subsample_fastq.py
Created December 17, 2018 16:08
Randomly subsamples a directory of fastq.gz files and writes the subsampled fastq files to an output directory.
#!/usr/bin/env python
from sys import argv
from random import random
#from gzip import open as gz_open
from glob import glob
import gzip
import os
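One plausible way to subsample fastq records is an independent random draw per record, sketched below. This is an assumption about the approach rather than the gist's actual logic; `subsample_fastq` is a hypothetical helper, and the sketch works on a list of lines already read into memory.

```python
import random


def subsample_fastq(lines, fraction, rng=random):
    """Keep each four-line fastq record with probability `fraction`.

    Any trailing partial record is ignored. Pass a seeded random.Random
    instance as `rng` for reproducible output.
    """
    kept = []
    for start in range(0, len(lines) - 3, 4):
        if rng.random() < fraction:
            kept.extend(lines[start:start + 4])
    return kept
```

For gzipped input, the lines could be read with `gzip.open(path, "rt")` before being passed in. Because each record is drawn independently, the number of records kept varies around `fraction` times the total rather than matching it exactly.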
walterst / find_fastq_errors.py
Last active April 17, 2018 12:57
Very simple fastq parser/checker that tries to detect errors. Assumes each record is exactly four lines (@Label, sequence, +, quality scores). Checks for the expected characters at the label and optional label lines, and for equal sequence and quality lengths.
#!/usr/bin/env python
# Used to find fastq seqs in gzipped files, write first error, if any, to a log file
# Usage: python find_fastq_errors.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it; subdirectories are searched as well
from sys import argv
from glob import glob
import gzip
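The checks listed in the description can be sketched as a single pass over four-line records. The `first_fastq_error` function is a hypothetical illustration of those checks, not the gist's code, and it takes lines that have already been read (for example via `gzip.open(path, "rt")`).

```python
def first_fastq_error(lines):
    """Return a message describing the first malformed record, or None."""
    lines = [line.rstrip("\n") for line in lines]
    for start in range(0, len(lines), 4):
        record_number = start // 4 + 1
        record = lines[start:start + 4]
        if len(record) < 4:
            return "record %d: incomplete (only %d lines)" % (record_number, len(record))
        label, seq, plus, qual = record
        if not label.startswith("@"):
            return "record %d: label line does not start with @" % record_number
        if not plus.startswith("+"):
            return "record %d: third line does not start with +" % record_number
        if len(seq) != len(qual):
            return "record %d: sequence/quality length mismatch" % record_number
    return None
```

Stopping at the first problem mirrors the "write first error, if any" behavior mentioned in the script comments, and a single misplaced line tends to make every later record look wrong anyway.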
walterst / record_singletons.py
Created April 3, 2018 08:59
Counts the number of singletons present in a QIIME OTU mapping file and writes their sequence IDs to an output file.
#!/usr/bin/env python
"""Usage: python record_singletons.py X Y
where X is the input OTU mapping file and Y is the output singleton sequence ID file.
"""
from sys import argv
otu_mapping = open(argv[1], "U")
singletons_out = open(argv[2], "w")
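A minimal sketch of the singleton search, assuming the usual QIIME OTU map layout of one OTU per line with the OTU ID followed by tab-separated sequence IDs. The `find_singletons` helper is hypothetical and operates on lines that have already been read.

```python
def find_singletons(otu_map_lines):
    """Return the sequence IDs of OTUs that contain exactly one sequence."""
    singletons = []
    for line in otu_map_lines:
        fields = line.strip().split("\t")
        # Two fields means an OTU ID plus a single sequence ID.
        if len(fields) == 2:
            singletons.append(fields[1])
    return singletons
```

The length of the returned list is the singleton count, and writing one ID per line gives the output file described in the usage text.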
walterst / get_rank_sorted_data.py
Created January 31, 2018 13:10
Generate rank/frequency (and log-transformed) data for OTU counts to match approach described in article listed in script text.
#!/usr/bin/env python
from sys import argv
from operator import itemgetter
from scipy.stats import rankdata
from numpy import log
from biom import load_table
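For orientation, rank/frequency data with log transforms can be produced along these lines. This is a simplified sketch that uses only the standard library, with plain ordinal ranks and natural logs; the script itself imports `rankdata` from SciPy, which handles tied counts differently by default, and the article's exact procedure is not reproduced here. The `rank_frequency` name is hypothetical.

```python
import math


def rank_frequency(counts):
    """Return (rank, count, log(rank), log(count)) rows, most abundant first.

    Zero counts are dropped because their logarithm is undefined.
    Ties receive consecutive ordinal ranks (a simplification).
    """
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    return [(rank, count, math.log(rank), math.log(count))
            for rank, count in enumerate(ordered, start=1)]
```

Plotting the third column against the fourth gives the log-log rank/frequency view that this kind of analysis typically examines.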