@walterst
walterst / filter_fastq.py
Last active November 2, 2017 11:16
Filters an input fastq to match labels in target fastq file.
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the PEAR-assembled subset fastq
# Usage: python filter_fastq.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
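The preview above is cut off after the imports. As a rough sketch of the filtering described (not the gist's actual code), the snippet below uses only the standard library instead of PyCogent. The names parse_fastq and filter_fastq are hypothetical, and matching on the read ID before the first space is an assumption about how the labels should be compared.

```python
from itertools import islice


def parse_fastq(handle):
    """Yield (label, seq, qual) from strict 4-line fastq records."""
    handle = iter(handle)
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if len(record) < 4:
            break
        header, seq, _plus, qual = record
        yield header[1:], seq, qual  # drop the leading "@"


def filter_fastq(input_lines, target_lines):
    """Yield records from input_lines whose read ID occurs in target_lines.

    Assumption: two labels match when the text before the first space is
    identical (paired index/read files often differ after the space).
    """
    target_ids = {label.split(" ", 1)[0]
                  for label, _seq, _qual in parse_fastq(target_lines)}
    for label, seq, qual in parse_fastq(input_lines):
        if label.split(" ", 1)[0] in target_ids:
            yield "@%s\n%s\n+\n%s\n" % (label, seq, qual)
```

Both arguments can be open file handles, so the output can be written with out_f.writelines(filter_fastq(in_f, target_f)).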
@walterst
walterst / add_taxa_to_fasta.py
Created January 27, 2017 19:23
Use to append a tab-delimited taxa string to a fasta file
#!/usr/bin/env python
""" Usage:
python add_taxa_to_fasta.py input_taxa_file input_fasta_file output_fasta
"""
from sys import argv
from cogent.parse.fasta import MinimalFastaParser
@walterst
walterst / collapse_rare_taxa.py
Last active December 8, 2016 12:24
Usage: python collapse_rare_taxa.py -i otu_table
#!/usr/bin/env python
__author__ = "William Walters"
__copyright__ = "NA"
__credits__ = ["William Walters"]
__license__ = "GPL"
__version__ = "1.0"
__maintainer__ = "William Walters"
__email__ = "william.a.walters@gmail.com"
@walterst
walterst / filter_otu_mapping_from_otu_table.py
Last active March 2, 2017 06:01
(Written with QIIME 1.9.1 dependencies in place.) Finds the OTU IDs in a supplied OTU table and removes every OTU whose ID is not in that table from the supplied OTU mapping file, writing a filtered OTU mapping file as output. This makes it possible to backtrack to the unclustered read data while excluding all reads that were filtered out along the way.
#!/usr/bin/env python
__author__ = "William Walters"
__copyright__ = "Copyright 2011"
__credits__ = ["William Walters"]
__license__ = "GPL"
__version__ = "1.0"
__maintainer__ = "William Walters"
__email__ = "William.A.Walters@colorado.edu"
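The preview stops at the header block. A minimal sketch of the filtering step described above could look like the following, assuming a classic tab-delimited OTU table (comment lines start with "#", OTU ID in the first column) and a QIIME-style OTU map (OTU ID, then tab-separated sequence IDs). The function names are hypothetical, and a BIOM-format table would need to be converted or read with the biom package first.

```python
def otu_ids_from_table(table_lines):
    """Collect OTU IDs from the first column of a tab-delimited OTU table."""
    ids = set()
    for line in table_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        ids.add(line.split("\t")[0])
    return ids


def filter_otu_map(otu_map_lines, otu_ids):
    """Yield only the OTU map lines whose OTU ID is in otu_ids."""
    for line in otu_map_lines:
        line = line.rstrip("\n")
        if line.split("\t")[0] in otu_ids:
            yield line
```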
@walterst
walterst / strip_primers_exclude.py
Created June 22, 2016 05:38
Searches the target fasta for the forward/reverse primers listed in the supplied QIIME-formatted mapping file, truncates each read inside the primer hit sites, and does not write the read if the primers are not found.
#!/usr/bin/env python
# USAGE: python strip_primers_exclude.py Mapping_file input_fasta output_fasta log_filename
from sys import argv
from string import upper
from re import compile
from cogent.parse.fasta import MinimalFastaParser
from skbio.sequence import DNA
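Only the imports are visible in this preview. The sketch below illustrates one way the described primer search could work with the standard library alone: degenerate IUPAC primers are expanded into regular expressions and the read is cut between the hits. It is not the gist's code. The function names are hypothetical, and requiring both primers (with the reverse primer searched as its reverse complement, downstream of the forward hit) is an assumption.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
         "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
         "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Compile a possibly degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def reverse_complement(seq):
    """Reverse complement of a DNA string, IUPAC codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strip_primers(seq, forward_primer, reverse_primer):
    """Return the read between the primer hits, or None if either is missing."""
    seq = seq.upper()
    fwd = primer_regex(forward_primer).search(seq)
    if fwd is None:
        return None
    # Assumption: the reverse primer appears as its reverse complement,
    # somewhere after the end of the forward primer hit.
    rev = primer_regex(reverse_complement(reverse_primer)).search(seq, fwd.end())
    if rev is None:
        return None
    return seq[fwd.end():rev.start()]
```

A caller would skip writing any read for which strip_primers returns None, which mirrors the "does not write read" behavior in the description.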
@walterst
walterst / workflow_genus_distances.txt
Last active June 10, 2016 18:15
Description of process and scripts used to count nucleotide differences within target genera
We want to ask how different sequences are within certain genera. In this case, I was looking at the Prevotella,
Bacteroides, and Porphyromonas genera within Bacteroidetes, and the distance between sequences is a count of nucleotide differences
divided by the length of the sequence considered.
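As a small illustration of that distance (a sketch, not one of the scripts used in this workflow), the function below counts mismatched positions between two sequences and divides by their length. It assumes the two sequences are already aligned and of equal length; how gaps or ambiguous bases were handled is not stated here, so they are simply compared as characters. The name nucleotide_distance is hypothetical.

```python
def nucleotide_distance(seq_a, seq_b):
    """Count of differing positions divided by the sequence length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return differences / len(seq_a)
```

For example, two 8-base sequences that differ at 2 positions have a distance of 2 / 8 = 0.25.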
To do this, I used the 99% OTUs (16S only) from the SILVA 123 release, available here:
http://www.arb-silva.de/no_cache/download/archive/qiime/
We want to minimize the number of sequences included that may erroneously be labeled as the target taxa, but fall on other parts of
the Bacteroidetes tree with other taxa, rather than grouped with the target genus. My goal is to find a node within a Bacteroidetes
tree whose descendants are all or mostly the target genus while retaining the most possible tips that contain the
@walterst
walterst / remove_short_reads.py
Created June 10, 2016 17:02
Specify an input fasta file and minimum length, e.g. python remove_short_reads.py seqs.fna 1300 > trimmed_reads.fna
#!/usr/bin/env python
from sys import argv
from cogent.parse.fasta import MinimalFastaParser
min_len = int(argv[2])
for label,seq in MinimalFastaParser(open(argv[1], "U")):
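The loop body is cut off in this preview. A standard-library sketch of the same idea is shown below: parse the fasta, keep reads at or above the minimum length, and emit them in fasta format. Whether the original keeps reads exactly equal to the minimum length is not visible, so the >= comparison is an assumption, and parse_fasta and keep_long_reads are hypothetical names.

```python
def parse_fasta(handle):
    """Yield (label, seq) pairs from fasta lines; sequences may wrap lines."""
    label, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            label, chunks = line[1:], []
        elif label is not None:
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


def keep_long_reads(handle, min_len):
    """Yield fasta-formatted records whose sequence length is >= min_len."""
    for label, seq in parse_fasta(handle):
        if len(seq) >= min_len:  # assumption: reads equal to min_len are kept
            yield ">%s\n%s" % (label, seq)
```

Printing each yielded record to standard output reproduces the redirect-style usage shown in the description.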
@walterst
walterst / strip_primers_forward_only.py
Last active May 14, 2016 18:19
USAGE: python strip_primers.py Mapping_file input_fasta output_fasta log_filename (modified to only search for forward primers, remove reads where primer isn't found).
#!/usr/bin/env python
# USAGE: python strip_primers.py Mapping_file input_fasta output_fasta log_filename
from sys import argv
from string import upper
from re import compile
from cogent.parse.fasta import MinimalFastaParser
from skbio.sequence import DNA
@walterst
walterst / generate_taxa_scatter_plots.py
Last active March 11, 2016 16:55
Updated taxa scatter plot script. Works for data that are summarized (e.g. QIIME's summarize_taxa.py output). Can compare raw OTU-level tables, but the taxonomy will be listed as the OTU IDs.
#!/usr/bin/env python
__author__ = "William Walters"
__copyright__ = "Copyright 2011"
__credits__ = ["William Walters"]
__license__ = "GPL"
__version__ = "1.0"
__maintainer__ = "William Walters"
__email__ = "William.A.Walters@colorado.edu"