
@walterst
walterst / compare_fastq_labels.py
Last active December 20, 2017 09:03
Script to compare two fastq files, print labels that do not match, and test whether sequence/quality lengths differ (useful when one suspects index reads and other reads do not have matching labels). Made to work with CASAVA 1.8.0 and later; might fail on earlier versions. Usage: python compare_fastq_labels.py fastq1_fp fastq2_fp
#!/usr/bin/env python
# Modified from Greg Caporaso's code in qiime/split_libraries_fastq.py
# Usage: python compare_fastq_labels.py fastq1_fp fastq2_fp
from itertools import izip
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
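The preview above shows only the imports, so here is a minimal dependency-free sketch (not the original gist) of the comparison it describes; it assumes plain 4-line fastq records and that CASAVA 1.8+ labels share everything before the first space.
#!/usr/bin/env python
# Sketch (not the original gist): compare the labels of two fastq files record
# by record and report mismatched labels or seq/qual length differences.
# Assumes plain 4-line fastq records given as two command-line paths.
from sys import argv
from itertools import zip_longest

def parse_fastq(path):
    """Yield (label, seq, qual) tuples from a 4-line-per-record fastq file."""
    with open(path) as f:
        while True:
            header = f.readline().rstrip()
            if not header:
                return
            seq = f.readline().rstrip()
            f.readline()  # "+" separator line
            qual = f.readline().rstrip()
            yield header.lstrip("@"), seq, qual

if __name__ == "__main__":
    fastq1_fp, fastq2_fp = argv[1], argv[2]
    for rec1, rec2 in zip_longest(parse_fastq(fastq1_fp), parse_fastq(fastq2_fp)):
        if rec1 is None or rec2 is None:
            print("Files have different numbers of records")
            break
        label1, seq1, qual1 = rec1
        label2, seq2, qual2 = rec2
        # CASAVA 1.8+ paired labels share everything before the first space
        if label1.split()[0] != label2.split()[0]:
            print("Label mismatch: %s vs %s" % (label1, label2))
        if len(seq1) != len(qual1) or len(seq2) != len(qual2):
            print("Seq/qual length mismatch at label %s" % label1)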
@walterst
walterst / add_taxa_to_fasta.py
Created January 27, 2017 19:23
Use to append taxonomy strings from a tab-delimited file to the labels of a fasta file.
#!/usr/bin/env python
""" Usage:
python add_taxa_to_fasta.py input_taxa_file input_fasta_file output_fasta
"""
from sys import argv
from cogent.parse.fasta import MinimalFastaParser
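Since only the imports survive in the preview, the following is a hedged sketch (not the original gist) of the operation the description suggests; the tab-delimited layout (seq ID, then taxonomy) and the choice to append the taxonomy after a space are assumptions.
#!/usr/bin/env python
# Sketch (not the original gist): append taxonomy strings from a tab-delimited
# file (seq ID<tab>taxonomy) to the matching fasta labels.
from sys import argv

if __name__ == "__main__":
    taxa_fp, fasta_fp, output_fp = argv[1], argv[2], argv[3]
    # Build a seq ID -> taxonomy string lookup
    taxa = {}
    with open(taxa_fp) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                taxa[parts[0]] = parts[1]
    with open(fasta_fp) as fasta, open(output_fp, "w") as out:
        for line in fasta:
            if line.startswith(">"):
                seq_id = line[1:].split()[0]
                out.write(">%s %s\n" % (seq_id, taxa.get(seq_id, "Unassigned")))
            else:
                out.write(line)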
@walterst
walterst / create_majority_taxonomy.py
Created July 8, 2015 18:43
See help text below for usage. Script creates 90% majority taxonomy strings for all sequence taxa strings in the Silva 119 release.
#!/usr/bin/env python
# USAGE
# python create_majority_taxonomy.py X Y Z A
# where X is the taxonomy mapping file for all NR seqs, Y is the representative
# file (i.e. one of the rep_set/ files with the 119 release), Z is the OTU
# mapping file created from running pick_otus.py, and A is the output
# consensus mapping file
from sys import argv
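As a rough illustration of the 90% majority idea, here is a sketch (not the original gist): keep each taxonomy level only if at least 90% of an OTU's member sequences agree on it, and truncate at the first level without such a majority. For brevity the sketch skips the representative-set file and assumes a pick_otus.py-style OTU map (OTU ID followed by tab-separated member seq IDs).
#!/usr/bin/env python
# Sketch (not the original gist): write a 90% majority taxonomy string per OTU.
from sys import argv
from collections import Counter

def majority_taxonomy(tax_strings, threshold=0.9):
    """Keep levels where >= threshold of the strings agree; truncate otherwise."""
    split = [t.split(";") for t in tax_strings]
    result = []
    for level in range(min(len(t) for t in split)):
        taxon, count = Counter(t[level] for t in split).most_common(1)[0]
        if count / len(split) >= threshold:
            result.append(taxon)
        else:
            break
    return ";".join(result)

if __name__ == "__main__":
    taxa_map_fp, otu_map_fp, output_fp = argv[1], argv[2], argv[3]
    seq_taxa = {}
    with open(taxa_map_fp) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                seq_taxa[parts[0]] = parts[1]
    with open(otu_map_fp) as otus, open(output_fp, "w") as out:
        for line in otus:
            fields = line.rstrip("\n").split("\t")
            otu_id, members = fields[0], fields[1:]
            tax_strings = [seq_taxa[m] for m in members if m in seq_taxa]
            if tax_strings:
                out.write("%s\t%s\n" % (otu_id, majority_taxonomy(tax_strings)))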
@walterst
walterst / create_consensus_taxonomy.py
Created July 8, 2015 18:41
See help text below for usage. Script creates consensus taxonomy strings for all sequence taxa strings in the Silva 119 release.
#!/usr/bin/env python
# USAGE
# python create_consensus_taxonomy.py X Y Z A
# where X is the taxonomy mapping file for all NR seqs, Y is the representative
# file (i.e. one of the rep_set/ files with the 119 release), Z is the OTU
# mapping file created from running pick_otus.py, and A is the output
# consensus mapping file
from sys import argv
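The consensus case differs from the majority case only in requiring full agreement at each level, so a sketch (not the original gist) of the core function is enough; it would slot into the same driver shown with the majority script above.
# Sketch (not the original gist): strict consensus variant of the majority
# function above -- keep a level only if every member sequence agrees on it.
def consensus_taxonomy(tax_strings):
    split = [t.split(";") for t in tax_strings]
    result = []
    for level in range(min(len(t) for t in split)):
        taxa_at_level = set(t[level] for t in split)
        if len(taxa_at_level) == 1:
            result.append(taxa_at_level.pop())
        else:
            break
    return ";".join(result)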
@walterst
walterst / filter_barcode_header.py
Last active November 16, 2017 22:28
Filters barcode headers to remove target characters, e.g. the "+" character. Splits on target identifiers.
#!/usr/bin/env python
# Usage: python filter_barcode_header.py original_barcode_seqs.fastq new_barcode_seqs.fastq
# WARNING: the second file specified will be overwritten if it exists!
bc_start_indicator = "1:N:0:"
chars_to_strip = ["+"]
from sys import argv
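Building on the two constants visible in the preview, here is a sketch (not the original gist) of the filtering step; copying non-header lines through unchanged is an assumption.
#!/usr/bin/env python
# Sketch (not the original gist): strip target characters (e.g. "+") from the
# barcode portion of fastq header lines, i.e. the part after "1:N:0:".
from sys import argv

bc_start_indicator = "1:N:0:"
chars_to_strip = ["+"]

if __name__ == "__main__":
    input_fp, output_fp = argv[1], argv[2]
    with open(input_fp) as fin, open(output_fp, "w") as fout:
        for line in fin:
            if bc_start_indicator in line:
                prefix, barcode = line.rstrip("\n").split(bc_start_indicator)
                for char in chars_to_strip:
                    barcode = barcode.replace(char, "")
                fout.write("%s%s%s\n" % (prefix, bc_start_indicator, barcode))
            else:
                fout.write(line)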
@walterst
walterst / filter_fastq.py
Last active November 2, 2017 11:16
Filters an input fastq to match labels in target fastq file.
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the pear-assembled subset fastq
# Usage: python filter_fastq.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
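Here is a dependency-free sketch (not the original gist) of the filtering described above; it assumes plain 4-line fastq records and that the part of the label before the first space identifies a read.
#!/usr/bin/env python
# Sketch (not the original gist): keep only records of input_fastq whose labels
# appear in target_fastq (e.g. match an index fastq to a pear-assembled subset).
from sys import argv

def fastq_records(path):
    """Yield each fastq record as a list of its four raw lines."""
    with open(path) as f:
        while True:
            lines = [f.readline() for _ in range(4)]
            if not lines[0]:
                return
            yield lines

if __name__ == "__main__":
    input_fp, target_fp, output_fp = argv[1], argv[2], argv[3]
    wanted = set(rec[0].split()[0] for rec in fastq_records(target_fp))
    with open(output_fp, "w") as out:
        for rec in fastq_records(input_fp):
            if rec[0].split()[0] in wanted:
                out.writelines(rec)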
@walterst
walterst / count_zipped_fastq_reads.py
#!/usr/bin/env python
# Used to count fastq seqs in gzipped files, write counts and file name to log file
# Usage: python count_zipped_fastq_reads.py fastq_folder log_file
# where fastq_folder has all of the fastq files in it (doesn't search subdirectories)
from sys import argv
from glob import glob
from cogent.parse.fastq import MinimalFastqParser
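A minimal sketch (not the original gist) of the counting step follows; the *.fastq.gz filename pattern and the 4-lines-per-record assumption are mine.
#!/usr/bin/env python
# Sketch (not the original gist): count reads in every gzipped fastq in a folder
# (non-recursive) and write "filename<tab>count" lines to a log file.
from sys import argv
from glob import glob
from os.path import join
import gzip

if __name__ == "__main__":
    fastq_folder, log_fp = argv[1], argv[2]
    with open(log_fp, "w") as log:
        for fastq_fp in sorted(glob(join(fastq_folder, "*.fastq.gz"))):
            with gzip.open(fastq_fp, "rt") as f:
                line_count = sum(1 for _ in f)
            log.write("%s\t%d\n" % (fastq_fp, line_count // 4))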
@walterst
walterst / filter_fastqV2.py
Last active August 8, 2017 09:26
Filter a fastq file to match target fastq labels, e.g. after stitching reads.
#!/usr/bin/env python
# Used to filter a fastq to match another fastq that is a subset of the query one, e.g. matching an
# index fastq to the pear assembled subset fastq
# Usage: python filter_fastqV2.py input_fastq target_fastq output_fastq
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
from qiime.util import gzip_open
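The V2 preview differs from filter_fastq.py only in importing qiime.util's gzip_open, so a gzip-aware variant of the earlier filter_fastq sketch could simply swap in a helper like the one below (my naming, not the gist's).
import gzip

def open_maybe_gzipped(path):
    """Open a text file transparently whether or not it is gzip-compressed."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)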
@walterst
walterst / filter_otu_mapping_from_otu_table.py
Last active March 2, 2017 06:01
(written with QIIME 1.9.1 dependencies in place) Finds the OTU IDs in a supplied OTU table and filters out all non-matching IDs from the supplied OTU mapping file to create a filtered OTU mapping file as output. The purpose is to backtrack to unclustered read data while removing all reads that were filtered out along the way.
#!/usr/bin/env python
__author__ = "William Walters"
__copyright__ = "Copyright 2011"
__credits__ = ["William Walters"]
__license__ = "GPL"
__version__ = "1.0"
__maintainer__ = "William Walters"
__email__ = "William.A.Walters@colorado.edu"
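Only the module header survives in the preview, so here is a hedged sketch (not the original gist) of the filtering it describes; it assumes a classic tab-delimited OTU table (first column = OTU ID, "#" comment lines), whereas the original, written against QIIME 1.9.1, would more likely read a BIOM table.
#!/usr/bin/env python
# Sketch (not the original gist): keep only the lines of an OTU mapping file
# whose OTU IDs appear in an OTU table, so reads filtered from the table are
# also dropped from the map.
from sys import argv

if __name__ == "__main__":
    otu_table_fp, otu_map_fp, output_fp = argv[1], argv[2], argv[3]
    kept_ids = set()
    with open(otu_table_fp) as table:
        for line in table:
            if line.strip() and not line.startswith("#"):
                kept_ids.add(line.split("\t")[0])
    with open(otu_map_fp) as otu_map, open(output_fp, "w") as out:
        for line in otu_map:
            if line.split("\t")[0] in kept_ids:
                out.write(line)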
@walterst
walterst / split_taxonomy_by_domain.py
Last active February 4, 2017 12:53
See usage string at beginning of script text.
#!/usr/bin/env python
# Usage:
# python split_taxonomy_by_domain.py A X Y Z
# where A is the original raw SILVA taxonomy (tab-separated: seq ID<tab>semicolon-separated taxonomy),
# X is the input taxonomy mapping file (e.g. consensus/majority, or other parsed taxonomy file),
# Y is the output 16S mapping file, and Z is the output 18S mapping file.
from sys import argv
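To make the split concrete, here is a sketch (not the original gist) that routes each mapping line by the domain found in the raw SILVA taxonomy; the domain spellings (Bacteria, Archaea, Eukaryota) and the handling of unmatched IDs are assumptions.
#!/usr/bin/env python
# Sketch (not the original gist): split a taxonomy mapping file into 16S
# (Bacteria/Archaea) and 18S (Eukaryota) files, using the raw SILVA taxonomy
# to look up each sequence's domain.
from sys import argv

if __name__ == "__main__":
    raw_silva_fp, input_map_fp, out_16s_fp, out_18s_fp = argv[1:5]
    domains = {}
    with open(raw_silva_fp) as raw:
        for line in raw:
            parts = line.rstrip("\n").split("\t")
            if len(parts) >= 2:
                domains[parts[0]] = parts[1].split(";")[0].strip()
    with open(input_map_fp) as in_map, \
            open(out_16s_fp, "w") as out_16s, open(out_18s_fp, "w") as out_18s:
        for line in in_map:
            seq_id = line.split("\t")[0]
            if domains.get(seq_id) in ("Bacteria", "Archaea"):
                out_16s.write(line)
            elif domains.get(seq_id) == "Eukaryota":
                out_18s.write(line)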