@walterst
walterst / compare_fastq_labels.py
Last active December 20, 2017 09:03
Script to compare two fastq files, print labels that do not match, and test for sequence/quality lengths that differ (useful when one suspects that index reads and other reads do not have matching labels). Made to work with CASAVA 1.8.0 and later; might fail on earlier versions. Usage: python compare_fastq_labels.py fastq1_fp fastq2_fp
#!/usr/bin/env python
# Modified from Greg Caporaso's code in qiime/split_libraries_fastq.py
# Usage: python compare_fastq_labels.py fastq1_fp fastq2_fp
from itertools import izip
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
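A minimal Python 3 sketch of the comparison described above, written against the standard library instead of PyCogent's MinimalFastqParser. The names `read_fastq` and `compare_labels` are hypothetical, and comparing only the part of each label before the first space is an assumption based on the CASAVA 1.8 label layout; the original script may apply a different rule.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (label, sequence, quality) from an iterable of FASTQ lines.

    Assumes strict four-line records (no wrapped sequences).
    """
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(*[stripped] * 4):
        yield header[1:], seq, qual


def compare_labels(lines1, lines2):
    """Return a list of human-readable problems found between two inputs."""
    problems = []
    pairs = zip_longest(read_fastq(lines1), read_fastq(lines2))
    for index, (rec1, rec2) in enumerate(pairs):
        if rec1 is None or rec2 is None:
            problems.append("record %d: files differ in length" % index)
            break
        # Assumption: only the text before the first space identifies the read.
        if rec1[0].split(" ", 1)[0] != rec2[0].split(" ", 1)[0]:
            problems.append("record %d: %s != %s" % (index, rec1[0], rec2[0]))
        for name, rec in (("file 1", rec1), ("file 2", rec2)):
            if len(rec[1]) != len(rec[2]):
                problems.append(
                    "record %d: seq/qual lengths differ in %s" % (index, name))
    return problems
```

With open file handles, this could be called as `compare_labels(open(fastq1_fp), open(fastq2_fp))` and the returned messages printed one per line.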
walterst / reverse_qual_scores.py
Created June 20, 2013 22:41
Reverse a qual score file. May be needed when trying to match up reverse complemented fasta files.
#!/usr/bin/env python
""" Used to reverse a qual score sequence, which may be needed in cases of
paired fasta/qual files.
Requires QIIME installed to use (created with 1.7.0dev)
Usage:
python reverse_qual_scores.py X Y
where X is qual scores filepath, Y is output reversed filepath
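The reversal itself can be sketched in a few lines of Python 3 without QIIME. This is an illustration only: `reverse_qual_records` is a hypothetical name, and the input is assumed to be a standard qual file (a ">label" line followed by one or more lines of space-separated integer scores).

```python
def reverse_qual_records(lines):
    """Yield (label, reversed_scores) for each record in a qual file."""
    label, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new label closes the previous record, if there was one.
            if label is not None:
                yield label, scores[::-1]
            label, scores = line[1:], []
        else:
            # Scores may be wrapped over several lines; collect them all.
            scores.extend(int(s) for s in line.split())
    if label is not None:
        yield label, scores[::-1]
```

Each yielded pair can be written back out as ">label" followed by the scores joined with spaces.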
walterst / parse_bcs_from_fastq_labels.py
Created July 12, 2013 14:28
Parser to pull barcodes from fastq labels and write to a separate barcodes fastq file. See description at beginning of code for usage example. Requires PyCogent 1.5.3 to be installed (http://sourceforge.net/projects/pycogent/files/PyCogent/1.5.3/PyCogent-1.5.3.tgz/download)
#!/usr/bin/env python
# Usage:
# python parse_bcs_from_fastq_labels.py X Y Z A
# where X is input fastq file, Y is output barcode reads file,
# Z is character to split on in label (use quote characters), and A is number of characters to trim from the end of the label (0 for none)
# This assumes the barcode is at the end of the label and that the number of characters following it is consistent
""" Example sequence, would use: python parse_bcs_from_fastq_labels.py fastq_fp bc_reads.fastq '#' 2 to generate barcodes
@MCIC-SOLEXA_0051_FC:1:1:14637:1026#CGATGT/1
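The label handling for the example above can be sketched as follows. `barcode_from_label` is a hypothetical helper, and splitting on the last occurrence of the character is an assumption; the original script may split differently.

```python
def barcode_from_label(label, split_char, trim):
    """Return the text after the last split_char, minus `trim` trailing chars."""
    if split_char not in label:
        raise ValueError("split character %r not found in label" % split_char)
    tail = label.rsplit(split_char, 1)[-1]
    # A trim of 0 means keep everything after the split character.
    return tail[:-trim] if trim else tail
```

For the example label, splitting on '#' and trimming 2 characters drops the trailing "/1" and leaves the six-base barcode.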
walterst / extract_bcs_from_fastq.py
Last active December 19, 2015 22:39
Usage: python extract_bcs_from_fastq.py X Y Z A B, where X is the input fastq file, Y is the output barcode reads fastq file, Z is the output reads fastq file (with barcode removed), A is the size of the barcode, and B is True/False for reverse complementing the barcode before writing.
#!/usr/bin/env python
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
from cogent import DNA
f = open(argv[1], "U")
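A rough Python 3 sketch of the per-read step, without PyCogent. It assumes the barcode sits at the start of the read and that the barcode quality string is reversed along with the sequence when reverse complementing; neither detail is confirmed by the description. `split_barcode` and `COMPLEMENT` are hypothetical names.

```python
# Translation table for complementing unambiguous DNA bases (plus N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def split_barcode(seq, qual, bc_len, revcomp=False):
    """Split one read into ((bc_seq, bc_qual), (read_seq, read_qual))."""
    bc_seq, bc_qual = seq[:bc_len], qual[:bc_len]
    if revcomp:
        # Reverse complement the bases and reverse the qualities to match.
        bc_seq = bc_seq.translate(COMPLEMENT)[::-1]
        bc_qual = bc_qual[::-1]
    return (bc_seq, bc_qual), (seq[bc_len:], qual[bc_len:])
```

The two returned pairs correspond to the barcode reads output (Y) and the trimmed reads output (Z).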
walterst / combine_fastq_barcodes.py
Last active December 21, 2015 09:19
This script combines fastq index (barcode) reads, e.g., those created by using the parse_bc_reads_labels.py script. Usage: python combine_fastq_barcodes.py X Y Z where X is the first input fastq barcodes file, Y is the second fastq barcodes file, and Z is the output combined fastq barcodes file. This script assumes these are the raw data, i.e., …
#!/usr/bin/env python
from sys import argv
from itertools import izip
from cogent.parse.fastq import MinimalFastqParser
# Usage: python combine_fastq_barcodes X Y Z
# where X is the first input fastq barcodes file, Y is the second fastq
# barcodes file, and Z is the output combined fastq barcodes file.
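One way the combining step might look in Python 3, shown only as a sketch. It assumes the combined barcode is the first read followed by the second, and that the first file's label is kept; `combine_barcodes` and `format_fastq` are hypothetical names.

```python
def combine_barcodes(rec1, rec2):
    """Concatenate two (label, seq, qual) barcode records into one."""
    label1, seq1, qual1 = rec1
    _label2, seq2, qual2 = rec2
    # Assumption: keep the first label; append second barcode to the first.
    return label1, seq1 + seq2, qual1 + qual2


def format_fastq(label, seq, qual):
    """Format one record as four FASTQ lines."""
    return "@%s\n%s\n+\n%s\n" % (label, seq, qual)
```

Records from the two input files would be paired in order (for example with `zip`) and each combined record written to the output file.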
walterst / parse_bc_from_read_end.py
Last active December 22, 2015 16:18
Requires PyCogent installation. This should be present if you have QIIME installed. If you are using MacQIIME, initialize the environment by calling macqiime before using this script. Usage: python parse_bc_from_read_end.py X Y A B T/F Where: X is the input fasta file Y is the output fasta file A is the length of the forward barcode B is the len…
#!/usr/bin/env python
from sys import argv
from cogent.parse.fasta import MinimalFastaParser
from cogent import DNA
"""
Requires PyCogent installation. This should be present if you have QIIME
installed. If you are using MacQIIME, initialize the environment by calling
#!/usr/bin/env python
""" Usage
python extract_bcs_from_fastq_ends.py X Y Z A B C D E
Where
X: input fastq file with barcodes at the beginning and ends of the reads
Y: output fastq barcodes file
Z: output reads fastq file (with barcodes removed)
A: length of forward barcode (int)
B: length of reverse barcode (int)
walterst / truncate_seq_lens.py
Last active March 9, 2016 19:11
Used to truncate reads of lengths out of a given fasta file. python truncate_seq_lens.py X Y Z A, where X is input fasta file, Y is the minimum sequence length, Z is the maximum sequence length, and A is output fasta file.
#!/usr/bin/env python
""" Usage
python truncate_seq_lens.py X Y Z A B
where
X is input fasta file
Y is the minimum sequence length (discards reads shorter than this)
Z is the maximum sequence length (discards reads longer than this)
A is target truncation length
B is output fasta file
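The filtering and truncation described by these parameters can be sketched as below. `filter_and_truncate` is a hypothetical name, and treating the minimum and maximum lengths as inclusive bounds is an assumption.

```python
def filter_and_truncate(records, min_len, max_len, trunc_len):
    """Yield (label, seq) for reads within the length bounds, truncated.

    records is an iterable of (label, seq) pairs from a fasta file.
    """
    for label, seq in records:
        # Assumption: both bounds are inclusive.
        if min_len <= len(seq) <= max_len:
            # Reads shorter than trunc_len pass through unchanged.
            yield label, seq[:trunc_len]
```

Reads outside the bounds are discarded before truncation, so the truncation length only affects reads that pass the filter.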
#!/usr/bin/env python
from sys import argv
from itertools import izip
from cogent.parse.fastq import MinimalFastqParser
""" Usage
python merge_bcs_reads.py X Y Z
X: barcodes fastq file
walterst / split_fasta_qual_seqs.py
Created November 20, 2013 03:12
Usage: python split_fasta_qual_seqs.py X Y Z A B C D, where X is the input fasta file; Y is the input qual file; Z is the number of seqs to write to the first file (the remaining seqs are written to the second); A is the first output fasta file; B is the second output fasta file; C is the first output qual file; D is the second output qual file. One can use a: grep -c "^>" fasta_filepath to see how ma…
#!/usr/bin/env python
""" Usage:
python split_fasta_qual_seqs.py X Y Z A B C D
where
X - input fasta file
Y - input qual file
Z - Number of seqs to write to first file, remaining will be written to second
A - first output fasta file
B - second output fasta file
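The split at Z records can be sketched with the standard library. `split_records` is a hypothetical helper; the same call would be applied to the fasta records and to the qual records so that the two pairs of outputs stay in step.

```python
from itertools import islice


def split_records(records, n_first):
    """Return (first n_first records, all remaining records) as two lists."""
    iterator = iter(records)
    first = list(islice(iterator, n_first))
    # Whatever islice did not consume goes to the second output.
    return first, list(iterator)
```

If Z is larger than the number of records, the second list is simply empty.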