Created July 12, 2013 14:28
Parser to pull barcodes from fastq labels and write to a separate barcodes fastq file. See description at beginning of code for usage example. Requires PyCogent 1.5.3 to be installed (
#!/usr/bin/env python
# Usage:
# python SCRIPT X Y Z A
# where SCRIPT is this file, X is the input fastq file, Y is the output barcode reads file,
# Z is the character to split the label on (wrap it in quotes), and A is the number of
# characters to trim from the end of the label (0 for none).
# This assumes the barcode is at the end of the label and that the number of characters
# following it is the same for every read.
# Example: if the barcode follows '#' and is trailed by 2 extra characters, use
# python SCRIPT fastq_fp bc_reads.fastq '#' 2
# to generate the barcodes file.
from sys import argv
from cogent.parse.fastq import MinimalFastqParser
f = open(argv[1], "U")
bc_out = open(argv[2], "w")
char_to_split = argv[3]
chars_to_trim = int(argv[4])
for data in MinimalFastqParser(f, strict=False):
    # Read in current label
    curr_label = data[0].strip()
    # Keep the text after the last split character, then trim the trailing characters.
    # An explicit end index is used because [0:-0] would return an empty string.
    curr_bc_field = curr_label.split(char_to_split)[-1]
    curr_bc_read = curr_bc_field[0:max(0, len(curr_bc_field) - chars_to_trim)]
    # Create fake quality score since not going to get real data, match length of barcode
    curr_bc_qual = "F" * len(curr_bc_read)
    bc_out.write("@%s\n" % curr_label)
    bc_out.write("%s\n" % curr_bc_read)
    bc_out.write("+%s\n" % curr_label)
    bc_out.write("%s\n" % curr_bc_qual)
f.close()
bc_out.close()
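
The label handling in the script can be tried without PyCogent or a real fastq file. The following is a minimal sketch, not part of the original gist: the helper names `extract_barcode` and `barcode_record` and the label `read1#ACGTAC/1` are made up for illustration. It applies the same split-then-trim logic and fake quality string to a single label.

```python
def extract_barcode(label, split_char, chars_to_trim):
    """Return the barcode found at the end of a fastq label (hypothetical helper)."""
    # Keep only the text after the last occurrence of split_char
    tail = label.strip().split(split_char)[-1]
    # Slice with an explicit end index so that chars_to_trim == 0 keeps the whole tail
    return tail[:max(0, len(tail) - chars_to_trim)]


def barcode_record(label, split_char, chars_to_trim):
    """Build a four-line fastq record for the barcode (hypothetical helper)."""
    barcode = extract_barcode(label, split_char, chars_to_trim)
    # Fake quality scores, one "F" per barcode base, as in the script
    qual = "F" * len(barcode)
    return "@%s\n%s\n+%s\n%s\n" % (label, barcode, label, qual)


if __name__ == "__main__":
    # Made-up label: the barcode follows '#' and is trailed by the 2 characters "/1"
    print(barcode_record("read1#ACGTAC/1", "#", 2))
```

Slicing by an explicit end index instead of a negative one is what lets a trim count of 0 return the full barcode field.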