@pjbriggs
Created January 24, 2013 11:20
Count and report the index sequences (aka barcode tags) in an Illumina FASTQ file
#!/usr/bin/env python
#
# Count and report the barcode tags (index sequences) in a FASTQ file
# from the Illumina sequencing platform
#
import sys
from collections import Counter

# FASTQFile is a separate FASTQ-parsing module (not in the standard library)
import FASTQFile

if len(sys.argv) != 2:
    sys.exit("Usage: %s FASTQ_FILE" % sys.argv[0])
fastq_file = sys.argv[1]

# Count the number of reads carrying each index sequence
tags = Counter()
for read in FASTQFile.FastqIterator(fastq_file):
    tags[read.seqid.index_sequence] += 1

# Sort tags into order, most to least frequent
ordered_tags = sorted(tags, key=lambda tag: tags[tag], reverse=True)
print("Total # unique barcode sequences: %d" % len(ordered_tags))
for tag in ordered_tags:
    print("%s\t%d" % (tag, tags[tag]))
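The script relies on the `FASTQFile` module to pull the index sequence out of each read. As a self-contained alternative, the sketch below reads it directly from the header lines. It assumes CASAVA 1.8+ style headers, for example `@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG`, where the index is the fourth colon-separated field after the space, and unwrapped four-line FASTQ records. The helper names `index_sequence` and `count_index_sequences` are hypothetical and are not part of `FASTQFile`; headers in other layouts (such as older `#`-style ones) are simply skipped.

```python
#!/usr/bin/env python3
# Minimal sketch: count index sequences straight from Illumina read headers,
# without the FASTQFile module. Assumes CASAVA 1.8+ headers and 4-line records.
import gzip
import sys
from collections import Counter


def index_sequence(header):
    """Return the index sequence from a CASAVA 1.8+ style header line.

    Expected layout: '@<instrument>:...:<y-pos> <read>:<filtered>:<control>:<index>'.
    Returns None for headers that do not match this layout (e.g. older
    '#'-style headers).
    """
    parts = header.rstrip("\r\n").split(" ", 1)
    if len(parts) != 2:
        return None
    fields = parts[1].split(":")
    if len(fields) != 4:
        return None
    return fields[3]


def count_index_sequences(lines):
    """Count index sequences over an iterable of FASTQ lines.

    Assumes unwrapped 4-line records, so every 4th line is a header.
    """
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:
            tag = index_sequence(line)
            if tag is not None:
                counts[tag] += 1
    return counts


def main(path):
    # Handle gzip-compressed input based on the file extension
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        counts = count_index_sequences(fh)
    print("Total # unique barcode sequences: %d" % len(counts))
    # most_common() with no argument lists every tag, most frequent first
    for tag, n in counts.most_common():
        print("%s\t%d" % (tag, n))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("Usage: %s FASTQ_FILE" % sys.argv[0], file=sys.stderr)
```

Run it with the FASTQ path as the only argument, e.g. `python count_index_tags.py reads.fastq.gz` (the script name is illustrative). Using `Counter.most_common()` gives the most-to-least-frequent ordering directly, so no separate sort step is needed.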