@audy
Created July 2, 2020 18:16
Check the frequency of the first N bases of each read to try to identify barcode sequences in a non-demultiplexed FASTQ file.
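#!/usr/bin/env python3
# usage: cat reads.fastq | ./get-barcodes.py
from collections import defaultdict

from Bio import SeqIO

# Number of leading bases to treat as a candidate barcode
BARCODE_SIZE = 14

counts = defaultdict(int)

with open("/dev/stdin") as handle:
    for n, record in enumerate(SeqIO.parse(handle, "fastq")):
        # Tally the first BARCODE_SIZE bases of each read
        counts[str(record.seq)[0:BARCODE_SIZE]] += 1
        # Only sample roughly the first 100,000 reads
        if n > 100_000:
            break

# Report prefixes seen more than 1000 times in the sample
for seq, count in counts.items():
    if count > 1000:
        print(seq, count)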
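For cases where Biopython is not installed, the same idea can be sketched with only the standard library. This is a rough variant, not the script above: it assumes strict four-line FASTQ records (no wrapped sequence lines), and the names `count_prefixes`, `candidate_barcodes`, and `min_fraction` are hypothetical, introduced only for illustration. The fractional cutoff (1% of sampled reads, inclusive) approximates the script's "more than 1000 out of roughly 100,000" rule.

```python
#!/usr/bin/env python3
# Sketch only: assumes 4-line FASTQ records; helper names are illustrative.
import sys
from collections import Counter
from itertools import islice


def count_prefixes(handle, barcode_size=14, max_reads=100_000):
    """Count the first `barcode_size` bases of up to `max_reads` reads."""
    counts = Counter()
    # In 4-line FASTQ, the sequence is the 2nd line of every record
    seq_lines = islice(handle, 1, None, 4)
    for seq in islice(seq_lines, max_reads):
        counts[seq.strip()[:barcode_size]] += 1
    return counts


def candidate_barcodes(counts, min_fraction=0.01):
    """Return (prefix, count) pairs seen in at least `min_fraction` of reads."""
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() yields prefixes from most to least frequent
    return [(seq, n) for seq, n in counts.most_common() if n / total >= min_fraction]


if __name__ == "__main__":
    for seq, n in candidate_barcodes(count_prefixes(sys.stdin)):
        print(seq, n)
```

Sorting by frequency puts the most likely barcodes first, and a fractional cutoff keeps the threshold meaningful when fewer than 100,000 reads are available.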