@mohammedkhalfan
Last active January 20, 2022 17:18
Takes a demultiplexed, gzipped FASTQ file as input and prints each barcode found with its read count and percentage of total reads, sorted in ascending order of frequency.
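## Usage: python3 count-barcode-freq.py <fastq_file.gz>
## Example: python3 count-barcode-freq.py sample.fastq.gz
from operator import itemgetter
import sys, gzip

barcodes = {}
# Open in text mode ("rt") so each line is a str rather than bytes.
with gzip.open(sys.argv[1], "rt") as fastq:
    for i, line in enumerate(fastq):
        # Assumes 4 lines per record, so headers sit on every 4th line.
        # Testing for a leading '@' is not enough: quality lines can start with '@' too.
        if i % 4 != 0: continue
        # The barcode is the last colon-separated field of the header.
        bc = line.split(':')[-1].strip()
        barcodes[bc] = barcodes.get(bc, 0) + 1

total = sum(barcodes.values())
# Print barcode, read count, and percentage of reads, least frequent first.
for k, v in sorted(barcodes.items(), key=itemgetter(1)):
    print(k, v, round(v/total*100, 2))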
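
To see what the tally does without a real sequencing file, here is a minimal sketch of the same logic run on in-memory lines. It assumes four lines per FASTQ record, so a quality line that happens to start with `@` is not mistaken for a header. The helper names `extract_barcode` and `count_barcodes` and the records themselves are made up for illustration; they are not part of the gist.

```python
from collections import Counter

def extract_barcode(header: str) -> str:
    """Return the last colon-separated field of a FASTQ header line."""
    return header.split(":")[-1].strip()

def count_barcodes(lines):
    """Tally barcodes from an iterable of FASTQ lines (4 lines per record assumed)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each record
            counts[extract_barcode(line)] += 1
    return counts

# Hypothetical records, invented for this example only.
# The second record's quality line starts with '@' on purpose.
records = [
    "@INSTR:1:FLOWCELL:1:1101:1000:1000 1:N:0:ACGTAC\n", "ACGT\n", "+\n", "FFFF\n",
    "@INSTR:1:FLOWCELL:1:1101:1000:1001 1:N:0:ACGTAC\n", "TTGA\n", "+\n", "@FFF\n",
    "@INSTR:1:FLOWCELL:1:1101:1000:1002 1:N:0:GGCTAA\n", "CCAT\n", "+\n", "FFFF\n",
]

counts = count_barcodes(records)
total = sum(counts.values())
# Least frequent barcode first, with its share of all reads in percent.
for bc, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(bc, n, round(n / total * 100, 2))
```

With these three records the sketch reports one read for `GGCTAA` and two for `ACGTAC`; the `@FFF` quality line is skipped because only every fourth line is treated as a header.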
@jordanirvin76

How do I import a Fastq file for this script?

@mohammedkhalfan
Author

> How do I import a Fastq file for this script?

Simply pass the fastq file as an argument:
python3 count-barcode-freq.py sample.fastq.gz

I've updated the script to include a usage example.

@ReneKat

ReneKat commented Jan 20, 2022

Hello Dr. Khalfan,
Thank you for making your code public on GitHub. I ran this Python script on one of my files and got the following output:
15 849464 100.0

The first four lines of my fastq.gz file are:

@FS10000408:4:BNT40310-1714:1:1101:1050:1000 1:N:0:15
GGTTTGCTCTGGTTATTGAAACTTCTTGACTGTGTTCTCTTGATTTTCCCCGGTTTGATAGTTTAGCCGGCTTTGCTTCATTCTTCAGCGAAGTGGCAAATCTAGCCAATAACAAAAAAGTCAAGGAGGTGGTTTTCTACTGGAAGTAC
+
,,FFF,FFFFFFFFFFF,FFFFF:FFF,F:F:F,FFFFFFF,F:FFFF,F,,:FF,:,FF,F:,F:FFF::FFFF:F,:FFFFF,F,,::FF:FF,FF:,FFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF

I was expecting an oligonucleotide, not a numeral, as the barcode. Could you give me some guidance? These reads were generated on an Illumina iSeq machine using Nextera adapters.

Thank you for your time!
Best Regards,
René
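
For context on the output above: the script prints whatever follows the last colon of each header, and in the header shown that field is `15`, so every read is counted under `15`. One possible explanation, offered as an assumption to check against Illumina's documentation rather than a confirmed diagnosis, is that the final header field holds a sample number instead of an index sequence when the FASTQ was generated without index sequences in the sample sheet. A minimal sketch for checking which kind of value a file carries follows; `classify_barcode_field` is a hypothetical helper, and the `SEQ+SEQ` pattern for dual indexes is likewise an assumption.

```python
import re

def classify_barcode_field(header: str) -> str:
    """Classify the last colon-separated field of a FASTQ header line."""
    field = header.split(":")[-1].strip()
    # Nucleotide letters only, optionally two indexes joined by '+' (assumed layout).
    if re.fullmatch(r"[ACGTN]+(\+[ACGTN]+)?", field):
        return "index sequence"
    # Digits only, e.g. a sample number.
    if field.isdigit():
        return "number"
    return "other"

# Header copied from the comment above.
header = "@FS10000408:4:BNT40310-1714:1:1101:1050:1000 1:N:0:15"
print(classify_barcode_field(header))
```

If the field is classified as a number for every read, the barcode sequences are not present in the headers, and the script has nothing else to count.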
