## Usage: python3 count-barcode-freq.py <fastq_file.gz>
## Example: python3 count-barcode-freq.py sample.fastq.gz
from operator import itemgetter
import sys, gzip
barcodes = {}
with gzip.open(sys.argv[1]) as fastq:
    for line in fastq:
        if not line.startswith(b'@'): continue
        bc = line.decode("utf-8").split(':')[-1].strip()
        if bc not in barcodes:
            barcodes[bc] = 1
        else:
            barcodes[bc]+=1
total = sum(barcodes.values())
for k, v in sorted(barcodes.items(), key=itemgetter(1)):
    print(k, v, round(v/total*100, 2))
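One caveat with the `startswith(b'@')` test: in FASTQ, a quality string can also begin with `@`, so a quality line may occasionally be counted as a header. The sketch below is one possible variant, not the script's own method. It assumes every record occupies exactly four lines (no wrapped sequences) and reads only the first line of each record. The function name `count_barcodes` is hypothetical.

```python
import gzip
from collections import Counter


def count_barcodes(path):
    """Count the last colon-separated field of each FASTQ header line.

    Assumes a gzipped FASTQ file with exactly four lines per record.
    """
    counts = Counter()
    # "rt" opens the gzip stream in text mode, so no manual decode is needed.
    with gzip.open(path, "rt") as fastq:
        for i, line in enumerate(fastq):
            # Line 0 of every 4-line record is the header; skip the rest,
            # including quality lines that happen to start with '@'.
            if i % 4 != 0:
                continue
            counts[line.split(":")[-1].strip()] += 1
    return counts
```

Calling `count_barcodes("sample.fastq.gz")` returns a `Counter`, which can be sorted and printed with percentages in the same way as the `barcodes` dictionary above.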
Hello Dr. Khalfan,
Thank you for making your code public on GitHub. I ran this Python script on one of my files and got the following output:
15 849464 100.0
The first four lines of my fastq.gz file are:
@FS10000408:4:BNT40310-1714:1:1101:1050:1000 1:N:0:15
GGTTTGCTCTGGTTATTGAAACTTCTTGACTGTGTTCTCTTGATTTTCCCCGGTTTGATAGTTTAGCCGGCTTTGCTTCATTCTTCAGCGAAGTGGCAAATCTAGCCAATAACAAAAAAGTCAAGGAGGTGGTTTTCTACTGGAAGTAC
+
,,FFF,FFFFFFFFFFF,FFFFF:FFF,F:F:F,FFFFFFF,F:FFFF,F,,:FF,:,FF,F:,F:FFF::FFFF:F,:FFFFF,F,,::FF:FF,FF:,FFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFF
I was expecting an oligonucleotide sequence as the barcode, not a number. Could you give me some guidance? These reads were generated on an Illumina iSeq machine using Nextera adapters.
Thank you for your time!
Best Regards,
René
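For anyone reading along, the `15` can be traced directly to how the script parses the header: it reports whatever follows the last colon of each `@` line. The snippet below applies that same split to the header quoted above. The helper name `last_header_field` is hypothetical and used only for illustration.

```python
# Header line copied from the comment above.
header = "@FS10000408:4:BNT40310-1714:1:1101:1050:1000 1:N:0:15"


def last_header_field(header_line):
    """Return the text after the last ':' in a FASTQ header, as the script does."""
    return header_line.split(":")[-1].strip()


print(last_header_field(header))  # prints 15
```

So the script is reporting the final header field as it appears in the file. In Illumina-style headers that field normally carries the index sequence, but it may hold a sample number instead when no index sequence was written during FASTQ generation. Whether that is what happened with this particular run is an assumption, not something confirmed in this thread.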
Simply pass the fastq file as an argument:
python3 count-barcode-freq.py sample.fastq.gz
I've updated the script to include a usage example.