@arq5x
Last active August 29, 2015 14:05
report duplicate rates for a directory
DIR=bam/t1d-run2a-redo/
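# Pipeline steps inside the loop, in order:
#   1. samtools flagstat: run flagstat on the duplicate-marked BAM
#   2. grep: grab the duplicates line and the total-reads line just before it
#      (this relies on the total line directly preceding the duplicates line)
#   3. cut: keep the leading read count from each of those two lines
#   4. paste: place the total followed by the duplicate count on the same line
#   5. awk: print the duplicate fraction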
for file in "$DIR"/*.bwamem.sort.dedup.bam
do
    samtools flagstat "$file" | \
    grep -B 1 duplicates | \
    cut -f 1 -d " " | \
    paste - - | \
    awk '{print $2 / $1 " ("$2 " of " $1 ")"}'
done
# results (24 files in directory)
0.03714 (14327 of 385757)
0.083087 (91506 of 1101328)
0.0574643 (38073 of 662550)
0.195722 (319601 of 1632934)
0.124546 (156507 of 1256624)
0.163537 (301348 of 1842691)
0.0561464 (36523 of 650496)
0.0881182 (92425 of 1048875)
0.107908 (121355 of 1124618)
0.172086 (353423 of 2053754)
0.149414 (244611 of 1637134)
0.194814 (439776 of 2257419)
0.067853 (59428 of 875835)
0.050496 (28444 of 563292)
0.122581 (96895 of 790459)
0.153212 (271573 of 1772531)
0.168307 (269363 of 1600425)
0.179611 (373077 of 2077144)
0.208015 (474154 of 2279427)
0.120883 (167051 of 1381919)
0.201733 (331600 of 1643754)
0.0642905 (43459 of 675979)
0.0732655 (66193 of 903467)
0.114188 (139545 of 1222065)
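# A possible variation, not part of the original script: the grep -B 1 step
# assumes the total line sits directly above the duplicates line. Some samtools
# releases print extra lines (for example secondary and supplementary counts)
# between the two, so a sketch that matches each line by its label may be safer.
# The helper name dup_fraction is hypothetical, and the patterns assume the
# usual "N + M in total ..." and "N + M duplicates" flagstat line shapes.

```shell
# Hypothetical helper: read `samtools flagstat` text on stdin and print
# "fraction (dups of total)", locating lines by label rather than position.
dup_fraction() {
    awk '
        / in total /             { total = $1 }  # QC-passed total read count
        /\+ [0-9]+ duplicates$/  { dups = $1 }   # skips "primary duplicates"
        END {
            # Only report when a non-zero total was found.
            if (total > 0) print dups / total " (" dups " of " total ")"
        }
    '
}
```

# Usage inside the loop above would then be:
#   samtools flagstat "$file" | dup_fraction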