@ohofmann
Last active February 27, 2016 10:14
Illumina X Duplication Check

We are troubleshooting a relatively new Illumina X10 installation using a range of libraries, including Genome in a Bottle NA12878 samples. Preliminary results look comparable to Illumina's Platinum Genomes data and to data from other X10 facilities.

However, we still need to get a handle on our duplication rates. The Xs (and the 4000s) are known to show higher duplication rates when flowcells are 'underloaded': library molecules migrate to nearby empty nanowells, which in turn increases the optical duplication rate. Even with that caveat, duplication rates should be in the 15-20% range, whereas ours tend to be quite a bit higher:

> summary(dup)
       V1       
 Min.   : 2.32  
 1st Qu.:25.32  
 Median :30.09  
 Mean   :28.35  
 3rd Qu.:31.30  
 Max.   :39.22  

I am particularly interested in the variation we are seeing: the values range from 2.32 to 39.22.

http://gatkforums.broadinstitute.org/gatk/discussion/6747/how-to-mark-duplicates-with-markduplicates-or-markduplicateswithmatecigar
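To see how much of the duplication is optical, one option is to split the overall rate using the counts in a duplication metrics file. The sketch below is in Python rather than the R session shown above, and it assumes the metrics come from Picard MarkDuplicates (the tool discussed at the link above), with its usual tab-separated table following a `## METRICS CLASS` line. The function names and the example numbers are made up for illustration; they are not taken from our runs.

```python
import csv
import io


def parse_markduplicates_metrics(text):
    """Return one dict per library row from a MarkDuplicates-style metrics file."""
    lines = text.splitlines()
    # The metrics table starts on the line after the "## METRICS CLASS" header.
    start = next(i for i, line in enumerate(lines)
                 if line.startswith("## METRICS CLASS")) + 1
    table = []
    for line in lines[start:]:
        if not line.strip():
            break  # a blank line ends the table (a histogram section may follow)
        table.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t"))


def duplication_breakdown(row):
    """Split the overall duplication rate (in percent) into optical and non-optical parts."""
    unpaired = int(row["UNPAIRED_READS_EXAMINED"])
    pairs = int(row["READ_PAIRS_EXAMINED"])
    unpaired_dups = int(row["UNPAIRED_READ_DUPLICATES"])
    pair_dups = int(row["READ_PAIR_DUPLICATES"])
    optical_pair_dups = int(row["READ_PAIR_OPTICAL_DUPLICATES"])

    # Each pair contributes two reads to both numerator and denominator.
    total_reads = unpaired + 2 * pairs
    pct_dup = 100 * (unpaired_dups + 2 * pair_dups) / total_reads
    pct_optical = 100 * (2 * optical_pair_dups) / total_reads
    return {
        "library": row["LIBRARY"],
        "pct_dup": pct_dup,
        "pct_optical": pct_optical,
        "pct_non_optical": pct_dup - pct_optical,
    }


# Hypothetical single-library example (made-up counts, for illustration only).
header = ["LIBRARY", "UNPAIRED_READS_EXAMINED", "READ_PAIRS_EXAMINED",
          "UNPAIRED_READ_DUPLICATES", "READ_PAIR_DUPLICATES",
          "READ_PAIR_OPTICAL_DUPLICATES"]
values = ["lib1", "0", "1000", "0", "300", "120"]
example = "\n".join([
    "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
    "\t".join(header),
    "\t".join(values),
    "",
])

for row in parse_markduplicates_metrics(example):
    print(duplication_breakdown(row))
```

For the made-up row this gives 30% total duplication, of which 12 points are optical and 18 are not. Comparing the two parts per library or lane should show whether the spread is driven by optical duplicates. Note that the optical count depends on the pixel distance used when marking duplicates; the Picard documentation suggests a larger `OPTICAL_DUPLICATE_PIXEL_DISTANCE` (2500) for patterned flowcells than the default, so results from a default run may understate the optical share.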
