Skip to content

Instantly share code, notes, and snippets.

@cmdcolin
Last active November 23, 2023 02:43
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cmdcolin/9f677eca28448d8a7c5d6e9917fc56af to your computer and use it in GitHub Desktop.
Save cmdcolin/9f677eca28448d8a7c5d6e9917fc56af to your computer and use it in GitHub Desktop.
coverage comparisons

Coverage tests for different bioinformatics tools

this document looks at the depth at a particular position in a BAM file a user sent me, at position 14712.

I discuss in the comments what settings might be related to the coverage differents

High level summary

The major variables between tools and the scores they report for coverage are related to the filters that are by default applied. Tools that report a lower score generally include duplicate and qc vendor fail filters by default. In this test at a particular position, these tools reported 74 reads.

Scores that are higher have no filters applied. In this test at a particular position, these tools reported 423.

The tool mospdeth can report scores that are different from most other ones (Scores 52 and 282) because it is counting overlapping pairs as a single read

I look at a particular position, 14712 (in 1-based coordinates) in a SARS-CoV2 BAM file (no link to it here, sorry)

Per-tool summary

Samtools

samtools depth --reference seq.fa CV-2_downsampled.bam

SARS-CoV2 14712 74 (the outputted data is 1-based coordinate so matches 14711)

Bedtools

bedtools genomecov -bg -ibam CV-2_downsampled.bam > out.bg 

SARS-CoV2 14711 14715 423

Mosdepth

# note: non --fast-mode mosdepth accounts pair overlap as a single read, explaining the uniqueness amongst results
mosdepth -F 0 CV-2_downsampled.cov.f0 CV-2_downsampled.bam

SARS-CoV2 14711 14718 282

mosdepth -F 0 --fast-mode CV-2_downsampled.cov.f0 CV-2_downsampled.bam

SARS-CoV2 14711 14718 423

mosdepth --fast-mode CV-2_downsampled.cov.f0 CV-2_downsampled.bam

SARS-CoV2 14711 14713 74

# note: non --fast-mode mosdepth accounts pair overlap as a single read, explaining the uniqueness amongst results
mosdepth CV-2_downsampled.cov.f0 CV-2_downsampled.bam 

SARS-CoV2 14711 14718 52

deepTools bamCoverage

bamCoverage --binSize 1 -b CV-2_downsampled.bam  -o  CV-2_downsampled.cov.bamcoverage.bin

SARS-CoV2 14711 14715 423

GATK

gatk DepthOfCoverage  --reference seq.fa --input CV-2_downsampled.bam --intervals regions.bed --output gatk.scores

SARS-CoV2 14712 74 (the outputted data is 1-based coordinates so matches 14711)

IGV

IGV "computes" coverage visually as a track

This is IGV with downsampling off, non default settings filter duplicate reads and filter vendor QC reads turned off

SARS-CoV2 14712 423 (1-based coordinate so it is same as 14711 in bedgraph)

Screenshot from 2020-04-28 10-31-00

IGV with downsampling off, but the filter duplicate reads and filter vendor QC on

SARS-CoV2 14712 74 (1-based coordinate so it is same as 14711 in bedgraph)

Screenshot from 2020-04-28 10-34-26

JBrowse 1

JBrowse, which by default filters QC fail, duplicate, secondary turned on

SARS-CoV2 14472 74 (1-based coordinate so it is same as 14711 in bedgraph)

localhost_jbrowse__data=covid loc=SARS-CoV2%3A14653 14922 tracks=DNA%2CCV-2_downsampled cram highlight=

JBrowse with default filters, QC fail, secondary filters turned off

SARS-CoV2 14472 423 (1-based coordinate so it is same as 14711 in bedgraph)

@cmdcolin
Copy link
Author

cmdcolin commented Apr 28, 2020

JBrowse, which by default filters QC fail, duplicate, secondary turned on

SARS-CoV2 14472 74 (1-based coordinate so it is same as 14711 in bedgraph)

localhost_jbrowse__data=covid loc=SARS-CoV2%3A14653 14922 tracks=DNA%2CCV-2_downsampled cram highlight=

JBrowse with default filters, QC fail, secondary filters turned off

SARS-CoV2 14472 423 (1-based coordinate so it is same as 14711 in bedgraph)

@cmdcolin
Copy link
Author

cmdcolin commented Apr 28, 2020

IGV with downsampling off, non default settings filter duplicate reads and filter vendor QC reads turned off

SARS-CoV2 14712 423 (1-based coordinate so it is same as 14711 in bedgraph)

Screenshot from 2020-04-28 10-31-00

IGV with downsampling off, but the filter duplicate reads and filter vendor QC on

SARS-CoV2 14712 74 (1-based coordinate so it is same as 14711 in bedgraph)

Screenshot from 2020-04-28 10-34-26

@cmdcolin
Copy link
Author

cmdcolin commented Apr 28, 2020

After all this, the major variables are the filters that are by default applied. Things that are score 74 include duplicate and qc vendor fail filters. Scores that are 423 have no filter. Scores 52 and 282 are based on mosdepth counting overlapping pairs as a single read

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment