@wdecoster
Created April 14, 2017 12:39
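import re

import pysam


def extractMDFromBam(bam):
    '''
    loop over a bam file and estimate each read's edit distance to the reference genome
    mismatches and deleted reference bases are stored in the MD tag
    (insertions are not, so this undercounts the full edit distance, e.g. the NM tag)
    scale by aligned read length (query_alignment_length, which excludes soft clips)
    '''
    with pysam.AlignmentFile(bam, "rb") as samfile:
        # fetch() without a region yields all mapped reads; it needs a BAM index (.bai)
        return [sum(len(item) for item in re.split('[0-9^]', read.get_tag("MD")))
                / read.query_alignment_length
                for read in samfile.fetch()
                if read.has_tag("MD")]  # skip alignments written without an MD tag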
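The per-read arithmetic can be checked without a BAM file. Below is a minimal sketch that assumes you already have a read's MD string, its aligned length and, optionally, a list of `(operation, length)` pairs shaped like pysam's `AlignedSegment.cigartuples`. The names `md_differences`, `scaled_edit_distance` and `CIGAR_INS` are hypothetical helpers, not part of pysam. The sketch separates mismatches from deleted bases. Because the MD tag never records insertions, it can also add inserted bases taken from the CIGAR (operation code 1).

```python
import re

# CIGAR operation code for an insertion to the reference, as used in pysam's cigartuples
CIGAR_INS = 1


def md_differences(md):
    r"""Return (mismatches, deleted_bases) encoded in an MD tag string.

    MD grammar (SAM spec): [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
    A bare letter is a mismatched reference base; '^' followed by letters
    marks reference bases that are deleted from the read.
    """
    mismatches = 0
    deleted = 0
    # digits are match-run lengths, so only the letter tokens carry differences
    for token in re.findall(r"\^[A-Z]+|[A-Z]", md):
        if token.startswith("^"):
            deleted += len(token) - 1  # drop the '^' marker itself
        else:
            mismatches += 1
    return mismatches, deleted


def scaled_edit_distance(md, aligned_length, cigartuples=None):
    """Hypothetical helper: differences per aligned query base for one read.

    Without cigartuples this equals the gist's value (mismatches + deleted bases).
    With (operation, length) pairs, inserted bases are counted as well.
    """
    mismatches, deleted = md_differences(md)
    inserted = 0
    if cigartuples is not None:
        inserted = sum(length for op, length in cigartuples if op == CIGAR_INS)
    return (mismatches + deleted + inserted) / aligned_length
```

For example, the MD string `10A5^AC6` (one mismatch, two deleted reference bases) on a read aligned as `16M2D6M` covers 22 aligned query bases. In that case `scaled_edit_distance("10A5^AC6", 22)` gives 3/22, the same value the gist's `re.split` expression produces for that read.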