@wdecoster
Last active April 14, 2017 12:38
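import pysam
import re


def extractMDFromBam(bam):
    '''
    Loop over a BAM file and approximate each read's edit distance to the
    reference genome from the MD tag, scaled by aligned read length.
    The numbers in the MD tag are runs of matching bases, so subtracting their sum
    from the aligned length counts mismatches plus insertions (not deletions).
    '''
    # samfile.fetch() without arguments needs an indexed BAM file
    with pysam.AlignmentFile(bam, "rb") as samfile:
        return [
            (read.query_alignment_length - sum(int(n) for n in re.findall(r'\d+', read.get_tag("MD"))))
            / float(read.query_alignment_length)
            for read in samfile.fetch()
            if read.has_tag("MD") and read.query_alignment_length  # skip reads without an MD tag
        ]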
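
As a rough illustration of the arithmetic, the sketch below applies the same calculation to a hand-written MD string, without pysam. The helper name `mismatch_fraction`, the example MD tag, and the aligned length of 22 are assumptions made up for illustration; they are not part of the gist.

```python
import re


def mismatch_fraction(md_tag, aligned_length):
    """Hypothetical helper: fraction of aligned query bases not covered by MD match runs."""
    # Every run of digits in the MD tag is a stretch of bases matching the reference
    matches = sum(int(n) for n in re.findall(r"\d+", md_tag))
    return (aligned_length - matches) / float(aligned_length)


# Assumed example: MD "10A5^AC6" reads as 10 matches, one mismatch (reference A),
# 5 matches, a 2-base deletion (^AC), then 6 matches, so 21 matching bases.
# With no insertions, the aligned query length is 21 matches + 1 mismatch = 22.
example = mismatch_fraction("10A5^AC6", 22)
print(example)
```

The deleted bases after `^` do not change the result, because they are absent from the read and so are not part of the aligned query length.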