@idoan
Created January 13, 2017 00:59
Computing probabilities of patterns in a string
# Approximate probability that a k-mer appears at least t times in a random
# DNA sequence, from the Bioinformatics Specialization on Coursera.
import operator as op
from functools import reduce  # reduce is no longer a builtin in Python 3

def ncr(n, r):
    # Binomial coefficient "n choose r", computed with integer arithmetic.
    r = min(r, n - r)
    if r == 0:
        return 1
    numer = reduce(op.mul, range(n, n - r, -1))
    denom = reduce(op.mul, range(1, r + 1))
    return numer // denom

# N: length of the string
# A: number of letters in the alphabet (4 for DNA: A, T, G, C)
# k: k-mer length
# t: minimum number of occurrences of the k-mer
def Pr(N, A, k, t):
    return ncr(N - t * (k - 1), t) * 1.0 / pow(A, (t - 1) * k)

print(Pr(500, 4, 9, 3))  # about 0.000259924854618, not ~0.00075 as described in the course
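
To sanity-check the printed value, the same formula can be evaluated with exact integer arithmetic, which rules out floating-point error as the cause of the mismatch with the course figure. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the name `pr_exact` is hypothetical and not part of the original gist.

```python
from fractions import Fraction
from math import comb

def pr_exact(N, A, k, t):
    """Same formula as Pr above, returned as an exact fraction (hypothetical helper)."""
    return Fraction(comb(N - t * (k - 1), t), A ** ((t - 1) * k))

p = pr_exact(500, 4, 9, 3)
print(comb(476, 3))  # numerator before reduction: 17861900
print(4 ** 18)       # denominator before reduction: 68719476736
print(float(p))      # agrees with Pr(500, 4, 9, 3)
```

The exact numerator and denominator give the same value as the floating-point version, so the gap to the course figure does not come from rounding in `Pr`.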