@X4
Last active August 29, 2015 14:23
k-mer algorithm
{-# LANGUAGE UnicodeSyntax #-}
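{-
Reads:  AGATCGAGTG
Prints: the 3-mers in order of occurrence: AGA GAT ATC TCG CGA GAG AGT GTG
        then each 3-mer paired with its count (every count is 1 for this input).
Sorted, the 3-mers are: AGA AGT ATC CGA GAG GAT GTG TCG
-}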
import Data.Function
import Data.List
import Data.Ord ()
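-- | All contiguous windows of length k, in order of occurrence.
-- Yields an empty list when k exceeds the length of the input.
kMers ∷ Int → [a] → [[a]]
kMers k seqs = map (take k) $ take (n-k+1) $ tails seqs where n = length seqs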
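-- | Count occurrences of each distinct element, most frequent first.
-- 'sort' runs before 'group' because 'group' only merges adjacent duplicates.
getFreq ∷ Ord a ⇒ [a] → [(a, Int)]
getFreq = sortBy (flip compare `on` snd) . map ((,) <$> head <*> length) . group . sort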
-- | The k most frequent n-mers of xs, each paired with its count.
topKMers ∷ Ord a ⇒ Int → [a] → Int → [([a], Int)]
topKMers n xs k = take k $ getFreq $ kMers n xs
testSeq ∷ String
testSeq = "AGATCGAGTG"
main ∷ IO ()
main = do
  let kMerSeq = kMers 3 testSeq
  let kMerTop = topKMers 3 testSeq 8
  putStrLn ("k-mers: " ++ unwords kMerSeq)
  print kMerTop