@SavinaRoja
Created December 7, 2015 20:21
Using Lazy ByteStrings and Data.Map to quickly count unique byte occurrences in input of arbitrary length
-- A solution to Rosalind Problem: DNA by SavinaRoja
-- http://rosalind.info/problems/dna/
-- Uses Data.ByteString to avoid the unnecessary overhead of a String or Text
-- representation, and Data.ByteString.Lazy in particular for safety in
-- handling sequence files of arbitrary length.
--
-- The countMap function expresses the solution to the general problem
-- of efficiently counting all unique bytes encountered in a bytestring.
-- This program should run in time linear in the sequence length.
-- Test files may be generated by modifying the following command:
-- dd if=/dev/urandom of=sequence bs=1M count=<N>
module Main where
import Data.List
import Data.Word
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as M
import System.Environment
import System.IO
countMap :: Num a => BL.ByteString -> M.Map Word8 a
countMap bs = M.fromListWith (+) [(x, 1) | x <- BL.unpack bs]
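
-- A sketch of an alternative formulation, assuming the same qualified imports
-- (BL, M) as above. countMapFold is a hypothetical name and is not part of
-- the original solution. It folds over the bytes directly with BL.foldl'
-- rather than building an intermediate list of (byte, 1) pairs, and should
-- produce the same map as countMap.
countMapFold :: Num a => BL.ByteString -> M.Map Word8 a
countMapFold = BL.foldl' step M.empty
  where
    -- Add one to the count for byte x, inserting 1 if x has not been seen.
    step acc x = M.insertWith (+) x 1 acc
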
main :: IO ()
main = do
  args <- getArgs
  contents <- BL.readFile (head args)
  let counts = countMap contents
  let lookupHelper k = show (M.findWithDefault 0 k counts)
  -- This prints out the results in problem answer format: #A, #C, #G, #T
  -- In ASCII: A = 65, C = 67, G = 71, T = 84
  putStr $ intercalate " " $ map lookupHelper [65, 67, 71, 84]