Created December 7, 2015 20:21
Using lazy ByteStrings and Data.Map to quickly count the occurrences of each distinct byte in input of arbitrary length
-- A solution to the Rosalind problem "DNA" by SavinaRoja
-- http://rosalind.info/problems/dna/
-- Uses Data.ByteString to avoid the overhead of a String or Text
-- representation, and Data.ByteString.Lazy in particular so that sequence
-- files of arbitrary length can be read in chunks rather than loaded whole.
--
-- The countMap function solves the general problem of efficiently counting
-- the occurrences of each distinct byte encountered in a bytestring.
-- Running time should be linear in the sequence length.
-- Test files may be generated by modifying the following command:
--   dd if=/dev/urandom of=sequence bs=1M count=<N>

module Main where

import Data.List
import Data.Word
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as M
import System.Environment
import System.IO

-- Count how many times each byte value occurs in the input.
countMap :: Num a => BL.ByteString -> M.Map Word8 a
countMap bs = M.fromListWith (+) [(x, 1) | x <- BL.unpack bs]

main :: IO ()
main = do
  args <- getArgs
  contents <- BL.readFile (head args)
  let counts = countMap contents
  let lookupHelper k = show (M.findWithDefault 0 k counts)
  -- This prints out the results in problem answer format: #A, #C, #G, #T
  -- In ASCII: A = 65, C = 67, G = 71, T = 84
  putStr $ intercalate " " $ map lookupHelper [65, 67, 71, 84]
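
The counting step above can also be written as a strict left fold over the lazy ByteString, which avoids building the intermediate list of pairs. The following is a minimal sketch of that variant, not part of the original gist: the names `countBytes` and `dnaCounts` are hypothetical, the counts are fixed to `Int` for simplicity, and the sample input is constructed in memory instead of being read from a file.

```haskell
module Main where

import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as M
import Data.Word (Word8)

-- Hypothetical alternative to countMap: fold strictly over the bytes,
-- incrementing the count for each byte as it is seen.
countBytes :: BL.ByteString -> M.Map Word8 Int
countBytes = BL.foldl' step M.empty
  where
    step acc w = M.insertWith (+) w 1 acc

-- Counts for A, C, G, T in that order (ASCII 65, 67, 71, 84).
-- Bytes that never occur are reported as 0.
dnaCounts :: BL.ByteString -> [Int]
dnaCounts bs = [M.findWithDefault 0 k counts | k <- [65, 67, 71, 84]]
  where
    counts = countBytes bs

main :: IO ()
main = do
  -- The bytes below spell "AGCTTAA".
  let sample = BL.pack [65, 71, 67, 84, 84, 65, 65]
  putStrLn (unwords (map show (dnaCounts sample)))  -- prints "3 1 1 2"
```

Both versions do one map update per input byte, so whether the fold is actually faster than `M.fromListWith` on a given input is something to measure and not assume.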