@23Skidoo
Created September 3, 2013 22:15
Incremental text processing in Haskell with lazy I/O.
-- See http://stackoverflow.com/questions/18601033/haskell-avoiding-stack-overflow-in-folds-without-sacrificing-performance/18602250.
{-# LANGUAGE OverloadedStrings, BangPatterns #-}
module Main where

import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

genTweets :: L.ByteString -> L.ByteString
genTweets text | L.null text = ""
               | otherwise   = L.intercalate "\n\n" $ toTweets $ L.words text
  where
    -- Concatenate words into tweets of at most 140 characters.
    toTweets :: [L.ByteString] -> [L.ByteString]
    toTweets []     = []
    toTweets [w]    = [w]
    toTweets (w:ws) = go (L.length w, w) ws

    -- Main loop. Notice how the output tweet (cur_str) is generated as soon as
    -- possible, thus enabling L.writeFile to consume it before the whole
    -- input is processed.
    go :: (Int64, L.ByteString) -> [L.ByteString] -> [L.ByteString]
    go (_cur_len, !cur_str) []     = [cur_str]
    go (!cur_len, !cur_str) (w:ws)
      -- The joining space adds one character, so the limit checked here is 139.
      | lw + cur_len <= 139        = go (cur_len + lw + 1,
                                         cur_str `L.append` " " `L.append` w) ws
      | otherwise                  = cur_str : go (lw, w) ws
      where
        lw = L.length w

-- Notice the use of lazy I/O.
main :: IO ()
main = do dict <- L.readFile "/usr/share/dict/words"
          L.writeFile "tweets" (genTweets dict)
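
The greedy packing loop above can be tried in isolation. The following is a minimal sketch, not part of the gist: it assumes plain `String` instead of lazy `ByteString`, takes the length limit as a parameter, and uses a hypothetical name, `packWords`. It is meant only to show that each packed line is emitted as soon as the next word no longer fits, which is why the output can be consumed before the whole input has been read.

```haskell
module PackWords (packWords) where

-- Hypothetical helper for illustration; the gist itself works on lazy
-- ByteStrings and hard-codes the limit.
-- Greedily pack words into lines of at most `limit` characters,
-- joining words with single spaces.
packWords :: Int -> [String] -> [String]
packWords _     []     = []
packWords limit (w:ws) = go (length w) [w] ws
  where
    -- `len` is the length of the line built so far; `acc` holds its
    -- words in reverse order.
    go _   acc []     = [unwords (reverse acc)]
    go len acc (x:xs)
      | len' <= limit = go len' (x : acc) xs
        -- The finished line is returned before the rest of the input
        -- is examined, so the result list is produced incrementally.
      | otherwise     = unwords (reverse acc) : go (length x) [x] xs
      where
        len' = len + 1 + length x  -- +1 for the joining space
```

In GHCi, `packWords 10 (words "aaa bbb cccc dd")` evaluates to `["aaa bbb","cccc dd"]`. Because lines are emitted incrementally, `take 1 (packWords 10 (cycle ["aaa","bbb"]))` terminates with `["aaa bbb"]` even though the input list is infinite.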