@michaelt
Created June 15, 2014 15:43
module Main where

import Prelude hiding (lines)
import Lens.Family
import Pipes
import Pipes.Group
import Pipes.HTTP
import Pipes.Text
import Pipes.Text.Encoding
import Pipes.Text.IO (toHandle,stdout)
import qualified System.IO as IO
import Data.Functor (void)

main = do
  req <- parseUrl "http://www.example.com"
  -- "http://www.gutenberg.org/files/10/10-h/10-h.htm"
  withManager tlsManagerSettings $ \m ->
    withHTTP req m $ \resp -> void $ runEffect $
      numberLines (responseBody resp ^. utf8 . lines) >-> toHandle IO.stdout

numberLines :: Monad m => FreeT (Producer Text m) m bad -> Producer Text m bad
numberLines = number_loop (1 :: Int) where
  number_loop n freeProducers = do
    freeProducer <- lift $ runFreeT freeProducers
    case freeProducer of
      Pure badbytes -> do pack' "\n"
                          return badbytes -- these could be inspect with e.g.
      Free p        -> do pack' ("\n" ++ show n ++ " ")
                          nextFreeProducers <- p
                          number_loop (n+1) nextFreeProducers
  pack' str = yield str >-> pack
  -- Pipes.Text.pack should probably be String -> Producer Text m ()

@michaelt
Author

The one defect in the program was the use of free2list / concats, followed by numbering via a pipe to printAllResults. What you end up numbering that way are the raw text chunks, not the lines. Thus when I run your program I see

 46 "    domain in examples without prior coordination or "
 47 "asking for permission.</p>"

but with the version above I get

  46      domain in examples without prior coordination or asking for permission.</p>
  47      <p><a href="http://www.iana.org/domains/example">More information...</a>

(The difference is more obvious in the number count if you ask for the KJV as in the commented url.)

This is because the surrounding HTTP machinery delivers the bytes with a break between "or " and "asking". Each chunk gets its own number, so a new number begins at "asking".


Edit: the new version you have put up avoids this on the short www.example.com page, at least with my setup; but with the Project Gutenberg text the Apocalypse reads like so:

  99170 "<p>4:3 And he that sat was to look upon like a jasper and a sardine\r"
  99171 "stone: and there was a"
  99172 " rainbow round about the throne, in sight like\r"
  99173 "unto an emerald.</p>\r"

because there is a byte break between "a" and " rainbow".


An immensely long line would be broken into several ByteStrings, which decodeUtf8 b or b ^. utf8 would translate into several Texts. The program would number lines perfectly only if responseBody resp happened to deliver one ByteString for the whole file.
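To make the chunk-versus-line distinction concrete, here is a minimal sketch that uses only base and plain Strings. The `chunks` list is a hypothetical stand-in for what the HTTP layer might deliver after decoding; `numberChunks` and `numberTrueLines` are illustrative names, not part of either program.

```haskell
module Main where

-- Hypothetical chunks: one logical text, split at an arbitrary point
-- (between "or " and "asking") by the transport layer.
chunks :: [String]
chunks =
  [ "domain in examples without prior coordination or "
  , "asking for permission.</p>\n"
  , "<p>More information...</p>\n"
  ]

-- Numbering whatever arrives: one number per chunk.
numberChunks :: [String] -> [(Int, String)]
numberChunks = zip [1 ..]

-- Numbering real lines: glue the chunks together, then split on newlines.
-- (This holds the whole text in memory, which a streaming solution avoids.)
numberTrueLines :: [String] -> [(Int, String)]
numberTrueLines = zip [1 ..] . lines . concat

main :: IO ()
main = do
  mapM_ print (numberChunks chunks)
  mapM_ print (numberTrueLines chunks)
```

With these three chunks, `numberChunks` hands out three numbers while `numberTrueLines` hands out two, because the first two chunks belong to the same line.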

Since you are pattern matching on the FreeT / FreeF constructors, I do the line-numbering directly this way. When I scrutinize the FreeT and come upon a Free constructor, i.e. a new line of text (which may be produced in several chunks), I prefix the number and loop with the next number.
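The shape of that loop can be modelled without any pipes machinery. In the sketch below, a list of lines, each itself a list of chunks, stands in for the FreeT of producers; this is an assumption made purely for illustration, and `LineChunks` and `numberLinesModel` are hypothetical names. The non-empty case plays the role of Free and the empty case the role of Pure.

```haskell
module Main where

-- One line of text, possibly delivered as several chunks.
type LineChunks = [String]

-- Emit "\n<n> " before the chunks of each line, and a final "\n" at the end.
-- The chunks of a line pass through untouched, so a break in the middle of
-- a line never triggers a new number.
numberLinesModel :: [LineChunks] -> [String]
numberLinesModel = go (1 :: Int)
  where
    go _ []            = ["\n"]                        -- like the Pure case
    go n (line : rest) = ("\n" ++ show n ++ " ") : line ++ go (n + 1) rest

main :: IO ()
main = putStr . concat . numberLinesModel $
  [ ["stone: and there was a", " rainbow round about the throne, in sight like"]
  , ["unto an emerald.</p>"]
  ]
```

The first line here arrives in two chunks, yet it receives a single number; only the boundary between the two inner lists advances the counter.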

I then feed everything to the Pipes.Text.IO operations, for no particular reason; but note that I use toHandle, which (mirroring pipes-bytestring) lets me keep any return value -- in this case a possible producer of bad bytes. If I had used stdout, my Text producer would need to return (), which as you saw is part of what tripped up the OP. I would have to discard any possible bad bytes first, so void would move inside the scope of runEffect:

 runEffect $ void (numberLines (responseBody resp ^. utf8 . lines)) >-> stdout
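The same return-type distinction can be seen with the String-based consumers in Pipes.Prelude, used here as a stand-in for Pipes.Text.IO so that the sketch needs only the pipes package. `linesWithLeftover` and its Int result are hypothetical, playing the role of the producer whose return value is the leftover bad bytes.

```haskell
module Main where

import Pipes
import qualified Pipes.Prelude as P
import qualified System.IO as IO
import Data.Functor (void)

-- Hypothetical producer: yields two lines, then returns a leftover value.
linesWithLeftover :: Monad m => Producer String m Int
linesWithLeftover = do
  yield "1 first line"
  yield "2 second line"
  return 42

main :: IO ()
main = do
  -- P.toHandle is polymorphic in its return type, so the producer's
  -- result comes back out of runEffect.
  leftover <- runEffect $ linesWithLeftover >-> P.toHandle IO.stdout
  print leftover
  -- P.stdoutLn returns (), so the producer's result has to be
  -- discarded with void before the two can be connected.
  runEffect $ void linesWithLeftover >-> P.stdoutLn
```

Without the `void` in the last line the types would not unify, since both sides of `>->` must share one return type.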
