@jefdaj
Last active March 29, 2016 00:03
Scalpel example
import Text.HTML.Scalpel
import Control.Applicative ((<|>))

type Author = String

-- A comment is either plain text or an image, each tagged with its author.
data Comment = TextComment Author String | ImageComment Author URL
  deriving Show

-- allComments :: IO (Maybe [Comment])
-- allComments = scrapeURL "http://example.com/article.html" comments

-- Scrape one Comment from every <div class="container">.
comments :: Scraper String [Comment]
comments = chroots ("div" @: [hasClass "container"]) comment

-- Try the text form first and fall back to the image form.
comment :: Scraper String Comment
comment = textComment <|> imageComment

textComment :: Scraper String Comment
textComment = do
  author <- text $ "span" @: [hasClass "author"]
  commentText <- text $ "div" @: [hasClass "text"]
  return $ TextComment author commentText

imageComment :: Scraper String Comment
imageComment = do
  author <- text $ "span" @: [hasClass "author"]
  imageURL <- attr "src" $ "img" @: [hasClass "image"]
  return $ ImageComment author imageURL

main :: IO ()
main = do
  html <- readFile "article.html"
  -- scrapeStringLike returns a Maybe, so handle Nothing explicitly
  -- instead of using a partial pattern match.
  case scrapeStringLike html comments of
    Just cs -> mapM_ print cs
    Nothing -> putStrLn "scrape failed"
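
-- A minimal sketch, not part of the original gist: run the same scraper on an
-- inline HTML string so it can be tried (e.g. from GHCi) without article.html.
-- The markup is a hypothetical sample chosen to match the selectors above, and
-- the names sampleHtml and demo are illustrative assumptions.
sampleHtml :: String
sampleHtml = concat
  [ "<div class='container'>"
  , "<span class='author'>Sally</span>"
  , "<div class='text'>Woo hoo!</div>"
  , "</div>"
  , "<div class='container'>"
  , "<span class='author'>Bill</span>"
  , "<img class='image' src='http://example.com/cat.gif' />"
  , "</div>"
  ]

-- Prints the Maybe [Comment] produced by the comments scraper.
demo :: IO ()
demo = print (scrapeStringLike sampleHtml comments)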