@takaki
Created October 15, 2012 09:48
HXT, XPath
import Codec.Binary.UTF8.String
import Codec.Text.IConv
import Data.List
import Text.XML.HXT.Core
import qualified Data.ByteString.Lazy as BSL
import Text.XML.HXT.XPath.Arrows
-- Japanese
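main :: IO ()
main = do
  -- Read the raw bytes; the file is encoded as CP932 (the Windows Shift_JIS variant).
  cs <- BSL.readFile "4731398C.html"
  -- Convert the bytes to UTF-8, then decode them into a Haskell String.
  let u8s = convert "CP932" "UTF-8" cs
  let html = decode (BSL.unpack u8s)
  -- Parse leniently as HTML and suppress parser warnings.
  let doc = readString [withParseHTML yes, withWarnings no] html
  -- Select the target <p> elements by XPath, then collect the text nodes beneath them.
  nodes <- runX $ doc
    >>>
    getXPathTrees "//*[@id=\"main\"]/form/div[4]/table/tr/td[1]/table[1]/tr[1]/td/div/p"
    //> getText
  -- Print each extracted text node on its own line.
  mapM_ putStrLn nodes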