Skip to content

Instantly share code, notes, and snippets.

@jmn
Created April 27, 2018 14:14
Show Gist options
  • Save jmn/723c9b89760998f75bc188dbe728d80c to your computer and use it in GitHub Desktop.
Save jmn/723c9b89760998f75bc188dbe728d80c to your computer and use it in GitHub Desktop.
Absolute urls in haskell with hxt (2nd version)
import Text.XML.HXT.Core
import Data.Maybe
toAbsoluteUrl
:: ArrowXml a
=> String -> a XmlTree XmlTree
toAbsoluteUrl base =
processAttrl (mkAbsolute mkAbsoluteUrl `when` (hasName "href" <+> hasName "src"))
where
mkAbsolute f = replaceChildren (xshow getChildren >>> arr (f base) `orElse` this >>> mkText)
mkAbsoluteUrl :: String -> String -> String
mkAbsoluteUrl base url = fromMaybe url (expandURIString url base)
sanitize :: String -> String -> String
sanitize baseUrl = transformTree arrows
where
fixUrls = toAbsoluteUrl baseUrl `when` (hasName "a" <+> hasName "img")
arrows = processTopDown fixUrls
transformTree :: LA XmlTree XmlTree -> String -> String
transformTree arrows =
concat . runLA (hread >>> selem "html" [this] >>> arrows >>> writeDocumentToString cfg)
where
cfg = [withOutputEncoding utf8, withOutputHTML, withRemoveWS yes]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment