@seagreen
Last active June 13, 2016 13:13
HTML Parsing in Haskell

Actively Maintained

tagsoup

Based on lazy Strings.

Does its own parsing and has its own custom XML data type.
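To make the "custom data type" point concrete, here is a minimal sketch of pulling link targets out of a flat tagsoup tag list. The helper name `linkTargets` and the sample markup are made up for illustration; `parseTags`, `isTagOpenName`, and `fromAttrib` are exported by `Text.HTML.TagSoup`.

```haskell
import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- Hypothetical helper: collect the href of every opening <a> tag.
-- parseTags yields a flat list of Tag values rather than a tree,
-- so unclosed tags in the input do not cause a failure.
linkTargets :: String -> [String]
linkTargets html =
  [ fromAttrib "href" tag
  | tag <- parseTags html
  , isTagOpenName "a" tag
  ]

main :: IO ()
main =
  -- Sample input with deliberately unclosed <a> tags.
  mapM_ putStrLn (linkTargets "<p><a href='/one'>one<a href=\"/two\">two</p>")
```

Because the result is a flat list rather than a tree, any nesting structure has to be rebuilt by hand if it is needed.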

html-conduit

A wrapper around tagstream-conduit by @yihuang.

tagstream-conduit is based on plain conduit.
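For comparison, a rough sketch of the same task with html-conduit, assuming the xml-conduit cursor API is used to walk the result. The helper name `hrefs` and the sample markup are illustrative only; `parseLBS` comes from `Text.HTML.DOM`, and the cursor functions come from `Text.XML.Cursor`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad ((>=>))
import qualified Data.ByteString.Lazy as L
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Text.HTML.DOM (parseLBS)
import Text.XML.Cursor (attribute, element, fromDocument, ($//))

-- Hypothetical helper: parse the bytes into an xml-conduit Document,
-- then collect the href attribute of every descendant <a> element.
hrefs :: L.ByteString -> [T.Text]
hrefs html =
  fromDocument (parseLBS html) $// (element "a" >=> attribute "href")

main :: IO ()
main =
  -- Sample input with an unclosed <p> and <a>.
  mapM_ TIO.putStrLn
    (hrefs "<html><body><a href=\"/one\">one</a><p><a href=\"/two\">two</body></html>")
```

Here the parser produces a full document tree, so queries can be written against element structure instead of a token stream.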

Somewhat Maintained

HandsomeSoup

Not an option for real-world HTML parsing since it ignores malformed HTML (according to here).

Based on HXT.

The README mentions that it doesn't work on GHC 7.6 yet.

taggy

Works on GHC 7.6 at least, but the last update was a year ago: https://github.com/alpmestan/taggy

Not Actively Maintained

Holumbus-Searchengine

Does a lot of other stuff besides parsing.

Uses HXT, so it probably only works with well-formed XML.

shpider

Based on tagsoup-parsec, which isn't maintained either: https://hackage.haskell.org/package/tagsoup-parsec

dom-selector
