;; in reply to http://www.sids.in/blog/2010/05/06/html-parsing-in-clojure-using-htmlcleaner/
(ns html-parser
  (:require [net.cgrand.enlive-html :as e]))

(defn parse-page
  "Given the HTML source of a web page, parses it and returns a map with
  the page's :title and its tag-stripped :content. Does not do any
  encoding detection; the caller is expected to have done that already."
  [page-src]
  (-> page-src
      java.io.StringReader.
      e/html-resource
      ;; Drop <script> and <style> elements so their contents stay out of the text.
      (e/at [#{:script :style}] nil)
      (e/let-select [[title] [:title], [body] [:body]]
        {:title (e/text title), :content (e/text body)})))
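
A minimal usage sketch for `parse-page`. It assumes the snippet above is saved as `html_parser.clj` on the classpath and that Enlive is declared as a project dependency. The sample page and the names `sample-html` and `result` are made up for illustration and are not part of the original gist.

```clojure
;; Assumes the gist above is saved as html_parser.clj on the classpath
;; and that Enlive is available as a dependency.
(require '[html-parser :refer [parse-page]])

;; Hypothetical sample page, invented for this example.
(def sample-html
  (str "<html><head><title>Hello</title>"
       "<style>p { color: red; }</style></head>"
       "<body><p>Hi <b>there</b></p>"
       "<script>alert('x');</script></body></html>"))

;; parse-page returns a map with :title and :content keys.
(def result (parse-page sample-html))

(println (:title result))
(println (:content result))
```

Because the `<script>` and `<style>` elements are removed before the text is extracted, their contents should not show up in `:content`.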
sids commented May 7, 2010

Thanks a ton for posting this; it is so much nicer than using HtmlCleaner. I've put off going through the Enlive tutorial for far too long now, and after seeing this little snippet I'm unwilling to put it off any longer.
