;; in reply to
(ns html-parser
  (:require [net.cgrand.enlive-html :as e]))

(defn parse-page
  "Given the HTML source of a web page, parses it and returns the :title
  and the tag-stripped :content of the page. Does not do any encoding
  detection; it is expected that this has already been done."
  [page-src]
  ;; Assumes page-src is a String of markup. html-resource treats a bare
  ;; String as a classpath resource path, so wrap it in a Reader first.
  (-> page-src java.io.StringReader. e/html-resource
      (e/at [#{:script :style}] nil)
      (e/let-select [[title] [:title], [body] [:body]]
        {:title (e/text title), :content (e/text body)})))
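
To see what each stage of that pipeline does, here is a rough step-by-step sketch that can be pasted into a REPL. The namespace name and `sample-html` are made up for illustration and are not part of the original gist. It uses plain `e/select` instead of `e/let-select`, and it assumes the input is a String, which is wrapped in a `StringReader` so that Enlive reads it as markup.

```clojure
(ns html-parser-example
  (:require [net.cgrand.enlive-html :as e]))

;; Hypothetical sample input, not from the original gist.
(def sample-html
  (str "<html><head><title>Hello</title>"
       "<style>p {color: red}</style></head>"
       "<body><p>Some <b>text</b></p>"
       "<script>alert(1)</script></body></html>"))

;; Step 1: parse the markup into a seq of Enlive nodes.
(def nodes (e/html-resource (java.io.StringReader. sample-html)))

;; Step 2: remove every <script> and <style> element
;; (a nil transformation deletes the matched nodes).
(def cleaned (e/at nodes [#{:script :style}] nil))

;; Step 3: select the first <title> and <body> nodes and
;; flatten each one to its text content.
(def result
  {:title   (e/text (first (e/select cleaned [:title])))
   :content (e/text (first (e/select cleaned [:body])))})
```

Evaluating `result` at the REPL shows the map that `parse-page` is meant to return for the same input. The set `#{:script :style}` inside the selector acts as a union, so one rule strips both kinds of element.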

sids commented May 7, 2010

Thanks a ton for posting this; it is so much nicer than using HtmlCleaner. I've put off going through the Enlive tutorial for far too long now; after seeing this little snippet, I'm unwilling to put it off any more.

