@cgrand
Created May 7, 2010 08:36
;; in reply to http://www.sids.in/blog/2010/05/06/html-parsing-in-clojure-using-htmlcleaner/
(ns html-parser
  (:require [net.cgrand.enlive-html :as e]))

(defn parse-page
  "Given the HTML source of a web page, parses it and returns the :title
  and the tag-stripped :content of the page. Does not do any encoding
  detection; it is expected that this has already been done."
  [page-src]
  (-> page-src java.io.StringReader. e/html-resource
      (e/at [#{:script :style}] nil)
      (e/let-select [[title] [:title], [body] [:body]]
        {:title (e/text title), :content (e/text body)})))
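
;; Usage sketch, not part of the original gist. Because parse-page does no
;; encoding detection, the caller has to decode the bytes before calling it.
;; The helper name parse-file and the UTF-8 default are assumptions made
;; for illustration only.
(defn parse-file
  "Reads the file at path with the given charset (assumed to be known in
  advance, UTF-8 by default) and passes the decoded string to parse-page."
  ([path] (parse-file path "UTF-8"))
  ([path charset]
   (parse-page (slurp path :encoding charset))))

(comment
  ;; Parsing an in-memory string directly; the result is a map with the
  ;; keys :title and :content.
  (parse-page "<html><head><title>Hi</title></head><body><p>Hello</p></body></html>"))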

sids commented May 7, 2010

Thanks a ton for posting this, this is so much nicer than using HtmlCleaner. I've put off going through the Enlive tutorial for far too long now; after seeing this little snippet, I'm unwilling to put it off any more.
