@shayanjm
Created June 28, 2014 21:27
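;; Namespace name is a placeholder; the aliases are inferred from the calls below.
(ns nyt-scraper.core
  (:require [clojure.string :as s]
            [http.async.client :as http]
            [net.cgrand.enlive-html :as html])
  (:import (java.io StringReader)))

(defn get-html-resource
  "Fetches the page at the given URL and parses it into an enlive HTML resource for later consumption."
  [url]
  ;; with-open closes the HTTP client once the body has been read and parsed.
  (with-open [client (http/create-client :user-agent "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
                                         :follow-redirects true)]
    (let [response (http/GET client url)]
      (-> response
          http/await          ; block until the response is complete
          http/string         ; response body as a string
          StringReader.
          html/html-resource))))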

(defn get-nyt-article-contents
  "Gets the New York Times article contents at the given URL."
  [url]
  (-> url
      get-html-resource
      (html/select [:p.story-body-text])
      (->>
       (map :content)   ; child nodes of each matched paragraph
       (map first)      ; keeps only the first child node of each paragraph
       (s/join " "))))

(get-nyt-article-contents "http://www.nytimes.com/2014/06/29/sports/basketball/wizards-guarantee-a-contract.html")
=> "Andre Miller’s agent, Andy Miller, said he had been officially notified by the "
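
The result above stops at "notified by the ", which is probably where the paragraph's first text node ends and an inline element (such as a link) begins: `(map first)` keeps only the first child of each paragraph. A minimal sketch of one way to collect the full text is shown below. It runs on hand-written sample nodes shaped like `html/select` output, so it needs no network access. `node-text`, `paragraphs->text`, and `sample-paragraphs` are hypothetical names, not part of the original gist, and the sample text is invented.

```clojure
(require '[clojure.string :as s])

;; Hypothetical helper, not part of the original gist.
(defn node-text
  "Concatenates every text node found under an enlive-style node."
  [node]
  (cond
    (string? node) node                                        ; text node
    (map? node)    (apply str (map node-text (:content node))) ; element: recurse into children
    :else          ""))                                        ; nil or anything else

;; Invented sample data mimicking two matched <p> elements.
(def sample-paragraphs
  [{:tag :p
    :attrs {:class "story-body-text"}
    :content ["Text before the link, "
              {:tag :a :attrs {:href "#"} :content ["linked text"]}
              ", text after."]}
   {:tag :p
    :attrs {:class "story-body-text"}
    :content ["Second paragraph."]}])

(defn paragraphs->text
  "Joins the full text of each paragraph node with single spaces."
  [nodes]
  (s/join " " (map node-text nodes)))

(paragraphs->text sample-paragraphs)
;; => "Text before the link, linked text, text after. Second paragraph."
```

In `get-nyt-article-contents`, the same idea would mean replacing the `(map :content)` and `(map first)` steps with a single `(map node-text)`. Enlive's own `html/text` appears to serve the same purpose and may be preferable if it is available in your version.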