@defndaines
Created February 27, 2018 20:53
Example using Jsoup in Clojure to parse a webpage
{:deps
 {org.jsoup/jsoup {:mvn/version "1.11.2"}}}
(import '[org.jsoup Jsoup])
(def url "https://en.wikipedia.org/wiki/World_Fantasy_Award%E2%80%94Novel")
(def html (.get (Jsoup/connect url)))
(def winners (.select html "tr[style=background:#B0C4DE;]"))
;; Jsoup Selectors: https://jsoup.org/cookbook/extracting-data/selector-syntax
(defn pull-details
  "Extracts year, author, and title text from one winner row."
  [elem]
  (letfn [(text-from [selector] (.text (.select elem selector)))]
    {:year (text-from "th a") ; empty on the extra rows of a tied year
     :author (text-from "span.fn a")
     :title (text-from "td i")}))
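;; conj on a list prepends, so (first acc) is the previously processed row
;; and the result ends up in reverse page order (newest first).
(def awards
  (reduce
    (fn [acc e]
      (let [book (pull-details e)]
        (if (empty? (:year book))
          ;; A row with no year shares a tie; reuse the previous row's year.
          (let [year (:year (first acc))]
            (conj acc (assoc book :year year)))
          (conj acc book))))
    '()
    winners))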
;; ({:year "2017" :author "Claire North" :title "The Sudden Appearance of Hope"}
;; {:year "2016" :author "Anna Smaill" :title "The Chimes"}
;; ,,,)
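
;; A minimal offline sketch of the tie handling above, so it can be tried
;; without fetching the page. Assumptions: a tied year is marked up with a
;; rowspan, so the second winner row has no <th>. The sample markup, the
;; year, the authors, the titles, and the names sample-html, row->book,
;; fill-years, and sample-awards are all hypothetical, not taken from the
;; live page. Unlike awards above, this variant accumulates into a vector,
;; so the result stays in page order.
(import '[org.jsoup Jsoup])

(def sample-html
  (str "<table>"
       "<tr style=\"background:#B0C4DE;\">"
       "<th rowspan=\"2\"><a>2000</a></th>"
       "<td><span class=\"fn\"><a>Author A</a></span></td>"
       "<td><i>Title A</i></td></tr>"
       "<tr style=\"background:#B0C4DE;\">"
       "<td><span class=\"fn\"><a>Author B</a></span></td>"
       "<td><i>Title B</i></td></tr>"
       "</table>"))

(defn row->book
  "Same extraction as pull-details, repeated so this sketch stands alone."
  [row]
  (letfn [(text-from [selector] (.text (.select row selector)))]
    {:year (text-from "th a")
     :author (text-from "span.fn a")
     :title (text-from "td i")}))

(defn fill-years
  "Builds a vector of books; a row with no year inherits the previous row's."
  [rows]
  (reduce
    (fn [acc row]
      (let [book (row->book row)]
        (conj acc (if (empty? (:year book))
                    (assoc book :year (:year (peek acc)))
                    book))))
    []
    rows))

;; Jsoup/parse builds a Document from a string instead of a network request.
(def sample-awards
  (fill-years (.select (Jsoup/parse sample-html)
                       "tr[style=background:#B0C4DE;]")))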