@duanebester
Last active November 16, 2021 23:32
Web Scraping Driver example
(require '[etaoin.api :as d])
;; Start a Firefox driver with a visible browser window
(def driver (d/firefox {:headless false}))
;; Navigate to the product URL (the :loc key).
;; Note: cache->product is not defined in this snippet.
(d/go driver (:loc (cache->product "bellroy")))
;; Query <script type="application/ld+json"> tags
(def ld-json-query {:tag :script :type "application/ld+json"})
(def ld-json-ids (d/query-all driver ld-json-query))
;; => ["c84039d5-8813-4c40-8d05-6a0e49938883" "15b1c505-a420-ed4f-96a9-e8758067535f"]
(require '[clojure.data.json :refer [read-str]])
;; The page has only two ld+json script tags; read the content of each
(read-str (d/get-element-inner-html-el driver (first ld-json-ids)))
;; => {"@context" "https://schema.org", "@type" "BreadcrumbList" ...}
(read-str (d/get-element-inner-html-el driver (last ld-json-ids)))
;; => {"@context" "https://schema.org/", "@type" "product" ...}
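(require '[clojure.string :as str])
;; Strip the leading '@' from the keys only, then build a keyword map.
;; (Escaping every '@' in the raw JSON would also alter values that contain '@'.)
(def product-json
  (read-str
   (d/get-element-inner-html-el driver (last ld-json-ids))
   :key-fn #(keyword (str/replace % #"^@" ""))))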
(println product-json)
;; => {:context "https://schema.org/", :type "product", :brand "Bellroy" ...}
(d/quit driver)
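
The snippet above relies on the product data being the last ld+json script tag on the page. A possible alternative, sketched here as an assumption rather than as the gist author's method, is to pick the document by its "@type" value instead of by position. The helpers `strip-at-key`, `find-by-type`, and `keywordize-ld` are hypothetical names introduced for illustration, and `sample-docs` is hand-written data that mirrors the outputs shown above, so the block runs without a browser. It also assumes "@type" is a single string, not a list of types.

```clojure
(require '[clojure.string :as str])

(defn strip-at-key
  "Turns a JSON-LD key such as \"@type\" into the keyword :type.
  Only a leading '@' is removed. (Hypothetical helper.)"
  [k]
  (keyword (str/replace k #"^@" "")))

(defn find-by-type
  "Returns the first parsed JSON-LD map whose \"@type\" equals type-name,
  ignoring case, or nil when none matches. Assumes \"@type\" is a string."
  [type-name docs]
  (first (filter #(= (str/lower-case type-name)
                     (str/lower-case (str (get % "@type"))))
                 docs)))

(defn keywordize-ld
  "Converts the top-level string keys of a JSON-LD map to keywords,
  dropping a leading '@'. Nested maps are left untouched."
  [m]
  (into {} (map (fn [[k v]] [(strip-at-key k) v])) m))

;; Hand-written sample data shaped like the two documents shown above.
;; With a live driver, the parsed documents could instead come from:
;; (map #(read-str (d/get-element-inner-html-el driver %)) ld-json-ids)
(def sample-docs
  [{"@context" "https://schema.org" "@type" "BreadcrumbList"}
   {"@context" "https://schema.org/" "@type" "product" "brand" "Bellroy"}])

(keywordize-ld (find-by-type "Product" sample-docs))
;; => {:context "https://schema.org/", :type "product", :brand "Bellroy"}
```

Matching on the type keeps the lookup working if the page adds or reorders its ld+json tags, at the cost of returning nil when no document of that type is present.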