@schmee
Last active August 29, 2015 14:04
(ns scrape
  (:require [net.cgrand.enlive-html :as html]
            [clojure.pprint :refer [pprint]]))

(def ^:dynamic *base-url* "http://www.last.fm")

(defn fetch-url [url]
  (html/html-resource (java.net.URL. url)))

(defn get-links-to-similar [artist-page]
  (for [artist (html/select artist-page [:a.similar-artist])]
    (str *base-url* (-> artist :attrs :href))))
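
;; Breadth-first crawl: `queue` holds [artist-url depth] pairs and `visited`
;; the URLs already printed. Returns the set of visited URLs.
(defn bfs-inner [queue visited max-depth]
  (if-let [[artist-url depth] (peek queue)]
    (if (visited artist-url)
      (recur (pop queue) visited max-depth)
      (do
        (println artist-url)
        (recur (into (pop queue)
                     (when (< depth max-depth)
                       (map #(vector % (inc depth))
                            (get-links-to-similar (fetch-url artist-url)))))
               (conj visited artist-url)
               max-depth)))
    visited))

(defn bfs [artist-url max-depth]
  (bfs-inner (conj clojure.lang.PersistentQueue/EMPTY [artist-url 0]) #{} max-depth))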