@hyone
Created January 16, 2012 14:39
Parallel download images from a picture site
;; (defproject parallel-download "1.0.0-SNAPSHOT"
;;   :description "FIXME: write description"
;;   :dependencies [[org.clojure/clojure "1.3.0"]
;;                  [clj-http "0.2.7"]
;;                  [enlive "1.0.0-SNAPSHOT"]]
;;   :dev-dependencies [[swank-clojure "1.3.4"]]
;;   :main parallel-download.core)
(ns parallel-download.core
  (:use [clojure.string :only (join split)]
        [clojure.java.io :only (make-output-stream)])
  (:require [clj-http.client :as client]
            [net.cgrand.enlive-html :as enlive])
  (:gen-class))
(def root "http://satlog.blog119.fc2.com/blog-entry-2943.html")

;; Last path segment of the URL, used as the local file name.
(defn filename [url]
  (last (split url #"/")))

;; src attributes of all <img> elements whose src ends in ".jpg".
(defn image-urls [content]
  (map #(get-in % [:attrs :src])
       (enlive/select (enlive/html-snippet content)
                      [[:img (enlive/attr-ends :src ".jpg")]])))

;; Downloads url and writes it to data/<filename>.
;; The "data" directory must already exist.
(defn save-image [url]
  (println (format "downloading %s ..." url))
  (let [res  (client/get url {:as :byte-array})
        data (:body res)
        name (filename url)]
    (with-open [w (make-output-stream (join "/" ["data" name]) {})]
      (.write w data))))
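;; map is lazy, so doall forces every download to actually run.
(defn sequential-downloads [urls]
  (doall (map save-image urls)))

;; Starts one future per URL, then blocks until all of them finish.
(defn parallel-downloads [urls]
  (let [threads (doall (map #(future
                               ;; (print (format "%d: " (.getId (Thread/currentThread))))
                               (save-image %))
                            urls))]
    (doall (map deref threads))))

(defn -main [& args]
  (let [response (client/get root)
        urls     (image-urls (:body response))]
    ;; (sequential-downloads urls)
    (parallel-downloads urls)
    (println "Finish.")
    ;; Exit explicitly; the future thread pool would otherwise delay JVM shutdown.
    (System/exit 0)))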
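
The gist's parallel-downloads starts one future per URL, so a page with many images opens that many connections at once. A possible variation, sketched here under assumptions, caps the number of concurrent downloads with a fixed-size thread pool from java.util.concurrent. The names bounded-pmap and fake-save are hypothetical and not part of the original gist; fake-save only stands in for save-image so the sketch runs without network access.

```clojure
(require '[clojure.string :as string])
(import '[java.util.concurrent Executors ExecutorService Future])

;; Hypothetical helper (not in the original gist): applies f to every item
;; of coll on a fixed pool of n threads (n must be positive) and returns the
;; results as a vector in input order.
(defn bounded-pmap [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int n))]
    (try
      (let [tasks   (vec (map (fn [x] (fn [] (f x))) coll))
            ;; invokeAll blocks until every task has completed.
            futures (.invokeAll pool tasks)]
        ;; .get rethrows a task's exception wrapped in an ExecutionException.
        (vec (map (fn [^Future fut] (.get fut)) futures)))
      (finally
        (.shutdown pool)))))

;; Stand-in for save-image: returns the file name instead of downloading.
(defn fake-save [url]
  (last (string/split url #"/")))

(bounded-pmap 2 fake-save ["http://example.com/a/1.jpg"
                           "http://example.com/b/2.jpg"
                           "http://example.com/c/3.jpg"])
```

In the gist itself the equivalent call would presumably be (bounded-pmap 4 save-image urls), with the pool size of 4 chosen arbitrarily. Because the pool is shut down in the finally clause, its threads do not keep the JVM alive after the downloads finish.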
@firesofmay

Thanks this was really helpful!
