
@mwerner
Created December 5, 2013 23:51
Scrape Unsplash
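require 'open-uri'
require 'nokogiri'
require 'fileutils'

# Downloads every image matching +selector+ from pages 1..20 of +url+ into
# ./photos. The "/page/N" URL scheme and the ".photo_img" selector are the
# ones this gist targeted in 2013 and may no longer match the live site.
class Scraper
  attr_accessor :url, :selector, :page

  def initialize(url, selector)
    @url = url
    @selector = selector
    @page = 1
  end

  # Entry point: walk pages 1..20 and download each matching image.
  def self.start
    scraper = Scraper.new('http://unsplash.com', '.photo_img')
    (1..20).each do |page|
      scraper.page = page
      scraper.images.each { |img| scraper.download(img.attr('src')) }
    end
  end

  # All nodes on the current page that match the CSS selector.
  def images
    document(page).css(selector)
  end

  # Save the image at +url+ to ./photos/<last path segment>, skipping files
  # that already exist so the script can be re-run safely.
  def download(url)
    FileUtils.mkdir_p(File.join(Dir.pwd, 'photos'))
    path = File.join(Dir.pwd, 'photos', url.split('/').last)
    return if File.exist?(path)

    # URI.open (not Kernel#open) is required for URLs since Ruby 3.0.
    URI.open(url) do |response|
      # Write the bytes unchanged; File#puts would append a stray "\n"
      # to the binary image data.
      File.open(path, 'wb') { |file| file.write(response.read) }
    end
  end

  # Fetch and parse page +page+ of the site (pages are 1-indexed).
  def document(page)
    raise ArgumentError, "Invalid page: #{page}" if page <= 0

    Nokogiri::HTML(URI.open("#{url}/page/#{page}"))
  end
end

Scraper.start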
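Two weaknesses in `download` are worth hardening: each `src` attribute is passed to the HTTP client verbatim, so a relative path such as `/photos/abc.jpg` cannot be fetched, and the filename is taken from the last `/`-separated segment, so any query string (`abc.jpg?w=1080`) ends up in the saved name. The following is a minimal sketch, not part of the original gist; the `ScrapeHelpers` module and its method names are hypothetical, and it assumes image URLs may be relative or carry query strings. It uses only the standard library (`URI.join`, `URI.parse`, `File.binwrite`).

```ruby
require 'uri'

# Hypothetical helpers (not in the original gist) for hardening Scraper#download.
module ScrapeHelpers
  # Resolve an <img src> value against the page it came from, so relative
  # paths ("/photos/1.jpg") and protocol-relative URLs ("//cdn.host/1.jpg")
  # become absolute. Absolute src values are returned unchanged.
  def self.absolute_url(page_url, src)
    URI.join(page_url, src).to_s
  end

  # Derive a local filename from the URL *path* only, dropping the query
  # string, so "abc.jpg?w=1080" is saved as "abc.jpg".
  def self.filename_for(url)
    File.basename(URI.parse(url).path)
  end

  # Write raw bytes exactly as received (binary mode, no trailing newline).
  def self.save_binary(path, bytes)
    File.binwrite(path, bytes)
  end
end

full = ScrapeHelpers.absolute_url('http://unsplash.com/page/2', '/photos/abc.jpg?w=1080')
puts full                              # prints http://unsplash.com/photos/abc.jpg?w=1080
puts ScrapeHelpers.filename_for(full)  # prints abc.jpg
```

In `Scraper.start`, the resolved URL would be passed in as `scraper.download(ScrapeHelpers.absolute_url(scraper.url, img.attr('src')))`, and `download` would build its path with `filename_for(url)` instead of `url.split('/').last`.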