@arempe93
Last active August 29, 2015 14:16
A simple scraper that looks up book information by ISBN... with images!
require 'open-uri'
require 'nokogiri'

class BookScraper
  def self.find_book(isbn)
    # Query ISBNsearch and get resulting HTML page with book details
    # (URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs)
    book_page = Nokogiri::HTML(URI.open("http://isbnsearch.org/isbn/#{isbn}"))
    book_info = {}

    # Parse HTML with Nokogiri and store attributes in hash
    book_info[:title] = book_page.css('div.bookinfo h2').first.content
    book_info[:image] = book_page.css('div.thumbnail img').first['src']

    book_page.css('div.bookinfo p').each do |attrib|
      # Split "Label: value" on the first ': ' only, so values containing ': ' stay intact
      label, value = attrib.content.split(': ', 2)
      next if value.nil? # Skip paragraphs that are not "Label: value" pairs

      key = label.downcase
      key = key.gsub(/[\s-]/, '_') # Replace spaces and dashes with underscores
      key = key.gsub(/list_/, '')  # Remove list_ for price attribute

      book_info[key.to_sym] = value.gsub(/\$/, '')
    end

    # Return attribute hash
    book_info
  end
end
>> BookScraper.find_book 9780136039884
=> {:title=>"Study Guide for The Economic Way of Thinking", :image=>"http://ecx.images-amazon.com/images/I/51yoKYI9r3L._SL194_.jpg", :isbn_13=>"9780136039884", :isbn_10=>"013603988X", :authors=>"Paul Heyne; Peter J. Boettke; David L. Prychitko", :edition=>"12", :binding=>"Paperback", :publisher=>"Prentice Hall", :published=>"May 2009", :price=>"46.67"}
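
The label handling inside the loop can be tried without a network connection or Nokogiri. The following is a minimal sketch, not part of the original class: `normalize_key` and `parse_attribute` are hypothetical helper names, and the sample lines are assumed to resemble the "Label: value" paragraphs on the ISBNsearch page.

```ruby
# Hypothetical helper: turn a label such as "ISBN-13" or "List Price"
# into the symbol key used in the result hash.
def normalize_key(label)
  label.downcase
       .gsub(/[\s-]/, '_') # spaces and dashes become underscores
       .gsub(/list_/, '')  # "list_price" becomes "price"
       .to_sym
end

# Hypothetical helper: split one "Label: value" line into [key, value],
# or return nil when the line has no ": " separator.
def parse_attribute(line)
  label, value = line.split(': ', 2)
  return nil if value.nil?

  [normalize_key(label), value.gsub(/\$/, '')]
end

# Assumed sample input, shaped like the text of the page's <p> elements
lines = ['ISBN-13: 9780136039884', 'List Price: $46.67', 'Binding: Paperback']
book_info = lines.map { |line| parse_attribute(line) }.compact.to_h
p book_info
```

Here the three sample lines end up under the keys `:isbn_13`, `:price`, and `:binding`, with the dollar sign stripped from the price. Keeping this step in small pure functions makes it easy to check against saved label text if the site's markup changes.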