@IanVaughan
Created May 13, 2023 22:18
Scrape discogs.com price data
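require 'csv'
require 'nokogiri'
require 'open-uri'

# One Discogs URL per line; blank lines are skipped.
urls = File.readlines('urls.txt', chomp: true).map(&:strip).reject(&:empty?)

CSV.open('scraped_data.csv', 'wb') do |csv|
  urls.each do |url|
    puts(url)
    html = URI.open(url, &:read)
    doc = Nokogiri::HTML(html)

    # Both lookups depend on Discogs' page markup: find the label element,
    # then take the last child of the last node under the label's parent.
    median = doc.xpath("//*[contains(text(), 'Median')]").first
                .parent.children.last.children.last.text.strip
    # Uses the second element whose text contains 'For Sale'.
    for_sale = doc.xpath("//*[contains(text(), 'For Sale')]")[1]
                  .parent.children.last.children.last.text.strip

    csv << [url, median, for_sale]
  rescue StandardError => e
    # A failed request or a missing element (a method called on nil) lands
    # here; the row keeps only the URL so the failure stays visible.
    warn("#{url}: #{e.class}: #{e.message}")
    csv << [url]
  end
end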
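The `rescue` in the script records any failure as a row containing only the URL, so a single timeout or dropped connection loses that release's data. Below is a minimal sketch of a retry wrapper, assuming transient network errors are worth retrying. `with_retries`, `RETRYABLE_ERRORS`, and the backoff values are hypothetical and are not part of the original gist. `OpenURI::HTTPError` is raised for every non-success HTTP status, including 404, so this sketch also retries pages that will never succeed. A fuller version might filter those out by status code.

```ruby
require 'net/http'
require 'open-uri'
require 'socket'
require 'timeout'

# Errors treated as transient (a hypothetical choice). OpenURI::HTTPError is
# raised for any non-success HTTP status, including 404.
RETRYABLE_ERRORS = [
  OpenURI::HTTPError, SocketError, Timeout::Error, Errno::ECONNRESET
].freeze

# Hypothetical helper: runs the block, making up to `attempts` tries in total
# when it raises one of `errors`. It waits `delay` seconds before the first
# retry and doubles the wait after each further failure. Once all attempts
# are used, it re-raises the last error. Other errors propagate immediately.
def with_retries(attempts: 3, delay: 2.0, errors: RETRYABLE_ERRORS)
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors
    raise if tries >= attempts
    sleep(delay * (2**(tries - 1)))
    retry
  end
end
```

In the script, the fetch would become `html = with_retries { URI.open(url).read }`. Any error that survives the retries still falls through to the existing `rescue`, which writes the URL-only row.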