Scrape a selection of links on a page
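# Scrape every link on a page and save its text and href to a CSV file.
require 'nokogiri'
require 'open-uri'
require 'csv'

PAGE_URL  = "https://www.wikipedia.org/"
LINK_NODE = "a" # CSS selector for the links to collect
CSV_FILE  = "scraped-data.csv"

puts "Please wait. Scraping..."

# URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs.
page = Nokogiri::HTML(URI.open(PAGE_URL))
links = page.css(LINK_NODE)

CSV.open(CSV_FILE, "w") do |csv|
  links.each do |link|
    # Use strip, not strip!: strip! returns nil when there is nothing to trim.
    csv << [link.text.strip, link['href']]
  end
end

puts "Done! #{links.length} links scraped."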