Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
A Ruby web scraping script that visits a GitHub trending repos page, scrapes the data for the 25 repos, loads them into a CSV, and then reads from the CSV & creates a hash with each repo's data.
require 'mechanize'
require 'csv'
# Load up the trending Ruby repos on GitHub from the last month.
url_to_scrape = ""
# Snag the website with Mechanize & parse it into an XML document we can query.
page =
# Set the name of the CSV we'll create & load from.
file = "repo_data.csv"
# Array for the repo data we'll generate.
repos = []
# Create a CSV file & et it know to write a header row.,'w',
write_headers: true,
headers: ["rank", "author", "repo_name", "description"]) do |csv|
puts "Opened #{file} for writing...\n\n"
# Parse out each of the 25 small <div> elements that contain the repo's data."ol li.repo-leaderboard-list-item").each do |repo, i|
# Create a temporary variable for each piece of repo data we want.
rank ="a.leaderboard-list-rank").inner_text.to_s.to_i
author ="span.owner-name").inner_text
repo_name ="a.repository-name strong").inner_text
description ="p.repo-leaderboard-description").inner_text
# Build an array of the repo data for the current row.
repo_data = [rank, author, repo_name, description]
# Push the row onto the CSV
csv << repo_data
puts "Done writing to #{file} file...\n\n"
# Load a CSV file & loop through each row. Let it know there is a header row & each key is split by commas. That gives you convenience methods that make the CSV data almost behave like a Hash.
CSV.foreach(file, headers: true, col_sep: ',') do |row|
# I'm just creating a hash with each repo's information then pushing it to the repos array. You could do whatever you wanted to here: write data to a database, fire off API calls from the data bits, just output to the log, whatever!
repo =
repo[:rank] = row['rank']
repo[:author] = row['author']
repo[:repo_name] = row['repo_name']
repo[:description] = row['description']
# Print the hash just for shits & giggles.
puts "#{repo}\n\n"
# Push the newly-created hash into the array of repos.
repos << repo
# Success messages rock!
puts "Done!\n\n"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment