
@willpearse
Last active August 29, 2015 14:20
Scraping NHL player minor-penalty statistics from ESPN (2011-2014) and player headshots from Sportsnet
# Libraries: open-uri for HTTP fetches, Nokogiri for HTML parsing, Set for de-duplication
require 'open-uri'
require 'nokogiri'
require 'set'
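# Get all the penalty stats: ESPN lists 40 players per page, so request rows
# starting at 1, 41, 81, ..., 1201 for each season (seasontype/2 is the regular season)
results = []
(2011..2014).each do |year|
  (0..30).map { |x| x * 40 + 1 }.each do |player_count|
    # URI.open is required here: since Ruby 3.0, bare Kernel#open no longer fetches URLs
    page = Nokogiri::HTML(URI.open("http://espn.go.com/nhl/statistics/player/_/stat/minor-penalties/sort/minorPenalties/year/#{year}/seasontype/2/count/#{player_count}"))
    page.css("tr").each do |entry|
      # Skip the column-header row; keep every other row's cell text plus the season
      unless entry.text == "RKPLAYERTEAMPIMMINORHKTRRGHHLDINTSLHHSCCHLDSTGLINT"
        temp = entry.children.map { |x| x.text }
        temp.push(year)
        results.push(temp)
      end
    end
  end
end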
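# Get all the images: one headshot per unique player name (column 1 of each row)
unique_players = Set.new(results.map { |x| x[1] }.compact)
unique_players.each do |player|
  # Keep the leading letters and spaces (dropping any ", position" suffix),
  # then hyphenate every space, not just the first one
  player = player[/[a-zA-Z ]*/].strip.gsub(" ", "-")
  next if player.empty?
  url = "http://assets1.sportsnet.ca/wp-content/uploads/players/nhl/#{player[0]}/#{player}.png"
  begin
    # Read the response body; `handle << open(...)` would write the IO object's #to_s, not the image
    image = URI.open(url, &:read)
  rescue OpenURI::HTTPError => e
    warn "No image for #{player}: #{e.message}"
    next
  end
  # "wb": PNG data is binary, so avoid newline translation on Windows
  File.open("#{player}.png", "wb") { |handle| handle.write(image) }
end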
# Write out the player statistics as tab-separated values, one row per player per season
File.open("fred_hockey.txt", "w") do |handle|
  results.each do |line|
    handle.puts line.join("\t")
  end
end
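The two string-building steps in the script above, the paged ESPN URL and the image-file name, can be pulled out into small functions and checked without touching the network. This is a minimal sketch, not part of the original gist: the names `page_offsets`, `espn_url` and `player_slug` are hypothetical, the 40-rows-per-page figure is inferred from the script's `x * 40 + 1`, and the slug rule assumes Sportsnet file names hyphenate every space in a player's name.

```ruby
# Hypothetical helpers (not in the original gist) isolating the scraper's
# string-building steps so they can be tested offline.

ESPN_PAGE_SIZE = 40 # rows per page, inferred from the script's `x * 40 + 1`

# Starting row offsets requested per season: 1, 41, 81, ..., 1201.
def page_offsets(pages = 31, page_size = ESPN_PAGE_SIZE)
  (0...pages).map { |i| i * page_size + 1 }
end

# ESPN minor-penalty leaderboard URL for one season and starting row,
# following the same pattern as the original script.
def espn_url(year, offset)
  "http://espn.go.com/nhl/statistics/player/_/stat/minor-penalties/" \
    "sort/minorPenalties/year/#{year}/seasontype/2/count/#{offset}"
end

# Turn a player cell such as "Sidney Crosby, C" into a hyphenated slug:
# keep the leading run of letters and spaces, trim it, hyphenate every space.
# (Assumption: this matches how Sportsnet names its image files.)
def player_slug(cell)
  cell[/[a-zA-Z ]*/].strip.tr(" ", "-")
end

p page_offsets.first(3)              # → [1, 41, 81]
p page_offsets.last                  # → 1201
puts player_slug("Sidney Crosby, C") # → Sidney-Crosby
```

Calling `page_offsets.map { |o| espn_url(2014, o) }` yields the 31 URLs the scraper requests for one season.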