@jnsprnw
Last active September 15, 2015 10:37
Ruby Scraper with Nokogiri
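#!/usr/bin/env ruby
# Scrapes player profile pages listed in Kader-WMLink.csv and writes
# the extracted fields to spieler.csv.
Dir.chdir(File.dirname(__FILE__))

require 'nokogiri'
require 'open-uri'
require 'csv'

# Fetches one profile page and returns the extracted fields as a hash.
def spielerWerte(url)
  puts url
  # URI.escape and Kernel#open on URLs were removed in Ruby 3.0, so the
  # page is fetched with URI.open. The URL must already be percent-encoded.
  doc = Nokogiri::HTML(URI.open(url.to_s))
  affiliation = doc.xpath('//span[@itemprop="affiliation"]/a')
  {
    "name" => doc.xpath('//div[@itemprop="name"]/text()').to_s.strip,
    "birthDate" => doc.xpath('//span[@itemprop="birthDate"]/text()').to_s.strip,
    "birthPlace" => doc.xpath('//span[@itemprop="birthPlace"]/text()').to_s.strip,
    "affiliationTitle" => affiliation.xpath('@title').to_s.strip,
    "affiliationLink" => affiliation.xpath('@href').to_s.strip,
    "birthCountry" => doc.xpath('//table[@class="profilheader"][1]/tr[1]/td/span[3]/img/@title').to_s.strip
  }
end

CSV.open("spieler.csv", "w") do |csv|
  header_written = false
  CSV.foreach("Kader-WMLink.csv", headers: true) do |el|
    wert = spielerWerte(el["link"])
    # Write the header row once, before the first data row. An explicit
    # flag replaces the original check on the global line counter ($.).
    unless header_written
      csv << wert.keys + ["unique"]
      header_written = true
    end
    csv << wert.values + [el["unique"]]
    sleep(2) # Pause between requests to go easy on the server.
  end
end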
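
The header handling in the output loop can be tried without any network access. The following is a minimal sketch under stated assumptions: the `rows` array and its player values are hypothetical stand-ins for what `spielerWerte` returns, and `CSV.generate` is used instead of `CSV.open` so that nothing is written to disk.

```ruby
require 'csv'

# Hypothetical stand-ins for the hashes returned by spielerWerte,
# each paired with the "unique" column from the input file.
rows = [
  [{ "name" => "Player A", "birthDate" => "01.01.1990" }, "player-a_1"],
  [{ "name" => "Player B", "birthDate" => "02.02.1991" }, "player-b_2"]
]

output = CSV.generate do |csv|
  header_written = false
  rows.each do |wert, unique|
    # Emit the header once, derived from the keys of the first row.
    unless header_written
      csv << wert.keys + ["unique"]
      header_written = true
    end
    csv << wert.values + [unique]
  end
end

puts output
```

Tracking the header with a local flag keeps the loop independent of how many lines the input reader has consumed, which is what a check on `$.` relies on.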
jnsprnw commented Sep 15, 2015

Kader-WMLink.csv

link,unique
http://www.transfermarkt.de/jefferson/profil/spieler/32561,jefferson_32561
http://www.transfermarkt.de/julio-cesar/profil/spieler/22412,julio-cesar_22412
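
To see how the script reads this file, here is a small sketch that parses the same two rows from a string. It assumes the data is held in a heredoc (the `input` variable is hypothetical) rather than in `Kader-WMLink.csv`, and it only prints the columns instead of fetching the pages.

```ruby
require 'csv'

# The sample rows from above, inlined so the example is self-contained.
input = <<~INPUT
  link,unique
  http://www.transfermarkt.de/jefferson/profil/spieler/32561,jefferson_32561
  http://www.transfermarkt.de/julio-cesar/profil/spieler/22412,julio-cesar_22412
INPUT

# headers: true treats the first line as column names, so each row
# can be accessed by "link" and "unique" as in the scraper.
rows = CSV.parse(input, headers: true).map(&:to_h)

rows.each do |row|
  puts "#{row['unique']}: #{row['link']}"
end
```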