@ludofischer
Created April 25, 2012 15:55
Extract ISO 3166 country codes from Wikipedia page using Ruby and Nokogiri
# Find all table rows.
# For each table row:
#   first td:  text of the link (a)  -> country name
#   second td: a/tt                  -> alpha-2 code
#   third td:  tt                    -> alpha-3 code
#   fourth td: tt                    -> numeric code
require 'nokogiri'

countries = []
File.open('ISO_3166-1.htm') do |f|
  doc = Nokogiri::HTML(f)
  # Visit every table row; header rows have no <td> and are skipped.
  doc.xpath('//tr').each do |row|
    name_td = row.at_xpath('td')
    next if name_td.nil?

    iso2_td = name_td.next_element
    iso3_td = iso2_td && iso2_td.next_element
    isonumeric_td = iso3_td && iso3_td.next_element
    # Skip rows that do not have all four columns.
    next if isonumeric_td.nil?

    countries << [
      name_td.xpath('.//a').text,      # country name
      iso2_td.xpath('a/tt').text,      # alpha-2 code
      iso3_td.xpath('tt').text,        # alpha-3 code
      isonumeric_td.xpath('tt').text   # numeric code
    ]
  end
end
# Print one country per line; a bare `puts countries` flattens the nested arrays.
countries.each { |info| puts info.join("\t") }
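
Because the script selects every `tr` on the page, rows from unrelated tables can end up in `countries` with empty or malformed codes. The following is a minimal sketch, not part of the original gist, of one way to filter such rows by the shape of their codes and then index the rest by alpha-2 code. The sample `rows` array and the helper name `iso_row?` are assumptions made for illustration; only the `[name, alpha-2, alpha-3, numeric]` layout comes from the script above.

```ruby
# Hypothetical sample in the shape the script produces:
# [name, alpha-2, alpha-3, numeric].
rows = [
  ['France',  'FR',  'FRA', '250'],
  ['Germany', 'DE',  'DEU', '276'],
  ['Japan',   'JP',  'JPN', '392'],
  ['',        '',    '',    ''],    # row from a table without <tt> cells
  ['Notes',   'see', '',    '']     # malformed row
]

# Hypothetical helper: true when all three codes have the expected shape
# (two capitals, three capitals, three digits).
def iso_row?(row)
  _name, alpha2, alpha3, numeric = row
  alpha2.to_s.match?(/\A[A-Z]{2}\z/) &&
    alpha3.to_s.match?(/\A[A-Z]{3}\z/) &&
    numeric.to_s.match?(/\A\d{3}\z/)
end

valid_rows = rows.select { |row| iso_row?(row) }

# Index the surviving rows by alpha-2 code for direct lookup.
by_alpha2 = valid_rows.each_with_object({}) do |(name, alpha2, alpha3, numeric), index|
  index[alpha2] = { name: name, alpha3: alpha3, numeric: numeric }
end

puts by_alpha2['DE'][:name]
```

The shape check is deliberately loose: it does not confirm that a code is actually assigned, only that the row looks like a country entry.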