@mark-cooper
Created May 29, 2013 18:27
Download LOC country codes
require 'mechanize'
require 'nokogiri'
require 'csv'
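agent = Mechanize.new

# Download country / language codes from LOC.
# Each entry names the page to scrape, the output file, and a buffer for the rows.
urls = [
  {
    url: "http://www.loc.gov/marc/countries/countries_code.html",
    file: 'codes.txt',
    codes: [],
  },
  {
    url: "http://www.loc.gov/marc/languages/language_code.html",
    file: 'langs.txt',
    codes: [],
  },
]
delimiter = "\t"

urls.each do |codes|
  page = agent.get(codes[:url])
  doc = Nokogiri::HTML(page.body)
  # Only the first table on the page is read
  table = doc.css('table').first
  next if table.nil?

  table.css('tr').each do |row|
    # Collect the text of every <td> cell in this row
    cc = row.css('td').map { |data| data.text.strip }
    # Rows without <td> cells (for example, a header row built from <th>) are skipped
    codes[:codes] << cc unless cc.empty?
  end

  # Options are passed as keyword arguments, which Ruby 3+ requires
  CSV.open(codes[:file], "w", col_sep: delimiter) do |csv|
    codes[:codes].each do |d|
      csv << d
    end
  end
end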
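
Reading the output back: the script writes one tab-delimited row per table row, so the files can be loaded with the same CSV settings. The following is only a sketch. `load_codes` is a hypothetical helper that is not part of the original gist, and it assumes the first column holds the code and the second holds the name, which should be checked against the generated files.

```ruby
require 'csv'

# Hypothetical helper (not in the original gist): build a code => name Hash
# from a tab-delimited file such as codes.txt or langs.txt.
# Assumption: column 1 is the code and column 2 is the name.
def load_codes(path, col_sep: "\t")
  lookup = {}
  CSV.foreach(path, col_sep: col_sep) do |row|
    code, name = row
    # Skip blank lines and rows that lack either column
    next if code.nil? || name.nil?
    lookup[code.strip] = name.strip
  end
  lookup
end
```

For example, `load_codes('codes.txt')['aa']` would return whatever name the script stored for the code `aa`, or `nil` if that code is absent.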