@cogwirrel
Created June 2, 2016 15:02
A Ruby script that uses Nokogiri to scrape Wikipedia for all airport IATA codes, along with ICAO codes, airport names, and locations.
require 'nokogiri'
require 'open-uri'

# Get all the airports from Wikipedia!
# Returns an array of hashes with :iata, :icao, :name and :location keys.
def get_airport_info_from_wikipedia
  airport_info = []
  ('A'..'Z').each do |letter|
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
    doc = Nokogiri::HTML(URI.open("https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_#{letter}"))
    iata_table = doc.xpath('//table')[0]
    # './/tr' stays inside the first table; '//tr' would match every row in the document
    rows = iata_table.xpath('.//tr')
    rows.each do |row|
      # Skip header and separator rows, which have no text in the first cell
      next if row.at_xpath('td[1]/text()').to_s.strip.empty?

      airport = {}
      [
        [:iata, 'td[1]/text()'],
        [:icao, 'td[2]/text()'],
        [:name, 'td[3]/a/text()'],
      ].each do |name, xpath|
        airport[name] = row.at_xpath(xpath).to_s.strip
      end
      # Interleave the linked and plain text nodes of the location cell
      links = row.xpath('td[4]/a/text()')
      plain = row.xpath('td[4]/text()')
      airport[:location] = links.zip(plain).flatten.compact.map { |e| e.to_s.strip }.join(' ')
      airport_info.push airport
    end
  end
  airport_info
end
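
The function returns an array of hashes, one per airport. As a rough usage sketch, the snippet below indexes such an array by IATA code and prints one entry as JSON. It runs on a small hand-written sample rather than live scraper output, and `index_by_iata` is a hypothetical helper that is not part of the original script.

```ruby
require 'json'

# Hand-written sample in the shape returned by get_airport_info_from_wikipedia.
# A real run would build this array by scraping Wikipedia instead.
sample_airports = [
  { iata: 'JFK', icao: 'KJFK', name: 'John F. Kennedy International Airport',
    location: 'New York City, New York, United States' },
  { iata: 'LHR', icao: 'EGLL', name: 'Heathrow Airport',
    location: 'London, England, United Kingdom' },
]

# Hypothetical helper (an assumption, not in the original gist):
# builds a hash keyed by IATA code so lookups do not scan the array.
def index_by_iata(airports)
  airports.each_with_object({}) do |airport, index|
    index[airport[:iata]] = airport
  end
end

airports_by_iata = index_by_iata(sample_airports)

# Look up a single airport by its code
puts airports_by_iata['LHR'][:name]

# Serialise one entry, e.g. for saving the scraped data to disk
puts JSON.pretty_generate(airports_by_iata['JFK'])
```

With the real scraper, `sample_airports` would be replaced by the return value of `get_airport_info_from_wikipedia`. If two rows ever share an IATA code, the later row would overwrite the earlier one in this index.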