@aaronschiff
Created August 2, 2015 21:47
Parse the list of airports by IATA code from Wikipedia and save it as a CSV, based on https://gist.github.com/fogonwater/e7039f8e34e3c8c7487b
from bs4 import BeautifulSoup
import urllib2
import string
import csv

report = []
baseUrl = "https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_"
uppercaseLetters = list(string.ascii_uppercase)

# Wikipedia splits the IATA code list into one page per letter
for letter in uppercaseLetters:
    url = baseUrl + letter
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read(), 'html.parser')

    # assumes one matching table on page with class wikitable
    table = soup.find('table', {'class': 'wikitable'})
    for row in table.find_all('tr'):
        tds = row.find_all('td')
        # header rows use <th> cells, so they yield an empty list here
        items = [td.text.strip().encode('utf8') for td in tds]
        report.append(items)

header = ['IATA', 'ICAO', 'Name', 'Location', 'Time', 'DST']
with open('airports.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    # skip the empty rows left by the header <tr> elements
    writer.writerows(row for row in report if row)
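
Note that this is Python 2 code: urllib2 no longer exists in Python 3, and the csv module there expects text-mode files rather than 'wb' and byte strings. A minimal sketch of the same scraper ported to Python 3 might look like the following (the User-Agent string is an arbitrary placeholder; Wikipedia can reject urllib's default one, so setting a descriptive header is a common workaround):

from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
import string
import csv

report = []
base_url = "https://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_"

for letter in string.ascii_uppercase:
    # Wikipedia may block urllib's default User-Agent, so supply one
    req = Request(base_url + letter, headers={'User-Agent': 'airports-scraper'})
    with urlopen(req) as page:
        soup = BeautifulSoup(page.read(), 'html.parser')
    table = soup.find('table', {'class': 'wikitable'})
    for row in table.find_all('tr'):
        # plain str, not bytes: Python 3's csv module handles Unicode text
        items = [td.text.strip() for td in row.find_all('td')]
        if items:
            report.append(items)

with open('airports.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['IATA', 'ICAO', 'Name', 'Location', 'Time', 'DST'])
    writer.writerows(report)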