@leslie-alldridge
Last active February 6, 2020 02:27
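import csv

import requests
from bs4 import BeautifulSoup

# Archived National Gallery of Art index page for artists whose names start with "Z".
url = 'https://web.archive.org/web/20121007172955/https://www.nga.gov/collection/anZ1.htm'
page = requests.get(url)
page.raise_for_status()  # fail early on an HTTP error
soup = BeautifulSoup(page.text, 'html.parser')

# Remove the alphabet navigation block so its links are not picked up as artists.
last_links = soup.find(class_='AlphaNav')
if last_links is not None:
    last_links.decompose()

# Artist links live inside the BodyText container.
artist_name_list = soup.find(class_='BodyText')
artist_name_list_items = artist_name_list.find_all('a')

# newline='' lets the csv module control line endings; the with block closes the file.
with open('z-artist-names.csv', 'w', newline='', encoding='utf-8') as csv_file:
    f = csv.writer(csv_file)
    f.writerow(['Name', 'Link'])
    for artist_name in artist_name_list_items:
        # Keep the name as str; writing bytes would put b'...' in the CSV on Python 3.
        names = artist_name.contents[0]
        links = 'https://web.archive.org' + artist_name.get('href')
        print(names, links)
        f.writerow([names, links])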