@yobibyte
Created December 3, 2018 23:37
Script to parse NeurIPS proceedings page
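import csv
import urllib.request  # a bare `import urllib` does not reliably load the request submodule

from bs4 import BeautifulSoup

url_to_parse = 'https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018'

# Download the proceedings index page and parse it.
page = urllib.request.urlopen(url_to_parse)
soup = BeautifulSoup(page, 'html.parser')

# newline='' lets the csv module handle line endings itself.
with open('proceedings.csv', 'w', newline='') as csvfile:
    wr = csv.writer(csvfile, delimiter=',')
    # Each <li> in the main list is one paper; its first <a> is the title link
    # and its second <a>, when present, is read as the author.
    for p in soup.find('div', attrs={'class': 'main-container'}).find('ul').find_all('li'):
        c = p.find_all('a')
        try:
            text = c[0].text
        except IndexError:
            text = ''
        try:
            link = 'https://papers.nips.cc' + c[0].attrs['href']
        except (IndexError, KeyError):
            link = ''
        try:
            author = c[1].text
        except IndexError:
            author = ''
        wr.writerow([text, author, link])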
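
The script writes rows in the order title, author, link, with no header row. A minimal sketch of reading the file back, assuming that column order; `load_proceedings` is a hypothetical helper name, not part of the original gist:

```python
import csv


def load_proceedings(path='proceedings.csv'):
    """Read rows written as [title, author, link] into a list of dicts.

    Assumes the three-column layout produced by the script above;
    rows with a different number of columns are skipped.
    """
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        return [
            {'title': row[0], 'author': row[1], 'link': row[2]}
            for row in reader
            if len(row) == 3
        ]
```

After the scraper has run, `load_proceedings()` returns one dict per paper, which can then be filtered, for example by keeping only the entries whose `title` contains a given keyword.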