@dast1
Last active October 21, 2023 12:34
Scrapes the Russell 3000 Membership List (official source is a .pdf) and builds a symbol list
# Build Russell 3000 List
# Import libraries
import urllib.request
import datetime
# Download Russell 3000 to local repository
f_path = "/Russell3000/Membership Lists/"
f_name = f_path + "Russell3000 " + datetime.date.today().strftime("(%b %d, %Y)") + ".pdf"
def download_file(url):
    # Save the PDF at `url` to the dated local file name built above
    urllib.request.urlretrieve(url, f_name)
download_file('http://www.ftserussell.com/files/support-documents/2017-ru3000-membership-list')
# Parse PDF table into a DataFrame
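from tabula import read_pdf
import pandas as pd
PDF_table = read_pdf(f_name, pages='all')
# Newer tabula-py releases return a list of DataFrames (one per table) rather than a single DataFrame
if not isinstance(PDF_table, list): PDF_table = [PDF_table]
# Columns 1 and 3 hold the tickers; select them by position in each table, then merge
tickers = pd.concat([t.iloc[:, [1, 3]].stack() for t in PDF_table])
Russell3000 = tickers.drop_duplicates().sort_values().tolist()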
if "Ticker" in Russell3000: Russell3000.remove("Ticker")
# Delete variables
del PDF_table, f_path, f_name
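PDF extraction can leave blank cells, stray whitespace, or repeated header labels in the ticker columns. The following is a minimal sketch of a cleanup step that could be applied to the extracted values. The helper name `clean_symbols` and its `header` parameter are hypothetical and not part of the original script.

```python
def clean_symbols(raw_values, header="Ticker"):
    """Return sorted, de-duplicated ticker strings, dropping blanks and the header label."""
    symbols = set()
    for value in raw_values:
        # Skip NaN/None and other non-string cells that PDF extraction can yield
        if not isinstance(value, str):
            continue
        symbol = value.strip()
        # Drop empty strings and the repeated column header
        if symbol and symbol != header:
            symbols.add(symbol)
    return sorted(symbols)


# Made-up sample values for illustration only
print(clean_symbols(["MSFT", " AAPL ", "Ticker", float("nan"), "MSFT", ""]))
```

Filtering to strings before sorting also avoids the type error that sorting a mix of strings and NaN floats would otherwise raise.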