Skip to content

Instantly share code, notes, and snippets.

@umitanuki
Created August 22, 2018 20:46
Show Gist options
  • Save umitanuki/e4126f8ec53cc115c42d978d315dfdb8 to your computer and use it in GitHub Desktop.
Save umitanuki/e4126f8ec53cc115c42d978d315dfdb8 to your computer and use it in GitHub Desktop.
Selenium script to get ETF consittuents
import pandas as pd
'''
brew install phantomjs
or
brew tap homebrew/cask
brew cask install chromedriver
'''
# from selenium import webdriver
from selenium.webdriver.chrome.webdriver import WebDriver
def get_etf_holdings(etf_symbol):
'''
etf_symbol: str
return: pd.DataFrame
'''
url = 'https://www.barchart.com/stocks/quotes/{}/constituents?page=all'.format(
etf_symbol)
# Loads the ETF constituents page and reads the holdings table
browser = WebDriver() # webdriver.PhantomJS()
browser.get(url)
html = browser.page_source
soup = BeautifulSoup(html, 'html')
table = get_table(soup)
# Reads the holdings table line by line and appends each asset to a
# dictionary along with the holdings percentage
asset_dict = {}
for row in table.select('tr')[1:-1]:
try:
cells = row.select('td')
# print(row)
symbol = cells[0].get_text().strip()
# print(symbol)
name = cells[1].text.strip()
celltext = cells[2].get_text().strip()
percent = float(celltext.rstrip('%'))
shares = int(cells[3].text.strip().replace(',', ''))
if symbol != "" and percent != 0.0:
asset_dict[symbol] = {
'name': name,
'percent': percent,
'shares': shares,
}
except BaseException as ex:
print(ex)
browser.quit()
return pd.DataFrame(asset_dict)
@lucianin
Copy link

I am unable to use this, and the problem seems to be with get_table. Where is that defined?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment