@BindiChen
Created November 27, 2020 00:23
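from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Step 1: Start a browser session and load the page
driver = webdriver.Chrome()
try:
    driver.get('https://pubs.rsc.org/en/content/articlelanding/2020/na/d0na00118j')
    # Wait up to 10 s for a <table> to appear in the rendered page.
    # (implicitly_wait() only affects find_element calls, not page_source.)
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, 'table'))
    )
    # Step 2: Parse the rendered HTML and grab the tables with Beautiful Soup
    soup = BeautifulSoup(driver.page_source, 'lxml')
    tables = soup.find_all('table')
finally:
    # quit() ends the whole browser session; close() only closes the window
    driver.quit()
# Step 3: Read the tables with pandas read_html()
# Wrap the markup in StringIO: passing literal HTML strings is deprecated since pandas 2.1
dfs = pd.read_html(StringIO(str(tables)))
print(f'Total tables: {len(dfs)}')
print(dfs[0])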
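
Steps 2 and 3 above can be tried offline, without a browser driver. The following is a minimal sketch that uses only the standard library's `html.parser` in place of Beautiful Soup and `pd.read_html()`, so it illustrates the idea of table extraction and is not what pandas does internally. The names `TableExtractor` and `extract_tables` and the sample HTML are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML string as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_tables(html):
    """Return all tables found in `html` as nested lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical sample markup, standing in for driver.page_source
sample_html = """
<table>
  <tr><th>Sample</th><th>Size (nm)</th></tr>
  <tr><td>A</td><td>12</td></tr>
  <tr><td>B</td><td>15</td></tr>
</table>
"""

tables = extract_tables(sample_html)
print(f"Total tables: {len(tables)}")
print(tables[0])
```

In the real script, `pd.read_html()` does this work and additionally infers headers and column types, which is why the gist hands the table markup to pandas.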
@omivalera

👍
