
@gayanvirajith
Last active August 17, 2021 07:51
Extract HTML table column values using Python 3 and beautifulsoup4

How to run

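  • Go to the folder where the files above are located.
  • Create a Python 3 virtual environment with python3 -m venv .venv
  • Activate it with source .venv/bin/activate (on Windows: .venv\Scripts\activate)
  • Upgrade pip with pip install --upgrade pip
  • Install the dependencies with pip install -r requirements.txt
  • Serve the page containing the table at http://localhost:8080 (the URL the script reads from), then run the script with python3; the matching values are written to values-on-column.csv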
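The script reads its table from http://localhost:8080. If nothing is served there yet, the sketch below uses only Python's standard http.server module to serve a small sample page for testing. The sample rows, the SAMPLE_HTML constant and the make_server / serve_in_background helpers are hypothetical, made up for illustration; they are not part of the original gist.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample page: a header row first, then one path per data row
SAMPLE_HTML = b"""<html><body><table>
<tr><th>Path</th><th>Size</th></tr>
<tr><td>src/common/utils.py</td><td>2 KB</td></tr>
<tr><td>src/app/main.py</td><td>5 KB</td></tr>
<tr><td>lib/common/io.py</td><td>1 KB</td></tr>
</table></body></html>"""


class SampleTableHandler(BaseHTTPRequestHandler):
    """Answer every GET request with SAMPLE_HTML."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(SAMPLE_HTML)))
        self.end_headers()
        self.wfile.write(SAMPLE_HTML)


def make_server(port=8080):
    """Create (but do not start) an HTTP server bound to localhost:port."""
    return HTTPServer(("localhost", port), SampleTableHandler)


def serve_in_background(server):
    """Run server.serve_forever() in a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

To use it alongside the extraction script, call make_server().serve_forever() from a separate Python session (stop it with Ctrl+C). Against this sample page the script keeps the two rows whose first cell contains "common", so the CSV holds a single row: path,src/common/utils.py,lib/common/io.py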

Credits/References

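Extraction script:

from urllib.request import urlopen
import pandas as pd
from bs4 import BeautifulSoup

# Fetch the page that holds the table and parse it
with urlopen("http://localhost:8080") as response:
    soup = BeautifulSoup(response, "html.parser")

table = soup.find("table")
if table is None:
    raise SystemExit("No <table> element found at http://localhost:8080")

files = {'path': []}
# Skip the header row, then keep first-column values that contain "common"
for row in table.find_all("tr")[1:]:
    cols = row.find_all("td")
    if not cols:
        continue  # e.g. a row made only of <th> cells
    # get_text() is used instead of .string, which is None when the cell contains nested tags
    value = cols[0].get_text(strip=True)
    if "common" in value:
        files['path'].append(value)
# Transpose so the CSV is a single row: "path" followed by every matching value
pd.DataFrame(files).T.reset_index().to_csv('values-on-column.csv', header=False, index=False)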
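To check the filtering and the CSV layout without a running server, the same logic can be wrapped in functions that take HTML text directly. This is a sketch: extract_matching_values, to_single_row_csv and the sample table are hypothetical names and data introduced for illustration, and the needle / column parameters generalize the script's hardcoded "common" and first column.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup


def extract_matching_values(html, needle="common", column=0):
    """Return cell texts from `column` (header row skipped) that contain `needle`."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        return []
    matches = []
    for row in table.find_all("tr")[1:]:
        cells = row.find_all("td")
        if len(cells) <= column:
            continue  # this row has no cell in the requested column
        text = cells[column].get_text(strip=True)
        if needle in text:
            matches.append(text)
    return matches


def to_single_row_csv(values, label="path"):
    """Mimic the script's output: one CSV row, the label followed by every value."""
    buffer = io.StringIO()
    pd.DataFrame({label: values}).T.reset_index().to_csv(buffer, header=False, index=False)
    return buffer.getvalue()


# Hypothetical sample; the last cell has a nested tag, where .string would be None
sample = """<table>
<tr><th>Path</th></tr>
<tr><td>src/common/a.py</td></tr>
<tr><td>src/app/b.py</td></tr>
<tr><td><b>lib/common</b>/c.py</td></tr>
</table>"""

values = extract_matching_values(sample)
print(values)  # ['src/common/a.py', 'lib/common/c.py']
print(to_single_row_csv(values), end="")
```

Keeping the parsing separate from the network call makes it easy to try other needles or columns on saved HTML before pointing the script at a live page.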
requirements.txt:

beautifulsoup4==4.9.3
certifi==2021.5.30
charset-normalizer==2.0.4
idna==3.2
numpy==1.21.2
pandas==1.3.2
python-dateutil==2.8.2
pytz==2021.1
requests==2.26.0
six==1.16.0
soupsieve==2.2.1
urllib3==1.26.6