gayanvirajith/python-beatifulsoup.py

## readme.md

      
    Raw
  

              readme.md
            
          
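How to run


Go to the folder where the above files are located.
Create a Python 3 virtual environment by running python3 -m venv .venv
Activate it by running source .venv/bin/activate (on Windows: .venv\Scripts\activate)
Upgrade pip by running pip install --upgrade pip
Install the dependencies by running pip install -r requirements.txt
Run the script with python python-beatifulsoup.py (it expects a page containing a table to be served at http://localhost:8080)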
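
The script reads the first table served at http://localhost:8080 and keeps the first-column values that contain "common". To try that filtering step without a running server, here is a minimal sketch. The sample HTML and the helper name `first_column_matches` are made up for illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the real script fetches http://localhost:8080 instead.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Size</th></tr>
  <tr><td>lib/common/util.js</td><td>12</td></tr>
  <tr><td>lib/app/main.js</td><td>40</td></tr>
  <tr><td>common.css</td><td>3</td></tr>
</table>
"""


def first_column_matches(html, needle="common"):
    """Return first-column cell texts containing `needle` (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    matches = []
    # Skip the header row, as the script does.
    for row in table.find_all("tr")[1:]:
        cells = row.find_all("td")
        if cells and needle in cells[0].get_text():
            matches.append(cells[0].get_text())
    return matches


print(first_column_matches(SAMPLE_HTML))
# prints ['lib/common/util.js', 'common.css']
```

The script then writes the matches to values-on-column.csv as a single row that starts with the label path.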

Credits/References


https://www.dataquest.io/blog/web-scraping-python-using-beautiful-soup/
https://stackoverflow.com/questions/42589738/get-column-from-a-table-with-python-and-beautiful-soup
https://stackabuse.com/reading-and-writing-csv-files-in-python-with-pandas/
https://www.crummy.com/software/BeautifulSoup/bs4/doc/


## python-beatifulsoup.py
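from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

# Fetch and parse the page; the URL assumes a server running locally.
with urlopen("http://localhost:8080") as html:
    soup = BeautifulSoup(html, "html.parser")

# Work with the first <table> on the page.
table = soup.find("table")
if table is None:
    raise SystemExit("No <table> found in the page")

files = {'path': []}

# Skip the header row, then keep first-column values that contain "common".
for row in table.find_all("tr")[1:]:
    col = row.find_all("td")
    if not col:
        continue
    # get_text() also handles cells with nested tags, where .string is None.
    text = col[0].get_text()
    if "common" in text:
        files['path'].append(text)

# Transpose so the CSV holds one row: the label "path" followed by every match.
pd.DataFrame(files).T.reset_index().to_csv('values-on-column.csv', header=False, index=False)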

## requirements.txt
beautifulsoup4==4.9.3
certifi==2021.5.30
charset-normalizer==2.0.4
idna==3.2
numpy==1.21.2
pandas==1.3.2
python-dateutil==2.8.2
pytz==2021.1
requests==2.26.0
six==1.16.0
soupsieve==2.2.1
urllib3==1.26.6