@tangotiger
Last active January 24, 2016 13:19
Navigating Table with BS4
import urllib.request
from bs4 import BeautifulSoup

print("Parse start")
sourcefile = "file:///C:/Users/TOM/PycharmProjects/downloadNHL/datafiles/RO020666_partial.HTM"
html = urllib.request.urlopen(sourcefile)
soup = BeautifulSoup(html, "lxml")
# Walk every table row and collect its cells
tableRow = soup.findAll('tr')
for row in tableRow:
    print(row.name)
    tableRowData = row.findAll('td')
    print('x')
print("Parse end")
@tangotiger

Ugh. Typo: I had written findall instead of findAll.

@oc99

oc99 commented Jan 24, 2016

I'm getting this error:

bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

I'm new to Python and trying to work my way through this. Are you using Python 2 or 3?

@oc99

oc99 commented Jan 24, 2016

Also, where did the file ...datafiles/RO020666_partial.HTM come from? I have the file ...datafiles/RO020666.HTM

@tangotiger

I downloaded Anaconda, which comes with Python, as well as a ton of libraries. And I use PyCharm as my IDE.
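
For anyone without Anaconda: the FeatureNotFound error above just means the lxml package is not installed (Anaconda typically includes it). Also, urllib.request only exists in Python 3, so the gist needs Python 3 as written. Here is a minimal sketch of one way around the missing parser, falling back to Python's built-in html.parser. The make_soup helper and the inline SAMPLE_HTML table are made up for illustration and are not part of the gist; find_all is the newer name for the same method as findAll.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Tiny inline table standing in for the real report file (made up for illustration).
SAMPLE_HTML = "<table><tr><td>a</td><td>b</td></tr><tr><td>c</td></tr></table>"


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise with the built-in parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is missing; html.parser ships with Python itself.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup(SAMPLE_HTML)
# Count the <td> cells in each <tr>, the same traversal the gist does.
cells_per_row = [len(row.find_all("td")) for row in soup.find_all("tr")]
print(cells_per_row)
```

Installing lxml is the other option; the fallback only keeps the script running when it is absent, and the two parsers can treat badly broken HTML differently.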

The "partial" is my own file, which is simply a subset of the original file (some 50 lines), so I can follow along with what it's doing.
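
If you want to build a similar cut-down file yourself, here is a rough sketch. It assumes the subset is simply the first 50 lines of the full file, which may not be how the "partial" file above was actually made. The write_partial helper and the throwaway file names are hypothetical.

```python
import tempfile
from itertools import islice
from pathlib import Path


def write_partial(source, target, max_lines=50):
    """Copy the first max_lines lines of source into target; return the count."""
    with open(source, encoding="utf-8", errors="replace") as src:
        head = list(islice(src, max_lines))
    Path(target).write_text("".join(head), encoding="utf-8")
    return len(head)


# Demo on a throwaway 200-line file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    full = Path(tmp) / "full.HTM"
    full.write_text(
        "".join(f"<tr><td>{i}</td></tr>\n" for i in range(200)),
        encoding="utf-8",
    )
    partial = Path(tmp) / "full_partial.HTM"
    written = write_partial(full, partial)
    partial_line_count = len(partial.read_text(encoding="utf-8").splitlines())

print(written, partial_line_count)
```

A file cut this way will usually end mid-table with unclosed tags. BeautifulSoup's parsers are lenient about that, but it is worth keeping in mind when comparing results against the full file.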
