Last active January 24, 2016 13:19
tangotiger/eab8c2fce352be3f27c6
Navigating Table with BS4
import urllib.request
from bs4 import BeautifulSoup

print("Parse start")
# Local copy of the report, opened through a file:// URL
sourcefile = "file:///C:/Users/TOM/PycharmProjects/downloadNHL/datafiles/RO020666_partial.HTM"
with urllib.request.urlopen(sourcefile) as html:
    # The "lxml" parser only works if the lxml package is installed
    soup = BeautifulSoup(html, "lxml")

# Visit every table row and grab its data cells
tableRow = soup.find_all('tr')
for row in tableRow:
    print(row.name)  # the tag name, always "tr" here
    tableRowData = row.find_all('td')
    print('x')
print("Parse end")
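In the loop above, `tableRowData` collects each row's `<td>` cells, but nothing is done with them yet. Here is a minimal sketch of turning those cells into plain text, one list per row. It assumes a tiny inline table (`SAMPLE_HTML` and the `rows_to_lists` helper are hypothetical stand-ins for the real RO020666 report) and uses Python's built-in `"html.parser"` so it runs even without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the real report file.
SAMPLE_HTML = """
<table>
  <tr><th>#</th><th>Player</th></tr>
  <tr><td>9</td><td>SMITH</td></tr>
  <tr><td>17</td><td>JONES</td></tr>
</table>
"""


def rows_to_lists(html):
    """Return the stripped text of every <td> cell, one list per <tr>.

    Rows without any <td> cells (e.g. header rows made of <th>) are skipped.
    """
    soup = BeautifulSoup(html, "html.parser")
    result = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            result.append(cells)
    return result


if __name__ == "__main__":
    for cells in rows_to_lists(SAMPLE_HTML):
        print(cells)
```

Rows that contain only `<th>` header cells come back empty from `find_all("td")`, which is why the sketch skips them.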
Also, where did the file ...datafiles/RO020666_partial.HTM come from? I have the file ...datafiles/RO020666.HTM.
I downloaded Anaconda, which comes with Python, as well as a ton of libraries. And I use PyCharm as my IDE.
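Since the script depends on both bs4 and lxml, it can help to confirm which interpreter PyCharm is actually running and whether that interpreter can see those packages; Anaconda's default install generally bundles both. A small standard-library sketch using `importlib.util.find_spec` (the `has_module` helper is hypothetical):

```python
import importlib.util
import sys


def has_module(name):
    """Return True if `name` can be imported by this interpreter (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Shows which Python is running the script, then whether each package is visible to it.
    print("Python", sys.version.split()[0], "at", sys.executable)
    for module in ("bs4", "lxml"):
        print(module, "found" if has_module(module) else "missing")
```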
The "partial" is my own file, which is simply a subset of the original file (some 50 lines), so I can follow along with what it's doing.
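A subset file like this can be produced by copying the first N lines of the original. A minimal sketch, assuming the original sits next to the script; the `make_partial` helper, the file names and the default of 50 lines are illustrative. The truncated HTML will usually have unclosed tags, which BeautifulSoup tolerates:

```python
from itertools import islice
from pathlib import Path


def make_partial(src, dst, n_lines=50, encoding="utf-8"):
    """Copy the first `n_lines` lines of `src` into `dst` (hypothetical helper).

    Returns the number of lines actually written, which is smaller than
    `n_lines` when the source file is shorter.
    """
    # errors="replace" keeps going if the report is not valid UTF-8
    with open(src, encoding=encoding, errors="replace") as fin:
        head = list(islice(fin, n_lines))
    Path(dst).write_text("".join(head), encoding=encoding)
    return len(head)


if __name__ == "__main__":
    source = Path("RO020666.HTM")  # assumed location of the original report
    if source.exists():
        print(f"wrote {make_partial(source, 'RO020666_partial.HTM')} lines")
    else:
        print(f"{source} not found")
```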
Getting this error:
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
I'm new to Python and trying to work my way through this. Are you using Python 2 or 3?
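That `FeatureNotFound` error means BeautifulSoup cannot find the lxml package in the interpreter running the script. Besides installing lxml, one option is to fall back to Python's built-in `"html.parser"`, which needs no extra install. A minimal sketch (the `make_soup` helper is hypothetical, and the two parsers can build slightly different trees from malformed HTML):

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse with lxml when it is installed, otherwise with the standard-library parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is not available to this interpreter; html.parser ships with Python.
        return BeautifulSoup(markup, "html.parser")


if __name__ == "__main__":
    soup = make_soup("<table><tr><td>1</td></tr></table>")
    print(len(soup.find_all("tr")))
```

The original script imports `urllib.request`, which exists only in Python 3, so it will not run under Python 2 as written. Installing lxml with `pip install lxml` (or `conda install lxml` inside an Anaconda environment) lets the original `"lxml"` call work unchanged.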