Last active January 24, 2016 13:19
tangotiger/eab8c2fce352be3f27c6
Navigating Table with BS4
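```python
import urllib.request

from bs4 import BeautifulSoup

print("Parse start")

# Local, trimmed-down copy of the report (see the note on the "partial" file below).
sourcefile = "file:///C:/Users/TOM/PycharmProjects/downloadNHL/datafiles/RO020666_partial.HTM"

# urlopen() also accepts file:// URLs; the with-block closes the handle after parsing.
with urllib.request.urlopen(sourcefile) as html:
    soup = BeautifulSoup(html, "lxml")

# Every <tr> in the document, at any depth (rows of nested tables included).
tableRow = soup.find_all("tr")
for row in tableRow:
    print(row.name)                    # always "tr"
    tableRowData = row.find_all("td")  # the <td> cells inside this row
    print("x")                         # marker: one per row processed

print("Parse end")
```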
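The loop above only prints each row's tag name. As a sketch of a possible next step, the code below pulls the text out of each cell, row by row. It runs on a small hypothetical HTML snippet (`SAMPLE_HTML`, not the real RO020666 report) with Python's built-in `"html.parser"`, so only bs4 is needed; `table_rows` is a helper name made up for illustration. It also assumes the real report may nest tables, in which case `find_all("tr")` returns rows from every level of nesting, not just the outer table.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the report: the second row of the outer table
# holds a nested table inside one of its cells.
SAMPLE_HTML = """
<table id="outer">
  <tr><td>#</td><td>Name</td></tr>
  <tr><td>10</td><td><table><tr><td>PLAYER ONE</td></tr></table></td></tr>
</table>
"""


def table_rows(table):
    """Return a list of rows, each a list of cell texts, for the rows that
    belong to `table` itself rather than to a table nested inside it."""
    rows = []
    for row in table.find_all("tr"):  # searches all descendants, nested rows included
        if row.find_parent("table") is not table:
            continue  # skip rows owned by a nested table
        # Only this row's own cells; text inside nested tables is folded in by get_text.
        cells = row.find_all(["td", "th"], recursive=False)
        rows.append([cell.get_text(" ", strip=True) for cell in cells])
    return rows


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    print(len(soup.find_all("tr")))  # 3: the nested row is counted too
    print(table_rows(soup.find("table", id="outer")))  # [['#', 'Name'], ['10', 'PLAYER ONE']]
```

Checking `find_parent("table")` is one way to keep nested rows out. If the rows of interest sit directly under a known element (the `<table>` or its `<tbody>`), calling `find_all("tr", recursive=False)` on that element is a lighter alternative.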
I downloaded Anaconda, which bundles Python with a large set of libraries, and I use PyCharm as my IDE.
The "partial" file is my own: a subset of the original file (about 50 lines), small enough that I can follow what the script is doing.
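The gist doesn't say exactly how the partial file was cut (the follow-up question at the end asks about this). One simple way, assuming a plain "first N lines" cut is good enough, is sketched below; `make_partial` is a hypothetical helper, not part of any library. It copies raw bytes, so whatever encoding the original report uses is preserved unchanged. Cutting HTML at an arbitrary line usually leaves some tags unclosed, which BeautifulSoup generally tolerates.

```python
from itertools import islice
from pathlib import Path


def make_partial(source, dest, n_lines=50):
    """Copy the first `n_lines` lines of `source` into `dest` (hypothetical helper).

    Works in binary mode, so the bytes (and encoding) of each line are kept as-is.
    """
    with open(source, "rb") as src, open(dest, "wb") as out:
        # Iterating a binary file yields lines split on b"\n"; islice stops after n_lines.
        out.writelines(islice(src, n_lines))
    return Path(dest)
```

For example, `make_partial("datafiles/RO020666.HTM", "datafiles/RO020666_partial.HTM")` would write the first 50 lines of the full report next to it (the relative paths here are illustrative; adjust them to wherever the file was saved).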
Also, where did the file ...datafiles/RO020666_partial.HTM come from? I have the file ...datafiles/RO020666.HTM, but not the partial one.