Last active
February 7, 2016 21:24
-
-
Save tangotiger/4fec9a63b2cb4692ecb9 to your computer and use it in GitHub Desktop.
Parse Schedule
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
print("Parse start") | |
sourcefile = "C:\\Users\\TOM\\PycharmProjects\\downloadNHL\\datafiles\\schedulebyseason.htm" | |
targetfile = "C:\\Users\\TOM\\PycharmProjects\\downloadNHL\\datafiles\\parsed_schedulebyseason.txt" | |
searchstr = "recap?id=" | |
sample_recstr = "2015020001" | |
reclen = len(sample_recstr) | |
i = 0 | |
with open(sourcefile,'r') as infile, open(targetfile,'w') as outfile: | |
for line in infile: | |
line_iterator = str(line).split(searchstr) | |
if len(line_iterator) > 1: | |
game_id = line_iterator[1][0:reclen] | |
outfile.write(game_id) | |
outfile.write("\n") | |
i = i + 1 | |
print(str(i) + " : records written") | |
print("Parse end") | |
I'm fascinated. Thanks guys, I'm going to try these solutions as well.
Someone pointed to here:
https://gist.github.com/Ja1meMartin/db1b71ed90921aff24fa
And I made my updates accordingly.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Here's how to do it in Beautiful Soup: