@tangotiger
Last active February 7, 2016 21:24
Parse Schedule
print("Parse start")

sourcefile = "C:\\Users\\TOM\\PycharmProjects\\downloadNHL\\datafiles\\schedulebyseason.htm"
targetfile = "C:\\Users\\TOM\\PycharmProjects\\downloadNHL\\datafiles\\parsed_schedulebyseason.txt"

# Game ids follow this marker in the recap links, e.g. recap?id=2015020001
searchstr = "recap?id="
sample_recstr = "2015020001"
reclen = len(sample_recstr)  # game ids are 10 characters long

i = 0
with open(sourcefile, 'r') as infile, open(targetfile, 'w') as outfile:
    for line in infile:
        line_iterator = line.split(searchstr)
        if len(line_iterator) > 1:
            # Take the 10 characters immediately after the marker
            # (only the first match on each line is captured)
            game_id = line_iterator[1][0:reclen]
            outfile.write(game_id)
            outfile.write("\n")
            i = i + 1
print(str(i) + " : records written")
print("Parse end")
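The split-and-slice step above can also be done in one pass with a regular expression, which picks up every id on a line rather than only the first. A minimal Python 3 sketch; the helper name and sample line are illustrative, not from the original script:

```python
import re

# Capture every 10-digit game id that follows the "recap?id=" marker.
GAME_ID_RE = re.compile(r'recap\?id=(\d{10})')

def extract_game_ids(text):
    """Return all NHL game ids found in the given HTML text."""
    return GAME_ID_RE.findall(text)

# Sample line modeled on the schedule page markup.
line = '<a href="/gamecenter/en/recap?id=2015020001">Recap</a>'
print(extract_game_ids(line))  # ['2015020001']
```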
@tangotiger
Author

Terrific stuff, guys, thanks. This will definitely be valuable the next time I parse a file.
Here's an alternative that was also provided:
https://gist.github.com/Ja1meMartin/db1b71ed90921aff24fa

@dondrake

Here's how to do it in Beautiful Soup (Python 2):

from bs4 import BeautifulSoup
import urllib2
import re

resp = urllib2.urlopen("http://www.nhl.com/ice/schedulebyseason.htm")
soup = BeautifulSoup(resp, "html.parser", from_encoding=resp.info().getparam('charset'))

with open('game-ids.txt', 'w') as output:
    for link in soup.find_all('a', href=True):
        #print link['href']
        # http://www.nhl.com/gamecenter/en/recap?id=2015020664
        result = re.search(r'recap\?id=(\d+)', link['href'])
        if result:
            output.write('%s\n' % result.group(1))
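The snippet above relies on `urllib2`, which exists only in Python 2. A stdlib-only Python 3 sketch of the same link-walking idea, using `html.parser` in place of Beautiful Soup; the class name and sample markup are illustrative, and the network fetch is omitted:

```python
import re
from html.parser import HTMLParser

class RecapLinkParser(HTMLParser):
    """Collect game ids from <a href="...recap?id=NNNNNNNNNN"> links."""

    def __init__(self):
        super().__init__()
        self.game_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value:
                m = re.search(r'recap\?id=(\d+)', value)
                if m:
                    self.game_ids.append(m.group(1))

parser = RecapLinkParser()
parser.feed('<a href="http://www.nhl.com/gamecenter/en/recap?id=2015020664">Recap</a>')
print(parser.game_ids)  # ['2015020664']
```

To process the real page, you would feed the downloaded HTML (e.g. from `urllib.request.urlopen`) to `parser.feed()` instead of the sample string.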

@tangotiger
Author

I'm fascinated. Thanks, guys; I'm going to try these solutions as well.

@tangotiger
Author

Someone pointed me to this:
https://gist.github.com/Ja1meMartin/db1b71ed90921aff24fa
and I made my updates accordingly.
