@tangotiger
Last active February 7, 2016 21:26
Download NHL files
import urllib.request

# Default values, now passed as optional arguments:
# season_id = '20152016'
# subseason_id = '02'
#
# first_game_id = 1
# last_game_id = 669


def retrieve_file(datafile_id, season_id='20152016', subseason_id='02', first_game_id=1, last_game_id=669):
    # Download one report type (e.g. 'PL') for every game id in the range.
    # range() excludes its stop value, so add 1 to include last_game_id itself.
    for int_game_id in range(first_game_id, last_game_id + 1):
        # Game ids are zero-padded to four digits, e.g. 1 -> '0001'.
        game_id = str(int_game_id).zfill(4)
        print(game_id)
        sourcefile = 'http://www.nhl.com/scores/htmlreports/{s}/{d}{ss}{g}.HTM'.format(s=season_id, d=datafile_id, ss=subseason_id, g=game_id)
        targetfile = "C:\\Users\\TOM\\PycharmProjects\\downloadNHL\\datafiles\\{d}{ss}{g}.HTM".format(d=datafile_id, ss=subseason_id, g=game_id)
        urllib.request.urlretrieve(sourcefile, targetfile)


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    retrieve_file('PL')
    retrieve_file('RO')
    retrieve_file('TH')
    retrieve_file('TV')
@jhweaver

I'd make a couple more small changes related to usability. They are implemented here: https://gist.github.com/jhweaver/7db17089802f1cb122d7

  1. Turn the four global vars into optional arguments of retrieve_file. That way, if you ever want to run retrieve_file for a subset of games (say you found that a stretch of game ids you downloaded was corrupted), you can download just that subset by calling retrieve_file with those parameters. You can still call retrieve_file('PL') without specifying the optional args and get the same default values you have now.
  2. Put the part that you actually want to run from the command line in an if __name__ == "__main__": block. Right now, if you import downloadEvents in a Python REPL, it defines the retrieve_file function and then runs all four commands at the bottom of the file. Adding this if block lets the interpreter distinguish between running the script from the command line (in which case you do want to run those four commands) and importing it as a module (when you're probably just looking for access to the functions defined in it).
  3. I'd rename the module to download_events.py. This conforms to Python's PEP 8 style guide (https://www.python.org/dev/peps/pep-0008/#package-and-module-names). I know a lot of people see this as nitpicking, but inconsistent naming becomes a headache down the road when you're dealing with several packages. If some of those packages are snake_case and some are camelCase, it'll be tough to remember which is which.
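
Points 1 and 2 can be sketched without touching the network. In the snippet below, report_urls is a hypothetical helper (it is not part of the gist) that takes the same optional arguments as retrieve_file but only builds the URLs, so it is safe to import and experiment with. Treating last_game_id as inclusive is an assumption on my part.

```python
def report_urls(datafile_id, season_id='20152016', subseason_id='02',
                first_game_id=1, last_game_id=669):
    """Hypothetical helper: list the report URLs for a range of game ids."""
    urls = []
    # Assumption: last_game_id is meant to be included, hence the + 1.
    for int_game_id in range(first_game_id, last_game_id + 1):
        game_id = str(int_game_id).zfill(4)
        urls.append(
            'http://www.nhl.com/scores/htmlreports/{s}/{d}{ss}{g}.HTM'.format(
                s=season_id, d=datafile_id, ss=subseason_id, g=game_id))
    return urls


if __name__ == "__main__":
    # Skipped on import; runs only when the file is executed as a script.
    # With no optional args, the defaults cover the whole default range.
    print(len(report_urls('PL')))
    # Only games 100 through 102, e.g. to redo a corrupted stretch.
    for url in report_urls('PL', first_game_id=100, last_game_id=102):
        print(url)
```

Importing this file in a REPL only defines report_urls; nothing is printed until you call it yourself or run the file as a script.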

@tangotiger
Author

Terrific suggestions! I'll implement now.
