@nicholasserra
Created August 26, 2019 02:30
Crawl NHL videos
#!/usr/bin/env python
import json
import subprocess

# Prefer the Python 3 module and fall back to Python 2's urllib2.
try:
    from urllib.request import Request, urlopen
except ImportError:
    from urllib2 import Request, urlopen

page = 0
count = 0

while True:
    page += 1
    req = Request('https://search-api.svc.nhl.com/svc/search/v2/nhl_ca_en/topic/277437418/team/1?page={}&sort=new&type=video'.format(page))
    # Send a browser-style User-Agent header with each request.
    req.add_header('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0')
    response = urlopen(req)
    # Decode explicitly: json.loads() accepts bytes only on Python 3.6+.
    payload = json.loads(response.read().decode('utf-8'))
    videos = payload['docs']

    # An empty page of results means everything has been crawled.
    if not videos:
        break

    for video in videos:
        count += 1
        url = video['url']
        # Requires the youtube-dl executable to be available on PATH.
        subprocess.call(['youtube-dl', url])
        print('{} crawled'.format(count))

print('Done.')
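
The script above mixes paging, HTTP, and downloading in one loop, which makes it hard to try out without hitting the network. A minimal sketch of the same "request pages until one comes back empty" pattern as a generator follows. The names `iter_video_urls`, `fetch_page`, `fake_pages`, and `fake_fetch` are hypothetical and not part of the original gist, and the example.com URLs are placeholders rather than real API output.

```python
def iter_video_urls(fetch_page, start_page=1):
    """Yield every video URL, page by page, until a page has no docs.

    fetch_page is a hypothetical callable taking a page number and
    returning the decoded JSON payload (a dict with a 'docs' list).
    """
    page = start_page
    while True:
        payload = fetch_page(page)
        videos = payload.get('docs') or []
        if not videos:
            return  # empty page: nothing left to crawl
        for video in videos:
            yield video['url']
        page += 1


# Stand-in for the search API: two pages of results, then empty pages.
fake_pages = {
    1: {'docs': [{'url': 'https://example.com/a'},
                 {'url': 'https://example.com/b'}]},
    2: {'docs': [{'url': 'https://example.com/c'}]},
}


def fake_fetch(page):
    # Any page beyond the canned ones returns an empty result list.
    return fake_pages.get(page, {'docs': []})


urls = list(iter_video_urls(fake_fetch))
print(urls)
```

In real use, `fetch_page` would presumably wrap the `Request`/`urlopen`/`json.loads` calls from the script, and each yielded URL would be passed to `subprocess.call(['youtube-dl', url])`.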