@rcalsaverini
Last active August 22, 2017 17:43
Crawl Futpedia for Brazilian soccer data.
"""
Crawling Brazilian soccer results from
http://futpedia.globo.com/campeonato/campeonato-brasileiro/2011#/fase=fase-unica/rodada=1
With:
http://www.clips.ua.ac.be/pages/pattern-web
"""
import pandas
from pattern import web
import parse


def get_url(ano, rodada):
    # Build the Futpedia URL for a given year (ano) and round (rodada).
    string = 'http://futpedia.globo.com/campeonato/campeonato-brasileiro/{}#/fase=fase-unica/rodada={}'
    return web.URL(string.format(ano, rodada), unicode=True, method='GET')


def iter_games():
    # Yield (round, date, kickoff time, home team elements) for each listed game.
    for year in range(2011, 2012, 1):  # currently covers 2011 only
        dom = web.DOM(web.download(get_url(year, 1), cached=True))
        for e in dom('li.lista-classificacao-jogo'):
            rodada = e.attrs['data-rodada']
            date = [item.attrs['datetime'] for item in e('*[itemprop="startDate"]')][0]
            time = [item.content for item in e('span.horario')][0]
            # Note: home_team is the list of matched elements, not the team name string.
            home_team = e('div[class="info-jogo"] div[class="time mandante"] meta[itemprop="name"]')
            yield rodada, date, time, home_team


for a in iter_games():
    print(a)