@kadnan
Created September 23, 2016 05:56
Fetching, parsing and cleaning Pakistan England Test data from Cricinfo
"""
Grab data from the Cricinfo site on Test matches played by Pakistan against England from 1954 until now.
"""
import requests
from bs4 import BeautifulSoup
url = 'http://stats.espncricinfo.com/ci/engine/team/7.html?class=1;opposition=1;template=results;type=team;view=results'
r = requests.get(url)
html = r.text
# Create the soup object
soup = BeautifulSoup(html, 'lxml')
recs = soup.select('tbody > .data1')
# Skip the first two matched rows, then write the selected columns of each remaining row as one CSV line
with open('data_england_test.csv', 'w') as file:
    for i in range(2, len(recs)):
        single = recs[i].find_all('td')
        file.write(single[0].text + ',' + single[1].text + ',' + single[2].text + ',' + single[3].text + ',' + single[4].text + ',' + single[6].text + ',' + single[7].text + '\n')
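
Joining cell text with ',' by hand produces a malformed line whenever a cell itself contains a comma. A minimal sketch of one alternative, using the standard csv module, is below. It assumes the cell texts have already been pulled out of each row into plain lists; rows_to_csv and the placeholder row are hypothetical names and data for illustration, not part of the original script.

```python
import csv
import io

# Column positions copied from the script above; adjust them if the table layout differs.
COLUMNS = (0, 1, 2, 3, 4, 6, 7)


def rows_to_csv(rows, columns=COLUMNS):
    """Return CSV text holding the chosen columns of each row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    for cells in rows:
        # Strip stray whitespace; csv.writer quotes any field that contains a comma.
        writer.writerow([cells[i].strip() for i in columns])
    return buffer.getvalue()


# Placeholder cell texts standing in for one table row; not real Cricinfo data.
sample = [['a', 'b', 'c', 'd', 'e', 'f', 'g, h', 'i']]
print(rows_to_csv(sample))
```

With the script above, the input could be built as [[td.text for td in rec.find_all('td')] for rec in recs[2:]], and the returned text written to data_england_test.csv.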

bsandyy commented Mar 24, 2017

I think at line 18 the index you are looking for is single[5], not single[4], since single[4] is empty.
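
One way to check which column is actually empty is to list the blank cells of a sample row before choosing indices. This is only a sketch: empty_columns is a hypothetical helper, and the cell texts below are placeholders, not real Cricinfo data.

```python
def empty_columns(cells):
    """Return the indices of cells whose text is blank (hypothetical helper)."""
    return [i for i, text in enumerate(cells) if not text.strip()]


# Placeholder cell texts standing in for one table row.
cells = ['x', 'y', '', 'z']
print(empty_columns(cells))
```

Against the script's data, the input could be [td.text for td in recs[2].find_all('td')].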
