@Eezzeldin
Forked from hybridjosto/Cricket data.py
Created July 19, 2016 17:36
Scraping data from a web table using Python and Beautiful Soup
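The core step is to walk the `<tr>` rows of a table and collect the text of each `<td>`. You can try it offline before touching the network. This minimal sketch assumes Beautiful Soup 4 is installed and parses an inline, made-up HTML string (not real gunnercricket.com markup) with the built-in `html.parser` backend. `table_rows` and `SAMPLE_HTML` are hypothetical names used only for illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration; real pages will differ.
SAMPLE_HTML = """
<html><head><title>2016 Season</title></head>
<body>
<table>
  <tr><th>Player</th><th>Runs</th></tr>
  <tr><td>Smith</td><td>54</td></tr>
  <tr><td><b>Jones</b></td><td>12</td></tr>
</table>
</body></html>
"""


def table_rows(html):
    """Return the stripped text of each <td> cell, row by row, from the first <table>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []  # no table on the page
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text().strip() for td in tr.find_all("td")]
        if cells:  # header rows hold only <th> cells, so they are skipped
            rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in table_rows(SAMPLE_HTML):
        print(",".join(row))
```

Running it prints one comma-separated line per data row. The `<b>` tag in the last row shows why `get_text()` is safer than joining a cell's children directly.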
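# Scrape the first HTML table on each linked page and write one
# comma-separated line per table row, prefixed with the first four
# characters of the page title (used as the date).
# Ported to Python 3: urllib2 only exists in Python 2.
from urllib.error import URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup

# http://segfault.in/2010/07/parsing-html-table-in-python-with-beautifulsoup/

with open("linksSource.txt") as links_file:
    lines = links_file.readlines()

with open("cricket-data.txt", "w", encoding="utf-8") as f:
    for link in lines[12:108]:  # entries 13 through 108 of the links file
        # strip() removes the trailing newline that readlines() keeps on each entry
        url = "http://www.gunnercricket.com/" + link.strip()
        try:
            with urlopen(url) as page:
                html = page.read()
        except (URLError, ValueError):
            continue  # skip pages that cannot be fetched
        soup = BeautifulSoup(html, "html.parser")
        if soup.title is None or soup.title.string is None:
            continue  # no usable <title> to take the date from
        date = soup.title.string[:4] + ","  # take first 4 characters from title
        table = soup.find("table")
        if table is None:
            continue  # page has no table
        for tr in table.find_all("tr"):
            # get_text() also works when a cell contains nested tags,
            # where ''.join(td) would raise a TypeError
            text_data = [td.get_text().strip() for td in tr.find_all("td")]
            if text_data:  # skip header rows that contain only <th> cells
                f.write(date + ",".join(text_data) + "\n")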
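Joining cells with `","` produces a malformed line whenever a cell itself contains a comma. For example, a name written as "Smith, J" would split into two columns. One way around this is the standard-library `csv` module, sketched below, whose writer quotes such cells automatically. `write_rows` and the sample rows are hypothetical. The date is written as its own first field, which matches the column layout the script produces.

```python
import csv
import io


def write_rows(out, date, rows):
    """Write each row as one CSV record, with the page's date as the first field."""
    writer = csv.writer(out)  # default QUOTE_MINIMAL quotes fields containing commas
    for cells in rows:
        writer.writerow([date] + cells)


if __name__ == "__main__":
    buf = io.StringIO()
    # Hypothetical rows; each name contains a comma that must be quoted.
    write_rows(buf, "2016", [["Smith, J", "54"], ["Jones, A", "12"]])
    print(buf.getvalue(), end="")
```

When writing to a real file rather than `io.StringIO`, the `csv` documentation recommends opening it with `newline=""`.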