@donlovett
Created September 17, 2016 22:15
Read a web page and sum the numbers in its table
# Note: this code must run under Python 2.x, and you must download
# http://www.pythonlearn.com/code/BeautifulSoup.py
# into the same folder as this program.
# http://www.dr-chuck.com/page1.htm
# Week 4 assignment sample
# Author: Don Lovett. Based on code from the "Using Python to Access Web Data" text by Charles Severance.
import re
import urllib
from BeautifulSoup import *

# url = raw_input('Enter - ')
url = 'http://python-data.dr-chuck.net/comments_42.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
nums = list()

# Retrieve all of the span tags
tags = soup('span')
for tag in tags:
    # Look at the parts of a tag
    # print 'TAG:', tag
    # print 'URL:', tag.get('href', None)
    print 'Contents:', tag.contents[0]
    # Pull the first run of digits out of the span's text
    x = re.findall('([0-9]+)', tag.contents[0])
    if len(x) > 0:
        val = int(x[0])
        nums.append(val)

# Report how many numbers were found and their total
print len(nums)
print sum(nums)
#
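
The script above depends on Python 2 and the downloaded BeautifulSoup.py file. As a rough sketch only, the same extract-and-sum step can be written for Python 3 with nothing but the standard library. The names `SpanNumberParser`, `sum_span_numbers`, and `SAMPLE_HTML` are made up for this illustration, and the sample markup is an assumption about what the page looks like (numbers wrapped in span tags), not a copy of the real page.

```python
import re
from html.parser import HTMLParser


class SpanNumberParser(HTMLParser):
    """Collect the first integer found in each text chunk inside a <span>."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.nums = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Only look at text that sits inside a span element
        if self.in_span:
            found = re.findall(r"[0-9]+", data)
            if found:
                self.nums.append(int(found[0]))


def sum_span_numbers(html):
    """Return (count, total) of the integers found in span elements."""
    parser = SpanNumberParser()
    parser.feed(html)
    parser.close()
    return len(parser.nums), sum(parser.nums)


# Hypothetical sample markup, standing in for the downloaded page
SAMPLE_HTML = """
<table>
<tr><td>Alice</td><td><span class="comments">97</span></td></tr>
<tr><td>Bob</td><td><span class="comments">3</span></td></tr>
<tr><td>Carol</td><td><span class="comments">no count</span></td></tr>
</table>
"""

count, total = sum_span_numbers(SAMPLE_HTML)
print(count)
print(total)
```

To run this against a live page instead of the sample string, the HTML could be fetched with `urllib.request.urlopen(url).read().decode()` and passed to `sum_span_numbers`.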