Quick and dirty web scraping with Python
{
"metadata": {
"name": "Quick and dirty web scraping with Python"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import urllib2\n",
"from html2text import html2text\n",
"\n",
"for line in html2text(urllib2.urlopen(\"http://moviebodycounts.com/Blood_Diamond.htm\").read()).split(\"\\n\"):\n",
"    if \"IMDb\" in line:\n",
"        print line.split(\"[IMDb]\")[1].strip(\"(\").strip(\",\").strip(\")\")\n",
"    if \"Film:\" in line:\n",
"        print line.split(\"Film:\")[-1].strip()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"http://www.imdb.com/title/tt0450259/\n",
"187\n"
]
}
],
"prompt_number": 1
}
],
"metadata": {}
}
]
}
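
The cell above targets Python 2 (`urllib2`, the `print` statement). As a rough sketch only, the same idea might look like this in Python 3. The function names `parse_movie_page` and `scrape` are hypothetical and not part of the original notebook. The sketch assumes html2text still renders the link as `[IMDb](url)` on a single line and that the count follows a `Film:` label, as the notebook's output suggests.

```python
from urllib.request import urlopen


def parse_movie_page(text):
    """Pull the IMDb link and the film body count out of html2text-style output.

    Hypothetical helper: returns (imdb_url, film_count), with None for
    anything that was not found.
    """
    imdb_url = None
    film_count = None
    for line in text.split("\n"):
        if "[IMDb]" in line:
            # html2text renders links as [label](url); keep only the url part.
            imdb_url = line.split("[IMDb]")[1].strip().strip("(),")
        if "Film:" in line:
            film_count = line.split("Film:")[-1].strip()
    return imdb_url, film_count


def scrape(url):
    """Fetch a page, convert it to text with html2text, and parse it."""
    # Imported lazily so parse_movie_page works without html2text installed.
    from html2text import html2text

    html = urlopen(url).read().decode("utf-8", errors="replace")
    return parse_movie_page(html2text(html))
```

Calling `scrape("http://moviebodycounts.com/Blood_Diamond.htm")` would then return the two values the notebook prints, provided the page layout still matches. Keeping the parsing in its own function means it can be tried on a saved copy of the text without touching the network.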