@Ethcelon
Last active August 29, 2015 14:07
# Scraping a recipe on www.sanjeevkapoor.com
import BeautifulSoup as bs
import requests
r = requests.get("http://www.sanjeevkapoor.com/Recipe/Chocolate-Mille-Feuille.html")
data = r.text
soup = bs.BeautifulSoup(data)
# Now we have the HTML as a soup
# http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html
# We have to search the parsed HTML to get the required data
# Getting the instructions
# Each instruction step is a <span itemprop="recipeInstructions">
spans = soup.findAll('span', { "itemprop" : "recipeInstructions" })
instructions = [ x.text.encode('utf-8') for x in spans ]
# Getting the ingredients
# The ingredient names are in <div class="ingrcol1"> and the quantities in <div class="ingrcol2">
names = [ x.text.encode('utf-8') for x in soup.findAll("div", { "class" : "ingrcol1" }) ]
quantities = [ x.text.encode('utf-8') for x in soup.findAll("div", { "class" : "ingrcol2" }) ]
# Pair each ingredient with its quantity; list() keeps the result reusable
ingredients = list(zip(names, quantities))
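# zip() stops at the shorter list, so an ingredient without a matching quantity
# would be dropped silently. A minimal sketch of a more forgiving pairing,
# assuming Python 3 (itertools.zip_longest); the helper name pair_ingredients
# and the sample data are hypothetical and not taken from the real page.

```python
from itertools import zip_longest


def pair_ingredients(names, quantities, missing=""):
    """Pair each ingredient name with its quantity, padding the shorter list."""
    return list(zip_longest(names, quantities, fillvalue=missing))


# Hypothetical sample data standing in for the scraped columns
sample_names = ["Dark chocolate", "Puff pastry", "Icing sugar"]
sample_quantities = ["200 grams", "1 sheet"]

pairs = pair_ingredients(sample_names, sample_quantities)
for name, quantity in pairs:
    print(name, "-", quantity or "(no quantity found)")
```

# Padding keeps every ingredient visible, which makes a mismatch between the
# two columns easy to spot instead of losing the trailing entries.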