Created
May 12, 2016 23:52
Parsing HTML and XML Webcast
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Parsing HTML and XML" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import pandas as pd" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## What is a Parser?" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A parser reads raw input, such as the text of an HTML or XML document, breaks it into its component parts, and arranges those parts into a structure that a program can navigate."
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# HTML" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"HTML is the markup language of the web. It defines the structure and content of a webpage. For our purposes, we are only interested in how to extract data from it programmatically."
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### HTML is made up of elements\n",
"Elements are defined by tags\n",
"- elements can contain content\n",
"```HTML\n",
"<tag>content</tag>\n",
"```\n",
"- elements can contain other elements\n",
"```HTML\n",
"<tag>\n",
"    <sub_tag>sub tag 1</sub_tag>\n",
"    <sub_tag>sub tag 2</sub_tag>\n",
"</tag>\n",
"```\n",
"- element tags can have attributes, separated by spaces\n",
"```HTML\n",
"<tag id=\"tag_id\" style=\"visibility: hidden;\">\n",
"...\n",
"</tag>\n",
"```\n",
"- elements either have an opening and closing tag\n",
"```HTML\n",
"<tag>content</tag>\n",
"```\n",
"  or are self-closing\n",
"```HTML\n",
"<tag />\n",
"```"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### Example HTML Code" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"```HTML\n", | |
"<!DOCTYPE html>\n", | |
"<html lang=\"en\">\n", | |
" <head>\n", | |
" <meta charset=\"utf-8\"/>\n", | |
" <title>\n", | |
" Soup Title\n", | |
" </title>\n", | |
" <meta content=\"Soup\" name=\"description\"/>\n", | |
" <meta content=\"Soupy Soup\" name=\"author\"/>\n", | |
" <style type=\"text/css\">\n", | |
" .tg {border-collapse:collapse;border-spacing:0;border-color:#999;border-width:1px;border-style:solid;margin:0px auto;}\n", | |
" .tg td {font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#999;color:#444;background-color:#F7FDFA;}\n", | |
" .tg th {font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#999;color:#fff;background-color:#26ADE4;}\n", | |
" .tg .tg-vn4c {background-color:#D2E4FC}\n", | |
" </style>\n", | |
" </head>\n", | |
" <body>\n", | |
" <table class=\"tg\">\n", | |
" <tr>\n", | |
" <th class=\"tg-031e\">\n", | |
" Soup\n", | |
" </th>\n", | |
" <th class=\"tg-031e\">\n", | |
" Price\n", | |
" </th>\n", | |
" <th class=\"tg-031e\">\n", | |
" Weight\n", | |
" </th>\n", | |
" <th class=\"tg-031e\">\n", | |
" Rating\n", | |
" </th>\n", | |
" <th class=\"tg-031e\">\n", | |
" Reviews\n", | |
" </th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" Amy's Kitchen Organic Gluten Free Low Fat Chunky Tomato Soup\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 1.7\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" .100\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 5\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 4\n", | |
" </td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-031e\">\n", | |
" Heinz Classic Cream of Tomato Soup\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" .95\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" .400\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" 5\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" 14\n", | |
" </td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" Baxters Favourites Cream of Tomato Soup\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 1.15\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" .400\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 2\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 1\n", | |
" </td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-031e\">\n", | |
" Cross & Blackwell Cream of Tomato Soup\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" 2.00\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" 4 x .400\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" 5\n", | |
" </td>\n", | |
" <td class=\"tg-031e\">\n", | |
" 2\n", | |
" </td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" Morrisons Cream of Tomato Soup\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" .45\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" .100\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 4\n", | |
" </td>\n", | |
" <td class=\"tg-vn4c\">\n", | |
" 1\n", | |
" </td>\n", | |
" </tr>\n", | |
" </table>\n", | |
" </body>\n", | |
"</html>\n", | |
"```" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<html lang=\"en\">\n",
"<head>\n",
"  <meta charset=\"utf-8\">\n",
"\n",
"  <title>Soup Title</title>\n",
" <meta name=\"description\" content=\"Soup\">\n", | |
" <meta name=\"author\" content=\"Soupy Soup\">\n", | |
"\n", | |
" <style type=\"text/css\">\n", | |
" .tg {border-collapse:collapse;border-spacing:0;border-color:#999;border-width:1px;border-style:solid;margin:0px auto;}\n", | |
" .tg td {font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#999;color:#444;background-color:#F7FDFA;}\n", | |
" .tg th {font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#999;color:#fff;background-color:#26ADE4;}\n", | |
" .tg .tg-vn4c {background-color:#D2E4FC}\n", | |
" </style>\n", | |
"\n", | |
"</head>\n", | |
"\n", | |
"<body>\n", | |
"\n", | |
"<table class=\"tg\">\n", | |
" <tr>\n", | |
" <th class=\"tg-031e\">Soup</th>\n", | |
" <th class=\"tg-031e\">Price</th>\n", | |
" <th class=\"tg-031e\">Weight</th>\n", | |
" <th class=\"tg-031e\">Rating</th>\n", | |
" <th class=\"tg-031e\">Reviews</th>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-vn4c\">Amy's Kitchen Organic Gluten Free Low Fat Chunky Tomato Soup</td>\n", | |
" <td class=\"tg-vn4c\">1.7</td>\n", | |
" <td class=\"tg-vn4c\">.100</td>\n", | |
" <td class=\"tg-vn4c\">5</td>\n", | |
" <td class=\"tg-vn4c\">4</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-031e\">Heinz Classic Cream of Tomato Soup</td>\n", | |
" <td class=\"tg-031e\">.95</td>\n", | |
" <td class=\"tg-031e\">.400</td>\n", | |
" <td class=\"tg-031e\">5</td>\n", | |
" <td class=\"tg-031e\">14</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-vn4c\">Baxters Favourites Cream of Tomato Soup</td>\n", | |
" <td class=\"tg-vn4c\">1.15</td>\n", | |
" <td class=\"tg-vn4c\">.400</td>\n", | |
" <td class=\"tg-vn4c\">2</td>\n", | |
" <td class=\"tg-vn4c\">1</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-031e\">Cross & Blackwell Cream of Tomato Soup</td>\n", | |
" <td class=\"tg-031e\">2.00</td>\n", | |
" <td class=\"tg-031e\">4 x .400</td>\n", | |
" <td class=\"tg-031e\">5</td>\n", | |
" <td class=\"tg-031e\">2</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <td class=\"tg-vn4c\">Morrisons Cream of Tomato Soup</td>\n", | |
" <td class=\"tg-vn4c\">.45</td>\n", | |
" <td class=\"tg-vn4c\">.100</td>\n", | |
" <td class=\"tg-vn4c\">4</td>\n", | |
" <td class=\"tg-vn4c\">1</td>\n", | |
" </tr>\n", | |
"</table>\n", | |
"\n", | |
"\n", | |
"</body>\n", | |
"</html>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Parsing with Beautiful Soup" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beautiful Soup is a library of convenience functions that, together with a parser, lets you navigate and manipulate HTML or XML. Let's parse this table of soup with Beautiful Soup."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"with open('soup.html', 'rb') as html_file:\n",
"    soup = BeautifulSoup(html_file, 'lxml')  # Specify that we want to use the lxml parser"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"print(soup.prettify())"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"print(\"The head tag:\")\n",
"print(soup.find('head').prettify())"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"# Different ways to select elements\n",
"print(soup.find('head').find('title'))\n",
"print(soup.find('title'))\n",
"print(soup.title.get_text())\n",
"title = soup.find('title')"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"title.attrs" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Select the table" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"tables = soup.find_all('table')\n",
"for table in tables:\n",
"    print(\"Attributes: {}\".format(table.attrs))\n",
"    print(\"Number of rows: {}\".format(len(table.find_all('tr'))))  # table row tags\n",
"    print(\"Number of cells: {}\".format(len(table.find_all('td'))))  # table cells"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"table = soup.find('table', attrs={'class': ['tg']})" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"table" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Convert the table data into standard Python data structures" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"table.find('td').name  # tag name of the first data cell: 'td'"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"soup_list = []\n",
"rows = table.find_all('tr')\n",
"header_row = rows.pop(0)\n",
"\n",
"# Loop through each row and extract cells\n",
"for row in rows:\n",
"    name, price, weight, rating, reviews = row.find_all('td')\n",
"    soup_list.append({\n",
"        'name': name.text,\n",
"        'price': price.text,\n",
"        'weight': weight.text,\n",
"        'rating': rating.text,\n",
"        'reviews': reviews.text\n",
"    })\n",
"\n",
"soup_list"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Convert to a Pandas DataFrame" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"df = pd.DataFrame(soup_list)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"df" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# XML" | |
] | |
}, | |
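{
"cell_type": "markdown",
"metadata": {},
"source": [
"XML is a markup language widely used to store and exchange data on the internet.\n",
"\n",
"XML looks much like HTML, except that it has no predefined set of tags: you choose your own tag names. It is also stricter: every element must be closed, and special characters such as `&` must be escaped (`&amp;`)."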
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Sample XML Code" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```XML\n",
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n",
"<soupdata>\n",
"    <soup>\n",
"        <name>Amy's Kitchen Organic Gluten Free Low Fat Chunky Tomato Soup</name>\n",
"        <price currency=\"GBP\">1.7</price>\n",
"        <weight units='g'>.100</weight>\n",
"        <rating>5</rating>\n",
"        <reviews>4</reviews>\n",
"    </soup>\n",
"    <soup>\n",
"        <name>Heinz Classic Cream of Tomato Soup</name>\n",
"        <price currency=\"GBP\">.95</price>\n",
"        <weight units='g'>.400</weight>\n",
"        <rating>5</rating>\n",
"        <reviews>14</reviews>\n",
"    </soup>\n",
"    <soup>\n",
"        <name>Baxters Favourites Cream of Tomato Soup</name>\n",
"        <price currency=\"GBP\">1.15</price>\n",
"        <weight units='g'>.400</weight>\n",
"        <rating>2</rating>\n",
"        <reviews>1</reviews>\n",
"    </soup>\n",
"    <soup>\n",
"        <name>Cross &amp; Blackwell Cream of Tomato Soup</name>\n",
"        <price currency=\"GBP\">2.00</price>\n",
"        <weight units='g'>4 x .400</weight>\n",
"        <rating>5</rating>\n",
"        <reviews>2</reviews>\n",
"    </soup>\n",
"    <soup>\n",
"        <name>Morrisons Cream of Tomato Soup</name>\n",
"        <price currency=\"GBP\">.45</price>\n",
"        <weight units='g'>.100</weight>\n",
"        <rating>4</rating>\n",
"        <reviews>1</reviews>\n",
"    </soup>\n",
"</soupdata>\n",
"```"
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Typical XML Parsing" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"# xml.etree.cElementTree was removed in Python 3.9; ElementTree works on Python 2 and 3\n",
"import xml.etree.ElementTree as ET\n",
"\n",
"with open('soup.xml', 'r') as f_in:\n",
"    tree = ET.parse(f_in)\n",
"    root = tree.getroot()\n",
"    print(\"Tag: %s\" % root.tag)\n",
"    print(\"Attributes: %s\" % root.attrib)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"for tag in root.findall('soup'):\n",
"    print(tag.tag)\n",
"    for child_tag in tag.iter():\n",
"        print(\"tag: {0}\\t attributes: {1}\".format(child_tag.tag, child_tag.attrib))"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
},
"outputs": [],
"source": [
"with open('soup.xml', 'r') as f_in:\n",
"    tree = ET.parse(f_in)\n",
"    root = tree.getroot()\n",
"    soups = []\n",
"    for tag in root.findall('soup'):\n",
"        soups.append({child_tag.tag: child_tag.text for child_tag in tag})"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"pd.DataFrame(soups)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Parsing iteratively" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes an XML document is too big to parse all at once. In these situations we need to parse it iteratively, handling elements as they are read."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9\n",
"\n",
"with open('soup.xml', 'r') as f_in:\n",
"    for event, elem in ET.iterparse(f_in, events=(\"start\", \"end\")):\n",
"        print(\"event type: {}\\t element: {}\".format(event, elem))\n",
"\n",
"        # Print some details about elements with tag 'soup'\n",
"        if elem.tag == 'soup' and event == \"end\":\n",
"            for field in ['name', 'price', 'weight', 'rating', 'reviews']:\n",
"                value = elem.find(field).text\n",
"                attribs = elem.find(field).attrib\n",
"                print(\"\\t{}: {} attribs: {}\".format(field, value, attribs))"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"with open('soup.xml', 'r') as f_in:\n",
"    soups = []\n",
"    for event, elem in ET.iterparse(f_in, events=(\"start\", \"end\")):\n",
"        if elem.tag == 'soup' and event == \"end\":\n",
"            soups.append({child.tag: child.text for child in elem})\n",
"            elem.clear()  # discard the finished record so memory use stays low"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"pd.DataFrame(soups)" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.11" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
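The iterative approach in the notebook's "Parsing iteratively" section only saves memory if each element is discarded once its data has been copied. The following is a minimal Python 3 sketch of that idea, assuming a shortened in-memory copy of the soup data; `XML_BYTES` and `iter_soups` are illustrative names and not part of the notebook.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for a large soup.xml file.
XML_BYTES = b"""<?xml version="1.0" encoding="UTF-8"?>
<soupdata>
  <soup>
    <name>Heinz Classic Cream of Tomato Soup</name>
    <price currency="GBP">.95</price>
  </soup>
  <soup>
    <name>Morrisons Cream of Tomato Soup</name>
    <price currency="GBP">.45</price>
  </soup>
</soupdata>
"""


def iter_soups(source):
    """Yield one dict per <soup> element, emptying each element after use."""
    # Only "end" events are requested: an element is complete (text and
    # children included) once its closing tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "soup":
            yield {child.tag: child.text for child in elem}
            # Drop the children and text of the finished record. The emptied
            # <soup> element itself stays attached to the root element.
            elem.clear()


soups = list(iter_soups(io.BytesIO(XML_BYTES)))
print(soups)
```

Because `iter_soups` is a generator, a caller can process records one at a time instead of building the full list, which is the point of parsing iteratively.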
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Soup Title</title>
  <meta name="description" content="Soup">
  <meta name="author" content="Soupy Soup">
  <style type="text/css">
    .tg {border-collapse:collapse;border-spacing:0;border-color:#999;border-width:1px;border-style:solid;margin:0px auto;}
    .tg td {font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#999;color:#444;background-color:#F7FDFA;}
    .tg th {font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#999;color:#fff;background-color:#26ADE4;}
    .tg .tg-vn4c {background-color:#D2E4FC}
  </style>
</head>
<body>
<table class="tg">
  <tr>
    <th class="tg-031e">Soup</th>
    <th class="tg-031e">Price</th>
    <th class="tg-031e">Weight</th>
    <th class="tg-031e">Rating</th>
    <th class="tg-031e">Reviews</th>
  </tr>
  <tr>
    <td class="tg-vn4c">Amy's Kitchen Organic Gluten Free Low Fat Chunky Tomato Soup</td>
    <td class="tg-vn4c">1.7</td>
    <td class="tg-vn4c">.100</td>
    <td class="tg-vn4c">5</td>
    <td class="tg-vn4c">4</td>
  </tr>
  <tr>
    <td class="tg-031e">Heinz Classic Cream of Tomato Soup</td>
    <td class="tg-031e">.95</td>
    <td class="tg-031e">.400</td>
    <td class="tg-031e">5</td>
    <td class="tg-031e">14</td>
  </tr>
  <tr>
    <td class="tg-vn4c">Baxters Favourites Cream of Tomato Soup</td>
    <td class="tg-vn4c">1.15</td>
    <td class="tg-vn4c">.400</td>
    <td class="tg-vn4c">2</td>
    <td class="tg-vn4c">1</td>
  </tr>
  <tr>
    <td class="tg-031e">Cross & Blackwell Cream of Tomato Soup</td>
    <td class="tg-031e">2.00</td>
    <td class="tg-031e">4 x .400</td>
    <td class="tg-031e">5</td>
    <td class="tg-031e">2</td>
  </tr>
  <tr>
    <td class="tg-vn4c">Morrisons Cream of Tomato Soup</td>
    <td class="tg-vn4c">.45</td>
    <td class="tg-vn4c">.100</td>
    <td class="tg-vn4c">4</td>
    <td class="tg-vn4c">1</td>
  </tr>
</table>
</body>
</html>
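To see what a parser does under the hood, a table like the one in `soup.html` can also be read with only the Python 3 standard library. This is a rough sketch that swaps Beautiful Soup for `html.parser.HTMLParser`, so it is not the notebook's method; `TableParser` and the shortened `HTML` string are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical, shortened copy of the soup table.
HTML = """
<table class="tg">
  <tr><th>Soup</th><th>Price</th></tr>
  <tr><td>Heinz Classic Cream of Tomato Soup</td><td>.95</td></tr>
  <tr><td>Morrisons Cream of Tomato Soup</td><td>.45</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = TableParser()
parser.feed(HTML)
parser.close()

# The first row holds the column headers; pair them with each data row.
header, *body = parser.rows
records = [dict(zip(header, cells)) for cells in body]
print(records)
```

The parser only reports events (start tag, text, end tag); the class has to track which row and cell it is in. Beautiful Soup does this bookkeeping for you and builds a navigable tree instead.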
<?xml version="1.0" encoding="UTF-8"?>
<soupdata>
    <soup>
        <name>Amy's Kitchen Organic Gluten Free Low Fat Chunky Tomato Soup</name>
        <price currency="GBP">1.7</price>
        <weight units='g'>.100</weight>
        <rating>5</rating>
        <reviews>4</reviews>
    </soup>
    <soup>
        <name>Heinz Classic Cream of Tomato Soup</name>
        <price currency="GBP">.95</price>
        <weight units='g'>.400</weight>
        <rating>5</rating>
        <reviews>14</reviews>
    </soup>
    <soup>
        <name>Baxters Favourites Cream of Tomato Soup</name>
        <price currency="GBP">1.15</price>
        <weight units='g'>.400</weight>
        <rating>2</rating>
        <reviews>1</reviews>
    </soup>
    <soup>
        <name>Cross &amp; Blackwell Cream of Tomato Soup</name>
        <price currency="GBP">2.00</price>
        <weight units='g'>4 x .400</weight>
        <rating>5</rating>
        <reviews>2</reviews>
    </soup>
    <soup>
        <name>Morrisons Cream of Tomato Soup</name>
        <price currency="GBP">.45</price>
        <weight units='g'>.100</weight>
        <rating>4</rating>
        <reviews>1</reviews>
    </soup>
</soupdata>
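XML parsers are strict about special characters: a bare `&` in element text is not well-formed, so a name such as `Cross & Blackwell` must be written with `&amp;` before `ElementTree` will accept it. Here is a small Python 3 sketch of the failure and the fix, assuming the text is assembled by hand; `bad_xml` and `good_xml` are illustrative names.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "Cross & Blackwell Cream of Tomato Soup"

# A bare ampersand starts an entity reference, so this string is not
# well-formed XML and the parser rejects it.
bad_xml = "<soup><name>" + name + "</name></soup>"
try:
    ET.fromstring(bad_xml)
    parsed_bad = True
except ET.ParseError as err:
    parsed_bad = False
    print("rejected:", err)

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
good_xml = "<soup><name>" + escape(name) + "</name></soup>"
parsed_name = ET.fromstring(good_xml).find("name").text
print(parsed_name)
```

The parser converts `&amp;` back to `&` when reading, so the extracted text matches the original name.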