@hannesdatta
Created September 20, 2022 13:58
Getting product descriptions and unique product category links from books.toscrape.com
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to capture the product description at books.toscrape.com?\n"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"# Let's load in the site first...\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"req = requests.get('https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html')\n",
"# ...and convert it to a BeautifulSoup object (naming the parser avoids a warning)\n",
"soup = BeautifulSoup(req.text, 'html.parser')\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell retrieves the heading of the product description section (which is simply \"Product Description\").\n"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<div class=\"sub-header\" id=\"product_description\">\n",
"<h2>Product Description</h2>\n",
"</div>"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"soup.find(\"div\", {\"id\": \"product_description\"})"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, how do we get to the actual product description?\n",
"\n",
"There is no direct way to capture the text, because the `<p>` tag that holds it has no class or id that identifies it.\n",
"\n",
"This can be seen in the source code of the site:\n",
"\n",
"\n",
"```\n",
"<div id=\"product_description\" class=\"sub-header\">\n",
" <h2>Product Description</h2>\n",
" </div>\n",
"<p>It's hard to imagine a world without A Light in the Attic. [...]</p>\n",
"```\n",
"\n",
"However, what you may notice is that the `<p>` tag directly follows -- at the same hierarchy level -- the product description `<div>` section. In technical terms, these are \"siblings\" (elements at the same hierarchy level).\n",
"\n",
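"We can therefore use the `find_next_siblings()` method, which returns a list of all `<p>` tags that follow the `<div>` at the same level. Here, that list has a single element: the description."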
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<p>It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love th It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love that Silverstein. Need proof of his genius? RockabyeRockabye baby, in the treetopDon't you know a treetopIs no safe place to rock?And who put you up there,And your cradle, too?Baby, I think someone down here'sGot it in for you. Shel, you never sounded so good. ...more</p>]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"soup.find(\"div\", {\"id\": \"product_description\"}).find_next_siblings(\"p\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...which gives us a list holding the `<p>` tag with the product description."
]
},
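{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you only need the description as a plain string, a minimal sketch could look like the next cell. It runs on a simplified copy of the HTML shown earlier rather than on the live page, and `html`, `demo_soup`, `header`, `paragraph` and `description` are names made up for this illustration. `find_next_sibling()` (singular) returns the first matching sibling instead of a list, and `get_text()` drops the surrounding tag."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"# Simplified stand-in for the product page (an assumption, not the live site)\n",
"html = '''\n",
"<div id=\"product_description\" class=\"sub-header\">\n",
"  <h2>Product Description</h2>\n",
"</div>\n",
"<p>It's hard to imagine a world without A Light in the Attic.</p>\n",
"'''\n",
"demo_soup = BeautifulSoup(html, 'html.parser')\n",
"\n",
"header = demo_soup.find('div', {'id': 'product_description'})\n",
"# find_next_sibling() returns the first following <p> sibling, or None if there is none\n",
"paragraph = header.find_next_sibling('p')\n",
"# Guard against a missing description before asking for its text\n",
"description = paragraph.get_text(strip=True) if paragraph is not None else None\n",
"print(description)"
]
},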
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that, besides siblings, BeautifulSoup also knows about children (elements one level \"lower\" in the hierarchy). I'll discuss this next."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How to capture the links to all books on a category page?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As explained above, it is difficult to capture elements that are not uniquely identifiable with a class name or tag. However, we can use the nested structure of websites to still get at the relevant links.\n",
"\n",
"If you open the category overview page (https://books.toscrape.com/catalogue/category/books_1/index.html) and use your browser's inspect mode, you will notice that all book links are contained in a higher-level element with the tag `<section>`.\n"
]
},
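{
"cell_type": "markdown",
"metadata": {},
"source": [
"To illustrate the idea of narrowing down to a parent element first, the next cell is a rough sketch on a small, made-up HTML snippet (not the real page). The names `html`, `demo_soup`, `page_url` and `book_urls` are assumptions for this example. It also shows one possible way to turn the relative `href` values into absolute URLs with `urllib.parse.urljoin` and to keep each URL only once, since every book is linked twice (image and title)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from urllib.parse import urljoin\n",
"from bs4 import BeautifulSoup\n",
"\n",
"# Made-up, shortened stand-in for a category page (an assumption, not the live site)\n",
"html = '''\n",
"<a href=\"index.html\">Home</a>\n",
"<section>\n",
"  <a href=\"../../a-light-in-the-attic_1000/index.html\"><img alt=\"A Light in the Attic\"/></a>\n",
"  <h3><a href=\"../../a-light-in-the-attic_1000/index.html\">A Light in the ...</a></h3>\n",
"  <a href=\"../../tipping-the-velvet_999/index.html\"><img alt=\"Tipping the Velvet\"/></a>\n",
"  <h3><a href=\"../../tipping-the-velvet_999/index.html\">Tipping the Velvet</a></h3>\n",
"</section>\n",
"'''\n",
"demo_soup = BeautifulSoup(html, 'html.parser')\n",
"page_url = 'https://books.toscrape.com/catalogue/category/books_1/index.html'\n",
"\n",
"book_urls = []\n",
"# Only links inside <section> are considered, so the 'Home' link is skipped\n",
"for link in demo_soup.find('section').find_all('a'):\n",
"    url = urljoin(page_url, link.get('href'))  # resolve '../../' against the page URL\n",
"    if url not in book_urls:  # keep the first occurrence only\n",
"        book_urls.append(url)\n",
"print(book_urls)"
]
},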
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"# Let's load in the site first...\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"req = requests.get('https://books.toscrape.com/catalogue/category/books_1/index.html')\n",
"# ...and convert it to a BeautifulSoup object (naming the parser avoids a warning)\n",
"soup = BeautifulSoup(req.text, 'html.parser')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So let us first \"select\" everything in that section."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<section>\n",
"<div class=\"alert alert-warning\" role=\"alert\"><strong>Warning!</strong> This is a demo website for web scraping purposes. Prices and ratings here were randomly assigned and have no real meaning.</div>\n",
"<div>\n",
"<ol class=\"row\">\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../a-light-in-the-attic_1000/index.html\"><img alt=\"A Light in the Attic\" class=\"thumbnail\" src=\"../../../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Three\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../a-light-in-the-attic_1000/index.html\" title=\"A Light in the Attic\">A Light in the ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£51.77</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../tipping-the-velvet_999/index.html\"><img alt=\"Tipping the Velvet\" class=\"thumbnail\" src=\"../../../media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating One\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../tipping-the-velvet_999/index.html\" title=\"Tipping the Velvet\">Tipping the Velvet</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£53.74</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../soumission_998/index.html\"><img alt=\"Soumission\" class=\"thumbnail\" src=\"../../../media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating One\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../soumission_998/index.html\" title=\"Soumission\">Soumission</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£50.10</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../sharp-objects_997/index.html\"><img alt=\"Sharp Objects\" class=\"thumbnail\" src=\"../../../media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Four\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../sharp-objects_997/index.html\" title=\"Sharp Objects\">Sharp Objects</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£47.82</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../sapiens-a-brief-history-of-humankind_996/index.html\"><img alt=\"Sapiens: A Brief History of Humankind\" class=\"thumbnail\" src=\"../../../media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Five\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../sapiens-a-brief-history-of-humankind_996/index.html\" title=\"Sapiens: A Brief History of Humankind\">Sapiens: A Brief History ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£54.23</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../the-requiem-red_995/index.html\"><img alt=\"The Requiem Red\" class=\"thumbnail\" src=\"../../../media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating One\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../the-requiem-red_995/index.html\" title=\"The Requiem Red\">The Requiem Red</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£22.65</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\"><img alt=\"The Dirty Little Secrets of Getting Your Dream Job\" class=\"thumbnail\" src=\"../../../media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Four\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\" title=\"The Dirty Little Secrets of Getting Your Dream Job\">The Dirty Little Secrets ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£33.34</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\"><img alt=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\" class=\"thumbnail\" src=\"../../../media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Three\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\" title=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\">The Coming Woman: A ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£17.93</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\"><img alt=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\" class=\"thumbnail\" src=\"../../../media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Four\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\" title=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\">The Boys in the ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£22.60</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../the-black-maria_991/index.html\"><img alt=\"The Black Maria\" class=\"thumbnail\" src=\"../../../media/cache/58/46/5846057e28022268153beff6d352b06c.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating One\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../the-black-maria_991/index.html\" title=\"The Black Maria\">The Black Maria</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£52.15</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\"><img alt=\"Starving Hearts (Triangular Trade Trilogy, #1)\" class=\"thumbnail\" src=\"../../../media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Two\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\" title=\"Starving Hearts (Triangular Trade Trilogy, #1)\">Starving Hearts (Triangular Trade ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£13.99</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../shakespeares-sonnets_989/index.html\"><img alt=\"Shakespeare's Sonnets\" class=\"thumbnail\" src=\"../../../media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Four\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../shakespeares-sonnets_989/index.html\" title=\"Shakespeare's Sonnets\">Shakespeare's Sonnets</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£20.66</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../set-me-free_988/index.html\"><img alt=\"Set Me Free\" class=\"thumbnail\" src=\"../../../media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Five\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../set-me-free_988/index.html\" title=\"Set Me Free\">Set Me Free</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£17.46</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\"><img alt=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\" class=\"thumbnail\" src=\"../../../media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Five\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\" title=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\">Scott Pilgrim's Precious Little ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£52.29</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../rip-it-up-and-start-again_986/index.html\"><img alt=\"Rip it Up and Start Again\" class=\"thumbnail\" src=\"../../../media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Five\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../rip-it-up-and-start-again_986/index.html\" title=\"Rip it Up and Start Again\">Rip it Up and ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£35.02</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\"><img alt=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\" class=\"thumbnail\" src=\"../../../media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Three\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\" title=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\">Our Band Could Be ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£57.25</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../olio_984/index.html\"><img alt=\"Olio\" class=\"thumbnail\" src=\"../../../media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating One\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../olio_984/index.html\" title=\"Olio\">Olio</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£23.88</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\"><img alt=\"Mesaerion: The Best Science Fiction Stories 1800-1849\" class=\"thumbnail\" src=\"../../../media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating One\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\" title=\"Mesaerion: The Best Science Fiction Stories 1800-1849\">Mesaerion: The Best Science ...</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£37.59</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../libertarianism-for-beginners_982/index.html\"><img alt=\"Libertarianism for Beginners\" class=\"thumbnail\" src=\"../../../media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Two\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../libertarianism-for-beginners_982/index.html\" title=\"Libertarianism for Beginners\">Libertarianism for Beginners</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£51.33</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"<li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n",
"<article class=\"product_pod\">\n",
"<div class=\"image_container\">\n",
"<a href=\"../../its-only-the-himalayas_981/index.html\"><img alt=\"It's Only the Himalayas\" class=\"thumbnail\" src=\"../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg\"/></a>\n",
"</div>\n",
"<p class=\"star-rating Two\">\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"<i class=\"icon-star\"></i>\n",
"</p>\n",
"<h3><a href=\"../../its-only-the-himalayas_981/index.html\" title=\"It's Only the Himalayas\">It's Only the Himalayas</a></h3>\n",
"<div class=\"product_price\">\n",
"<p class=\"price_color\">£45.17</p>\n",
"<p class=\"instock availability\">\n",
"<i class=\"icon-ok\"></i>\n",
" \n",
" In stock\n",
" \n",
"</p>\n",
"<form>\n",
"<button class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\" type=\"submit\">Add to basket</button>\n",
"</form>\n",
"</div>\n",
"</article>\n",
"</li>\n",
"</ol>\n",
"<div>\n",
"<ul class=\"pager\">\n",
"<li class=\"current\">\n",
" \n",
" Page 1 of 50\n",
" \n",
" </li>\n",
"<li class=\"next\"><a href=\"page-2.html\">next</a></li>\n",
"</ul>\n",
"</div>\n",
"</div>\n",
"</section>"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"soup.find('section')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...we can subsequently expand and extract all links"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[<a href=\"../../a-light-in-the-attic_1000/index.html\"><img alt=\"A Light in the Attic\" class=\"thumbnail\" src=\"../../../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg\"/></a>,\n",
" <a href=\"../../a-light-in-the-attic_1000/index.html\" title=\"A Light in the Attic\">A Light in the ...</a>,\n",
" <a href=\"../../tipping-the-velvet_999/index.html\"><img alt=\"Tipping the Velvet\" class=\"thumbnail\" src=\"../../../media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg\"/></a>,\n",
" <a href=\"../../tipping-the-velvet_999/index.html\" title=\"Tipping the Velvet\">Tipping the Velvet</a>,\n",
" <a href=\"../../soumission_998/index.html\"><img alt=\"Soumission\" class=\"thumbnail\" src=\"../../../media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg\"/></a>,\n",
" <a href=\"../../soumission_998/index.html\" title=\"Soumission\">Soumission</a>,\n",
" <a href=\"../../sharp-objects_997/index.html\"><img alt=\"Sharp Objects\" class=\"thumbnail\" src=\"../../../media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg\"/></a>,\n",
" <a href=\"../../sharp-objects_997/index.html\" title=\"Sharp Objects\">Sharp Objects</a>,\n",
" <a href=\"../../sapiens-a-brief-history-of-humankind_996/index.html\"><img alt=\"Sapiens: A Brief History of Humankind\" class=\"thumbnail\" src=\"../../../media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg\"/></a>,\n",
" <a href=\"../../sapiens-a-brief-history-of-humankind_996/index.html\" title=\"Sapiens: A Brief History of Humankind\">Sapiens: A Brief History ...</a>,\n",
" <a href=\"../../the-requiem-red_995/index.html\"><img alt=\"The Requiem Red\" class=\"thumbnail\" src=\"../../../media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg\"/></a>,\n",
" <a href=\"../../the-requiem-red_995/index.html\" title=\"The Requiem Red\">The Requiem Red</a>,\n",
" <a href=\"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\"><img alt=\"The Dirty Little Secrets of Getting Your Dream Job\" class=\"thumbnail\" src=\"../../../media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg\"/></a>,\n",
" <a href=\"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\" title=\"The Dirty Little Secrets of Getting Your Dream Job\">The Dirty Little Secrets ...</a>,\n",
" <a href=\"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\"><img alt=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\" class=\"thumbnail\" src=\"../../../media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg\"/></a>,\n",
" <a href=\"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\" title=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\">The Coming Woman: A ...</a>,\n",
" <a href=\"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\"><img alt=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\" class=\"thumbnail\" src=\"../../../media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg\"/></a>,\n",
" <a href=\"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\" title=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\">The Boys in the ...</a>,\n",
" <a href=\"../../the-black-maria_991/index.html\"><img alt=\"The Black Maria\" class=\"thumbnail\" src=\"../../../media/cache/58/46/5846057e28022268153beff6d352b06c.jpg\"/></a>,\n",
" <a href=\"../../the-black-maria_991/index.html\" title=\"The Black Maria\">The Black Maria</a>,\n",
" <a href=\"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\"><img alt=\"Starving Hearts (Triangular Trade Trilogy, #1)\" class=\"thumbnail\" src=\"../../../media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg\"/></a>,\n",
" <a href=\"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\" title=\"Starving Hearts (Triangular Trade Trilogy, #1)\">Starving Hearts (Triangular Trade ...</a>,\n",
" <a href=\"../../shakespeares-sonnets_989/index.html\"><img alt=\"Shakespeare's Sonnets\" class=\"thumbnail\" src=\"../../../media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg\"/></a>,\n",
" <a href=\"../../shakespeares-sonnets_989/index.html\" title=\"Shakespeare's Sonnets\">Shakespeare's Sonnets</a>,\n",
" <a href=\"../../set-me-free_988/index.html\"><img alt=\"Set Me Free\" class=\"thumbnail\" src=\"../../../media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg\"/></a>,\n",
" <a href=\"../../set-me-free_988/index.html\" title=\"Set Me Free\">Set Me Free</a>,\n",
" <a href=\"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\"><img alt=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\" class=\"thumbnail\" src=\"../../../media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg\"/></a>,\n",
" <a href=\"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\" title=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\">Scott Pilgrim's Precious Little ...</a>,\n",
" <a href=\"../../rip-it-up-and-start-again_986/index.html\"><img alt=\"Rip it Up and Start Again\" class=\"thumbnail\" src=\"../../../media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg\"/></a>,\n",
" <a href=\"../../rip-it-up-and-start-again_986/index.html\" title=\"Rip it Up and Start Again\">Rip it Up and ...</a>,\n",
" <a href=\"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\"><img alt=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\" class=\"thumbnail\" src=\"../../../media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg\"/></a>,\n",
" <a href=\"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\" title=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\">Our Band Could Be ...</a>,\n",
" <a href=\"../../olio_984/index.html\"><img alt=\"Olio\" class=\"thumbnail\" src=\"../../../media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg\"/></a>,\n",
" <a href=\"../../olio_984/index.html\" title=\"Olio\">Olio</a>,\n",
" <a href=\"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\"><img alt=\"Mesaerion: The Best Science Fiction Stories 1800-1849\" class=\"thumbnail\" src=\"../../../media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg\"/></a>,\n",
" <a href=\"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\" title=\"Mesaerion: The Best Science Fiction Stories 1800-1849\">Mesaerion: The Best Science ...</a>,\n",
" <a href=\"../../libertarianism-for-beginners_982/index.html\"><img alt=\"Libertarianism for Beginners\" class=\"thumbnail\" src=\"../../../media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg\"/></a>,\n",
" <a href=\"../../libertarianism-for-beginners_982/index.html\" title=\"Libertarianism for Beginners\">Libertarianism for Beginners</a>,\n",
" <a href=\"../../its-only-the-himalayas_981/index.html\"><img alt=\"It's Only the Himalayas\" class=\"thumbnail\" src=\"../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg\"/></a>,\n",
" <a href=\"../../its-only-the-himalayas_981/index.html\" title=\"It's Only the Himalayas\">It's Only the Himalayas</a>,\n",
" <a href=\"page-2.html\">next</a>]"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"soup.find('section').find_all('a')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After eyeballing the output to check that it looks correct, we can build our loop:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"../../a-light-in-the-attic_1000/index.html\n",
"../../a-light-in-the-attic_1000/index.html\n",
"../../tipping-the-velvet_999/index.html\n",
"../../tipping-the-velvet_999/index.html\n",
"../../soumission_998/index.html\n",
"../../soumission_998/index.html\n",
"../../sharp-objects_997/index.html\n",
"../../sharp-objects_997/index.html\n",
"../../sapiens-a-brief-history-of-humankind_996/index.html\n",
"../../sapiens-a-brief-history-of-humankind_996/index.html\n",
"../../the-requiem-red_995/index.html\n",
"../../the-requiem-red_995/index.html\n",
"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\n",
"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\n",
"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\n",
"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\n",
"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\n",
"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\n",
"../../the-black-maria_991/index.html\n",
"../../the-black-maria_991/index.html\n",
"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\n",
"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\n",
"../../shakespeares-sonnets_989/index.html\n",
"../../shakespeares-sonnets_989/index.html\n",
"../../set-me-free_988/index.html\n",
"../../set-me-free_988/index.html\n",
"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\n",
"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\n",
"../../rip-it-up-and-start-again_986/index.html\n",
"../../rip-it-up-and-start-again_986/index.html\n",
"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\n",
"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\n",
"../../olio_984/index.html\n",
"../../olio_984/index.html\n",
"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\n",
"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\n",
"../../libertarianism-for-beginners_982/index.html\n",
"../../libertarianism-for-beginners_982/index.html\n",
"../../its-only-the-himalayas_981/index.html\n",
"../../its-only-the-himalayas_981/index.html\n",
"page-2.html\n"
]
}
],
"source": [
"for link in soup.find('section').find_all('a'):\n",
"    print(link['href'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
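"Note that this output still contains duplicates, because each book always has two links (one on the thumbnail image and one on the title). It also includes the `page-2.html` link to the next page.\n",
"\n",
"Further inspection shows that each book has an element with the class `image_container`, which wraps only the thumbnail link. We can first zoom in on that class and then search for the link inside it. This gives a list of unique links."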
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"../../a-light-in-the-attic_1000/index.html\n",
"../../tipping-the-velvet_999/index.html\n",
"../../soumission_998/index.html\n",
"../../sharp-objects_997/index.html\n",
"../../sapiens-a-brief-history-of-humankind_996/index.html\n",
"../../the-requiem-red_995/index.html\n",
"../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\n",
"../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\n",
"../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\n",
"../../the-black-maria_991/index.html\n",
"../../starving-hearts-triangular-trade-trilogy-1_990/index.html\n",
"../../shakespeares-sonnets_989/index.html\n",
"../../set-me-free_988/index.html\n",
"../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\n",
"../../rip-it-up-and-start-again_986/index.html\n",
"../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\n",
"../../olio_984/index.html\n",
"../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\n",
"../../libertarianism-for-beginners_982/index.html\n",
"../../its-only-the-himalayas_981/index.html\n"
]
}
],
"source": [
"for container in soup.find('section').find_all(class_='image_container'):\n",
"    print(container.find('a')['href'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of printing the links, we can also store them in a list."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['../../a-light-in-the-attic_1000/index.html',\n",
" '../../tipping-the-velvet_999/index.html',\n",
" '../../soumission_998/index.html',\n",
" '../../sharp-objects_997/index.html',\n",
" '../../sapiens-a-brief-history-of-humankind_996/index.html',\n",
" '../../the-requiem-red_995/index.html',\n",
" '../../the-dirty-little-secrets-of-getting-your-dream-job_994/index.html',\n",
" '../../the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html',\n",
" '../../the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html',\n",
" '../../the-black-maria_991/index.html',\n",
" '../../starving-hearts-triangular-trade-trilogy-1_990/index.html',\n",
" '../../shakespeares-sonnets_989/index.html',\n",
" '../../set-me-free_988/index.html',\n",
" '../../scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html',\n",
" '../../rip-it-up-and-start-again_986/index.html',\n",
" '../../our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html',\n",
" '../../olio_984/index.html',\n",
" '../../mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html',\n",
" '../../libertarianism-for-beginners_982/index.html',\n",
" '../../its-only-the-himalayas_981/index.html']"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Collect one link per book (the link inside each image_container)\n",
"links = []\n",
"for container in soup.find('section').find_all(class_='image_container'):\n",
"    links.append(container.find('a')['href'])\n",
"\n",
"links"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
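The links collected in the notebook are relative (`../../...`), so they cannot be passed to `requests.get` as they are. A minimal sketch of one way to resolve them, assuming the listing page lives at the `PAGE_URL` shown below (an assumption; this part of the notebook does not show which page `soup` was built from). The helper name `to_absolute` is hypothetical. It also drops repeated links, which offers an alternative to selecting on `image_container`.

```python
from urllib.parse import urljoin

# Assumed URL of the listing page the links were scraped from (hypothetical;
# replace it with the URL you actually requested).
PAGE_URL = "https://books.toscrape.com/catalogue/category/books_1/index.html"


def to_absolute(page_url, hrefs):
    """Resolve relative hrefs against page_url, dropping repeats but keeping order."""
    unique_hrefs = dict.fromkeys(hrefs)  # dicts preserve insertion order (Python 3.7+)
    return [urljoin(page_url, href) for href in unique_hrefs]


# Sample hrefs copied from the printed output, including a doubled book link
sample = [
    "../../a-light-in-the-attic_1000/index.html",
    "../../a-light-in-the-attic_1000/index.html",
    "../../tipping-the-velvet_999/index.html",
    "page-2.html",
]

for url in to_absolute(PAGE_URL, sample):
    print(url)
```

Note that de-duplicating this way keeps the `page-2.html` link, whereas the `image_container` approach returns book links only.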