@philshem
Last active September 23, 2022 17:38
Scrape the number of pages in a book from Amazon.com
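```python
# Add more Amazon product URLs to urllist to look up more books.
# The same approach can be extended to scrape other details from the page.
import requests
from bs4 import BeautifulSoup

urllist = [
    'http://www.amazon.com/Flash-Boys-Wall-Street-Revolt/dp/0393244660',
    'http://www.amazon.com/The-Big-Short-Doomsday-Machine/dp/0393338827',
]

# Amazon may reject requests that lack a browser-like User-Agent header.
headers = {'User-Agent': 'Mozilla/5.0'}

for url in urllist:
    r = requests.get(url, headers=headers, timeout=10)
    # Name the parser explicitly so BeautifulSoup does not guess one.
    soup = BeautifulSoup(r.text, 'html.parser')
    title = soup.title.get_text(strip=True) if soup.title else url
    tmp = ''  # previous whitespace-separated token
    # Print every "<number> pages" pair found in the page text.
    for line in soup.get_text().split():
        if line.lower() == 'pages' and tmp.isdigit():
            print(tmp, line, ' - ', title)
        else:
            tmp = line
```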
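The core of the scraper is the check "a number immediately followed by the word `pages`". A minimal sketch of that step, pulled out into a hypothetical helper `find_page_counts` so it can be tried on a plain string without any network access, might look like this. It pairs each token with the one before it, in the spirit of the gist's `tmp` variable, and, as an assumption beyond the original, also accepts thousands separators such as `1,024`, which `str.isdigit()` rejects.

```python
import re
from typing import List


def find_page_counts(text: str) -> List[int]:
    """Return every number that immediately precedes the word 'pages'.

    Hypothetical helper: same previous-token idea as the gist, but it also
    accepts comma thousands separators (e.g. '1,024').
    """
    counts = []
    tokens = text.split()
    # Walk (previous, current) token pairs.
    for prev, cur in zip(tokens, tokens[1:]):
        if cur.lower() == 'pages' and re.fullmatch(r'\d{1,3}(,\d{3})*|\d+', prev):
            counts.append(int(prev.replace(',', '')))
    return counts


sample = "Print length: 288 pages Publisher: W. W. Norton"
print(find_page_counts(sample))  # prints [288]
```

Inside the loop above, `find_page_counts(soup.get_text())` could stand in for the inner `for` loop; keeping the matching logic separate from the HTTP request also makes it easy to check against saved page text.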