@JKirchartz
Created January 31, 2017 21:33
Download all quotes from GoodReads by author's quote URL, print in fortune format
#! /usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fenc=utf-8
#
# Copyleft (ↄ) 2016 jkirchartz <me@jkirchartz.com>
#
# Distributed under terms of the NPL (Necessary Public License) license.
"""
Download all quotes from GoodReads by author's quote URL, print in fortune format

usage:
    python goodreadsquotes.py https://www.goodreads.com/author/quotes/1791.Seth_Godin > godin
"""
from pyquery import PyQuery
import sys, re, time

# Captures the author slug at the end of the URL, e.g. "Seth_Godin" in "1791.Seth_Godin"
AUTHOR_REX = re.compile(r'\d+\.(\w+)$')


def grabber(base_url, i=1):
    url = base_url + "?page=" + str(i)
    page = PyQuery(url)
    quotes = page(".quoteText")
    auth_match = re.search(AUTHOR_REX, base_url)
    if auth_match:
        author = re.sub('_', ' ', auth_match.group(1))
    else:
        author = False
    # sys.stderr.write(url + "\n")
    for quote in quotes.items():
        # Drop embedded <script> tags and non-ASCII characters; decode back to
        # text so that replace() below also works on Python 3
        quote = quote.remove('script').text().encode('ascii', 'ignore').decode('ascii')
        if author:
            quote = quote.replace(author, " -- " + author)
        print(quote)
        print('%')
    # Follow the pager only if a "next" link exists and is not disabled;
    # without the existence check, a page with no pager would recurse forever
    next_link = page('.next_page')
    if len(next_link) > 0 and not next_link.hasClass('disabled'):
        time.sleep(10)
        grabber(base_url, i + 1)


if __name__ == "__main__":
    grabber(''.join(sys.argv[1:]))
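The script does two things that need no network access: it derives the author name from the URL, and it emits fortune format, where every quote is followed by a line containing only `%`. A minimal sketch of just those two steps, useful for checking them offline; the helper names `author_from_url` and `to_fortune` are hypothetical and are not part of the gist:

```python
import re

# Same pattern as the gist: digits, a dot, then the author slug at the end of the URL
AUTHOR_REX = re.compile(r'\d+\.(\w+)$')


def author_from_url(base_url):
    """Return the author name with underscores turned into spaces, or None."""
    match = AUTHOR_REX.search(base_url)
    return match.group(1).replace('_', ' ') if match else None


def to_fortune(quotes):
    """Follow each quote with a line holding only '%', as the script prints it."""
    return ''.join(quote.strip() + '\n%\n' for quote in quotes)


if __name__ == "__main__":
    print(author_from_url("https://www.goodreads.com/author/quotes/1791.Seth_Godin"))
    print(to_fortune(["First quote -- Seth Godin", "Second quote -- Seth Godin"]))
```

A file written this way is typically indexed with `strfile` before `fortune` can read it.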
@MarioVilas
Doesn't seem to be working for me; it just gets stuck without ever retrieving a quote.
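The cause of this report is not established here. One thing worth noting is that the script decides whether to continue only from the `.next_page` element and waits 10 seconds between pages, so a changed page layout could keep it requesting pages while printing nothing. A rough sketch of a paging loop with two independent stopping conditions (an empty page and a page cap); `iter_pages`, `fetch_quotes`, `max_pages`, and `delay` are hypothetical names, and `fetch_quotes` stands in for whatever downloads and parses one page:

```python
import time


def iter_pages(fetch_quotes, max_pages=50, delay=0):
    """Yield one list of quotes per page, stopping at the first empty page.

    fetch_quotes is an assumed callable: page number -> list of quote strings.
    """
    for page_number in range(1, max_pages + 1):
        quotes = fetch_quotes(page_number)
        if not quotes:
            break  # nothing scraped: stop instead of requesting more pages
        yield quotes
        time.sleep(delay)  # the gist itself waits 10 seconds between requests


if __name__ == "__main__":
    # Stand-in data instead of a real download
    fake_site = {1: ["quote one"], 2: ["quote two"]}
    for page_quotes in iter_pages(lambda n: fake_site.get(n, [])):
        for quote in page_quotes:
            print(quote)
            print('%')
```

Uncommenting the `sys.stderr.write(url + "\n")` line in the script is another cheap way to see whether pages are being requested at all.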
