Created

Embed URL

HTTPS clone URL

SSH clone URL

You can clone with HTTPS or SSH.

Download Gist

Builds epub book out of Paul Graham's essays.

View pgessays.py
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
# -*- coding: utf-8 -*-
"""
Builds epub book out of Paul Graham's essays: http://paulgraham.com/articles.html
Author: Ola Sitarska <ola@sitarska.com>
Copyright: Licensed under the GPL-3 (http://www.gnu.org/licenses/gpl-3.0.html)
This script requires python-epub-library: http://code.google.com/p/python-epub-builder/
"""
 
import re, ez_epub, urllib2, genshi
from BeautifulSoup import BeautifulSoup
 
def addSection(link, title):
if not 'http' in link:
page = urllib2.urlopen('http://www.paulgraham.com/'+link).read()
soup = BeautifulSoup(page)
soup.prettify()
else:
page = urllib2.urlopen(link).read()
section = ez_epub.Section()
try:
section.title = title
print section.title
 
if not 'http' in link:
font = str(soup.findAll('table', {'width':'455'})[0].findAll('font')[0])
if not 'Get funded by' in font and not 'Watch how this essay was' in font and not 'Like to build things?' in font and not len(font)<100:
content = font
else:
content = ''
for par in soup.findAll('table', {'width':'455'})[0].findAll('p'):
content += str(par)
 
for p in content.split("<br /><br />"):
section.text.append(genshi.core.Markup(p))
 
#exception for Subject: Airbnb
for pre in soup.findAll('pre'):
section.text.append(genshi.core.Markup(pre))
else:
for p in str(page).replace("\n","<br />").split("<br /><br />"):
section.text.append(genshi.core.Markup(p))
except:
pass
return section
 
 
book = ez_epub.Book()
book.title = "Paul Graham's Essays"
book.authors = ['Paul Graham']
 
page = urllib2.urlopen('http://www.paulgraham.com/articles.html').read()
soup = BeautifulSoup(page)
soup.prettify()
 
links = soup.findAll('table', {'width': '455'})[1].findAll('a')
sections = []
for link in links:
sections.append(addSection(link['href'], link.text))
book.sections = sections
book.make(book.title)

I'm getting an error about an invalid java call, I suppose the "subprocess.call(['java', '-jar', checkerPath, epubPath], shell = True)" in epub.py. I have java installed. Details: java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~10.04.2)
OpenJDK Client VM (build 20.0-b12, mixed mode, sharing)

Any ideas?

On iPhone 5, I get "This page contains the following errors: error on line 13 at column 7: Opening and ending tag mismatch: font line 0 and p" this error when I open the generated epub file in iBooks

What deps does this have?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.