Builds an EPUB book out of Paul Graham's essays.

pgessays.py
Python
# -*- coding: utf-8 -*-
"""
Builds epub book out of Paul Graham's essays: http://paulgraham.com/articles.html
 
Author: Ola Sitarska <ola@sitarska.com>
Copyright: Licensed under the GPL-3 (http://www.gnu.org/licenses/gpl-3.0.html)
 
This script requires python-epub-builder: http://code.google.com/p/python-epub-builder/
"""
 
import re, ez_epub, urllib2, genshi
from BeautifulSoup import BeautifulSoup
 
def addSection(link, title):
    # Relative links are essays hosted on paulgraham.com; only those are parsed as HTML.
    if 'http' not in link:
        page = urllib2.urlopen('http://www.paulgraham.com/'+link).read()
        soup = BeautifulSoup(page)
        soup.prettify()
    else:
        page = urllib2.urlopen(link).read()

    section = ez_epub.Section()
    try:
        section.title = title
        print section.title

        if 'http' not in link:
            font = str(soup.findAll('table', {'width':'455'})[0].findAll('font')[0])
            # Use the first <font> block unless it is promotional or too short;
            # otherwise fall back to the <p> tags of the same table.
            if ('Get funded by' not in font and 'Watch how this essay was' not in font
                    and 'Like to build things?' not in font and len(font) >= 100):
                content = font
            else:
                content = ''
                for par in soup.findAll('table', {'width':'455'})[0].findAll('p'):
                    content += str(par)

            # Paragraphs in the essays are separated by double line breaks.
            for p in content.split("<br /><br />"):
                section.text.append(genshi.core.Markup(p))

            #exception for Subject: Airbnb
            for pre in soup.findAll('pre'):
                section.text.append(genshi.core.Markup(pre))
        else:
            # External pages are treated as plain text.
            for p in str(page).replace("\n","<br />").split("<br /><br />"):
                section.text.append(genshi.core.Markup(p))
    except Exception:
        # Essays whose layout does not match keep whatever text was collected so far.
        pass

    return section
 
 
book = ez_epub.Book()
book.title = "Paul Graham's Essays"
book.authors = ['Paul Graham']
 
page = urllib2.urlopen('http://www.paulgraham.com/articles.html').read()
soup = BeautifulSoup(page)
soup.prettify()
 
links = soup.findAll('table', {'width': '455'})[1].findAll('a')
sections = []
for link in links:
    sections.append(addSection(link['href'], link.text))
book.sections = sections
book.make(book.title)

I'm getting an error about an invalid java call. I suspect it comes from the "subprocess.call(['java', '-jar', checkerPath, epubPath], shell = True)" line in epub.py. I have Java installed. Details:
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (6b24-1.11.5-0ubuntu1~10.04.2)
OpenJDK Client VM (build 20.0-b12, mixed mode, sharing)

Any ideas?
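
One possible cause, assuming epub.py really makes the call as quoted above: on POSIX systems, when subprocess.call gets a list together with shell=True, only the first item ('java') is used as the command and the remaining items are handed to the shell itself, so java starts without its arguments. A minimal sketch of the call without a shell follows. The helper names build_checker_command and run_checker are hypothetical and are not part of epub.py.

```python
import subprocess


def build_checker_command(checker_path, epub_path):
    """Return the argument list for running an epub checker jar with java."""
    return ['java', '-jar', checker_path, epub_path]


def run_checker(checker_path, epub_path):
    """Run the checker without a shell so every list item reaches java."""
    # With shell=True on POSIX, only 'java' would be treated as the command;
    # leaving shell at its default (False) passes the whole list to java.
    return subprocess.call(build_checker_command(checker_path, epub_path))
```

If this is indeed the problem, dropping shell = True from the quoted line should have the same effect as the sketch.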

On an iPhone 5, I get this error when I open the generated epub file in iBooks: "This page contains the following errors: error on line 13 at column 7: Opening and ending tag mismatch: font line 0 and p"
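
A guess at the cause, not confirmed: the script converts the whole font element to a string and splits it on "<br /><br />", so the opening font tag lands in the first fragment and the closing one in the last, and each fragment is added to the book on its own. A minimal sketch that strips the font tags before splitting is below. The function split_paragraphs is hypothetical and is not part of the gist.

```python
import re


def split_paragraphs(html):
    """Split essay HTML into paragraph fragments without stray font tags."""
    # Remove opening and closing <font> tags so no fragment is left unbalanced.
    cleaned = re.sub(r'</?font[^>]*>', '', html)
    # Paragraphs are separated by two consecutive <br> tags.
    parts = re.split(r'<br\s*/?>\s*<br\s*/?>', cleaned)
    return [part.strip() for part in parts if part.strip()]
```

Other inline tags that span several paragraphs could cause the same mismatch, so this only covers the font case named in the error message.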

What deps does this have?
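
Going by the script's own import lines and docstring, it needs ez_epub (from python-epub-builder), genshi and BeautifulSoup, plus urllib2, which ships with Python 2. A small sketch for checking which of these are importable is below. The helper missing_modules is hypothetical, and the module list is copied from the import statements.

```python
import importlib

# Module names taken from the script's import statements.
REQUIRED_MODULES = ['ez_epub', 'genshi', 'BeautifulSoup', 'urllib2']


def missing_modules(names):
    """Return the names from `names` that cannot be imported here."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_modules(REQUIRED_MODULES) under the Python 2 interpreter used for the script lists whatever still has to be installed.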
