Created November 18, 2012 10:11
Builds epub book out of Paul Graham's essays.
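# -*- coding: utf-8 -*-
"""
Builds epub book out of Paul Graham's essays:

Author: Ola Sitarska <>
Copyright: Licensed under the GPL-3 (

This script requires python-epub-library:
"""

import re, ez_epub, urllib2, genshi
from BeautifulSoup import BeautifulSoup


def addSection(link, title):
    # Links without 'http' are relative essay links and get the site prefix;
    # anything else is fetched as-is and treated as plain text.
    if 'http' not in link:
        page = urllib2.urlopen('' + link).read()
        soup = BeautifulSoup(page)
    else:
        page = urllib2.urlopen(link).read()

    section = ez_epub.Section()
    section.title = title
    print section.title

    if 'http' not in link:
        # The essay body normally sits in the first <font> of the 455-wide table.
        font = str(soup.findAll('table', {'width': '455'})[0].findAll('font')[0])
        if ('Get funded by' not in font
                and 'Watch how this essay was' not in font
                and 'Like to build things?' not in font
                and not len(font) < 100):
            content = font
        else:
            # That <font> was a banner or too short: collect the <p> elements instead.
            content = ''
            for par in soup.findAll('table', {'width': '455'})[0].findAll('p'):
                content += str(par)

        for p in content.split("<br /><br />"):
            # Loop body is missing from the source; appending each paragraph
            # to the section as markup is an assumption.
            section.text.append(genshi.core.Markup(p))

        #exception for Subject: Airbnb
        for pre in soup.findAll('pre'):
            section.text.append(genshi.core.Markup(pre))  # assumed, as above
    else:
        for p in str(page).replace("\n", "<br />").split("<br /><br />"):
            section.text.append(genshi.core.Markup(p))  # assumed, as above

    return section


book = ez_epub.Book()
book.title = "Paul Graham's Essays"
book.authors = ['Paul Graham']

page = urllib2.urlopen('').read()
soup = BeautifulSoup(page)

# The second 455-wide table on the index page holds the essay links.
links = soup.findAll('table', {'width': '455'})[1].findAll('a')
sections = []
for link in links:
    sections.append(addSection(link['href'], link.text))

book.sections = sections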

In order to get valid HTML (which is what .epub contains) you also need to remove the <font> tags (beyond just changing the table width to 435 as @gsdatta and @SergeAx said).
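
A minimal sketch of the `<font>` removal, assuming a plain regular-expression substitution is good enough for these pages. The helper name `strip_font_tags` is hypothetical and is not taken from the gist or any of its forks; a parser-based approach would be more robust on unusual markup.

```python
import re

# Matches opening and closing <font> tags, with or without attributes.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(html):
    """Return html with every <font ...> and </font> tag removed.

    The text and any other markup between the tags is left untouched.
    """
    return FONT_TAG.sub("", html)


cleaned = strip_font_tags('<font size="2" face="verdana">Hello<br /><br />world</font>')
print(cleaned)
```

In the script this would be applied to the extracted essay markup before it is split on `<br /><br />`.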

As of today, April 12th, 2017, only 4 forks of this gist actually changed the code:

Forks that fix the table width and remove <font> tags:

Forks that fix the table width (but don't remove <font> tags):

Very interesting fork – very large script that does a lot of stuff and is basically a rewrite:

Added a fork with the

  • new modern libs urllib3 and bs4
  • width fix
  • minor syntax changes
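
A rough sketch of how the width fix might look with bs4, assuming the tables are still identified by their `width` attribute. The helper `find_tables_by_width` and the sample markup are hypothetical and are not copied from the fork. The page fetch (urllib3 in the fork) is left out so the example runs offline.

```python
from bs4 import BeautifulSoup

# 435 is the width suggested in this thread; 455 is the value in the original script.
CANDIDATE_WIDTHS = ("435", "455")


def find_tables_by_width(html, widths=CANDIDATE_WIDTHS):
    """Return the tables matching the first candidate width that occurs in html."""
    soup = BeautifulSoup(html, "html.parser")
    for width in widths:
        tables = soup.find_all("table", {"width": width})
        if tables:
            return tables
    return []


# Hypothetical stand-in for the downloaded index page.
sample = '<table width="435"><tr><td><a href="a.html">A</a></td></tr></table>'
tables = find_tables_by_width(sample)
links = [a["href"] for a in tables[0].find_all("a")]
print(links)
```

Trying several widths keeps the lookup working if the layout changes again, at the cost of possibly matching an unrelated table.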
