olasitarska/4104455
Created November 18, 2012 10:11
Builds epub book out of Paul Graham's essays.
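# -*- coding: utf-8 -*-
"""
Builds epub book out of Paul Graham's essays:
Author: Ola Sitarska <>
Copyright: Licensed under the GPL-3 (
This script requires python-epub-library:
"""
import re, ez_epub, urllib2, genshi
from BeautifulSoup import BeautifulSoup

def addSection(link, title):
    # Relative links are essays on the site itself; absolute links point
    # at pages hosted elsewhere and are treated as plain text.
    if not 'http' in link:
        page = urllib2.urlopen(''+link).read()
        soup = BeautifulSoup(page)
    else:
        page = urllib2.urlopen(link).read()

    section = ez_epub.Section()
    section.title = title
    print section.title

    if not 'http' in link:
        # The essay body is normally the first <font> inside the first
        # 455px-wide table.
        font = str(soup.findAll('table', {'width':'455'})[0].findAll('font')[0])
        if not 'Get funded by' in font and not 'Watch how this essay was' in font and not 'Like to build things?' in font and not len(font)<100:
            content = font
        else:
            # That <font> is a promo banner or too short to be the essay:
            # fall back to joining every <p> in the table.
            content = ''
            for par in soup.findAll('table', {'width':'455'})[0].findAll('p'):
                content += str(par)
        # Double line breaks separate paragraphs. Assumes ez_epub.Section.text
        # is a list of paragraphs; Markup keeps Genshi from escaping the HTML.
        for p in content.split("<br /><br />"):
            section.text.append(genshi.core.Markup(p))
        #exception for Subject: Airbnb
        for pre in soup.findAll('pre'):
            section.text.append(genshi.core.Markup(str(pre)))
    else:
        # Plain-text page: newlines become <br />, blank lines split paragraphs.
        for p in str(page).replace("\n","<br />").split("<br /><br />"):
            section.text.append(genshi.core.Markup(p))
    return section

book = ez_epub.Book()
book.title = "Paul Graham's Essays"
book.authors = ['Paul Graham']
# On the essay index, the second 455px-wide table holds the essay links.
page = urllib2.urlopen('').read()
soup = BeautifulSoup(page)
links = soup.findAll('table', {'width': '455'})[1].findAll('a')
sections = []
for link in links:
    sections.append(addSection(link['href'], link.text))
book.sections = sections
# Assumes ez_epub.Book.make(name) writes the book out as <name>.epub.
book.make(book.title)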
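The way `addSection` picks the essay body and splits it into paragraphs can be checked without ez_epub or network access. The following is a sketch in Python 3 (not the script's Python 2). `pick_content`, `split_paragraphs`, `split_plain_text` and `PROMO_MARKERS` are hypothetical names. The functions only reproduce the string logic, and they take the `<font>` block and the `<p>` elements as already-extracted strings:

```python
# Hypothetical helpers mirroring the string logic of addSection().
PROMO_MARKERS = ('Get funded by', 'Watch how this essay was', 'Like to build things?')

def pick_content(font_html, paragraphs):
    """Return the <font> block if it looks like the essay (no promo text,
    at least 100 characters), otherwise the concatenated <p> elements."""
    if not any(marker in font_html for marker in PROMO_MARKERS) and len(font_html) >= 100:
        return font_html
    return ''.join(paragraphs)

def split_paragraphs(content):
    """Split essay HTML on double line breaks, as the script does."""
    return content.split("<br /><br />")

def split_plain_text(text):
    """Plain-text pages: newlines become <br />, blank lines separate paragraphs."""
    return text.replace("\n", "<br />").split("<br /><br />")

if __name__ == '__main__':
    print(split_plain_text("first paragraph\n\nsecond\nparagraph"))
```

Keeping the rules as plain functions makes it easy to adjust the promo markers or the 100-character threshold when the site layout changes.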

gsdatta commented Aug 27, 2015

Quick fix: the table width should be 435 now.

SergeAx commented May 30, 2016

One should change '455' to '435' at lines 28, 33 and 59 for this code to work.
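Since the width has already changed once, one option is to detect which width a page uses instead of hard-coding it. Below is a minimal sketch, assuming Python 3 and only the standard library. `detect_content_width` is a hypothetical helper, and the candidate widths are simply the two values mentioned in this thread:

```python
from html.parser import HTMLParser

class _TableWidthCollector(HTMLParser):
    """Collects the width attribute of every <table> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.widths = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == 'table':
            self.widths.append(dict(attrs).get('width'))

def detect_content_width(html, candidates=('435', '455')):
    """Return the first candidate width used by some <table>, or None.

    Hypothetical helper; the candidates are the widths from this thread,
    listed in order of preference.
    """
    collector = _TableWidthCollector()
    collector.feed(html)
    for width in candidates:
        if width in collector.widths:
            return width
    return None
```

The detected width would then replace the literal `'455'` in the `findAll('table', {'width': ...})` calls.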

To get valid HTML (which is what an .epub contains), you also need to remove the <font> tags, in addition to changing the table width to 435 as @gsdatta and @SergeAx said.
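One way to drop the `<font>` wrappers while keeping their contents is to run a regular expression over the extracted HTML. This is a sketch in Python 3. `strip_font_tags` is a hypothetical helper, and the pattern assumes attribute values never contain a `>` character:

```python
import re

# Matches opening <font ...> and closing </font> tags, case-insensitively.
# Assumes no attribute value contains '>'.
_FONT_TAG = re.compile(r'</?font\b[^>]*>', re.IGNORECASE)

def strip_font_tags(html):
    """Remove <font> wrappers but keep the text and markup inside them."""
    return _FONT_TAG.sub('', html)
```

With bs4, calling `unwrap()` on each `<font>` tag does the same thing on the parsed tree and avoids the regex assumption.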

As of today, April 12th, 2017, only 4 forks of this gist actually change the code:

Forks that fix the table width and remove <font> tags:

Forks that fix the table width (but don't remove <font> tags):

A very interesting fork, a very large script that does a lot more and is basically a rewrite:

Added a fork with:

  • the modern libraries urllib3 and bs4
  • the width fix
  • minor syntax changes
