@letroot
Created February 20, 2018 13:06
Batch download composingprograms.com pages as PDF
import pdfkit
import requests
from bs4 import BeautifulSoup

# Collect links to chapter pages from the table of contents
html = requests.get("http://www.composingprograms.com/")
soup = BeautifulSoup(html.text, 'html.parser')
res = []
for link in soup.find_all('a'):
    linkstr = link.get('href')
    # Anchors without an href attribute return None; skip them
    if linkstr and './pages/' in linkstr:
        linkstr = linkstr.replace('./pages', 'http://www.composingprograms.com/pages')
        res.append(linkstr)

def urltitle(url):
    """Return the <title> string of the page at url."""
    html = requests.get(url).text
    return BeautifulSoup(html, 'html.parser').title.string

urltitle_dict = {k: urltitle(k) for k in res}

# Render every collected page to a PDF named after its title
for url in res:
    pdfkit.from_url(url, urltitle_dict[url] + '.pdf')
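One caveat with naming PDFs after page titles: a `<title>` can contain characters that are not valid in filenames (such as `:` or `/`), which would make the `pdfkit.from_url` call fail on some systems. A minimal sketch of a sanitizer that could be applied to each title before use — the function name and replacement character are illustrative choices, not part of the original script:

```python
import re

def safe_filename(title):
    # Replace characters that are invalid in common filesystems
    # (Windows forbids \ / : * ? " < > |) with underscores.
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()

# Example: a chapter title containing a colon
print(safe_filename('Chapter 1: Building Abstractions with Functions'))
```

With this in place, the conversion line would become `pdfkit.from_url(url, safe_filename(urltitle_dict[url]) + '.pdf')`.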