@kcarnold
Created February 1, 2013 19:52
Download LaTeX source code from Google Drive, strip it to plain text without comments, and compile it. Just set the Google Doc sharing mode to "Anyone with the link", and paste the unique part of that link in place of THE_LINK_CODE in gen.sh.
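The "unique part" of a sharing link is the document ID. As a rough illustration, and assuming the link has the usual `.../document/d/<ID>/edit...` shape, the ID could be pulled out like this. The helper name `doc_id_from_link` and the sample ID below are made up for the example; they are not part of the gist.

```python
import re


def doc_id_from_link(link):
    """Return the document ID from a Google Docs sharing link, or None.

    Assumes the ID is the path segment right after /document/d/.
    """
    match = re.search(r"/document/d/([A-Za-z0-9_-]+)", link)
    return match.group(1) if match else None


# Hypothetical link; the ID is invented for illustration.
link = "https://docs.google.com/document/d/abc_123-XYZ/edit?usp=sharing"
print(doc_id_from_link(link))
```

The printed value is what would replace THE_LINK_CODE in gen.sh.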
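# dehtml.py: read exported Google Docs HTML on stdin, write plain LaTeX source to stdout.
# Note: newer lxml releases need the separate cssselect package for .cssselect().
import sys

import lxml.html

# Use the binary streams where they exist (Python 3) so bytes pass through unchanged.
stdin = getattr(sys.stdin, 'buffer', sys.stdin)
stdout = getattr(sys.stdout, 'buffer', sys.stdout)
doc = lxml.html.fromstring(stdin.read())

# Drop links, divs, styles, and the title so comments and other non-source text go away.
# drop_tree() keeps the text that follows a removed element; parent.remove() would discard it.
for elt in doc.cssselect('a, div, style, title'):
    elt.drop_tree()

# One output line per paragraph or heading; add other nodes if I forgot any.
s = u'\n'.join(elt.text_content() for elt in doc.cssselect('p, h1, h2, h3, h4, h5, h6'))

# Replace typographic characters with their LaTeX equivalents.
tr = [
    (u'\u2018', "`"),
    (u'\u2019', "'"),
    (u'\u201c', "``"),
    (u'\u201d', "''"),
    (u'\xa0', ' '),  # no-break space
]
for a, b in tr:
    s = s.replace(a, b)
# Raises UnicodeEncodeError if the text still contains characters outside Latin-1.
stdout.write(s.encode('latin1'))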
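#!/bin/bash
# gen.sh: download each Google Doc as HTML, convert it to .tex, then build the PDF.
set -e
set -o pipefail  # also stop when curl fails, not only when dehtml.py does

# get NAME DOC_ID: write the cleaned text of document DOC_ID to NAME.tex
# curl flags: -f fails on HTTP errors, -L follows redirects
function get() {
    curl -fL "https://docs.google.com/document/d/$2/export?format=html" | python dehtml.py > "$1".tex
}

NAME=paper
get "$NAME" THE_LINK_CODE
#get intro ANOTHER_CODE
#get results ANOTHER_CODE
get ending ANOTHER_CODE

pdflatex "$NAME"
bibtex "$NAME"
pdflatex "$NAME"
pdflatex "$NAME"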

kcarnold commented Feb 5, 2013

It would be quite easy to do this with a single Python script also.
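As a rough idea of what such a single script might look like, here is a minimal Python 3 sketch. It is not the author's code: it swaps lxml for the standard-library `html.parser` so it has no third-party dependencies, and the names `TexExtractor`, `html_to_latex`, `export_url`, `fetch`, and `build` are invented for this example. It also assumes the export is UTF-8 encoded and that `pdflatex` and `bibtex` are on the PATH.

```python
import subprocess
import urllib.request
from html.parser import HTMLParser

SKIP = {"a", "div", "style", "title"}                # dropped together with their contents
BLOCKS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}   # each becomes one output line
QUOTES = {"\u2018": "`", "\u2019": "'", "\u201c": "``", "\u201d": "''", "\xa0": " "}


class TexExtractor(HTMLParser):
    """Collect the text of paragraphs and headings, ignoring skipped elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []
        self.current = None   # text pieces of the open block, or None
        self.skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCKS and self.skip_depth == 0 and self.current is None:
            self.current = []

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCKS and self.current is not None:
            self.lines.append("".join(self.current))
            self.current = None

    def handle_data(self, data):
        if self.current is not None and self.skip_depth == 0:
            self.current.append(data)


def html_to_latex(html):
    """Strip HTML down to plain text and map typographic quotes to LaTeX."""
    parser = TexExtractor()
    parser.feed(html)
    parser.close()
    text = "\n".join(parser.lines)
    for src, dst in QUOTES.items():
        text = text.replace(src, dst)
    return text


def export_url(doc_id):
    return "https://docs.google.com/document/d/%s/export?format=html" % doc_id


def fetch(name, doc_id):
    """Download one document and write it to name.tex (Latin-1, as in dehtml.py)."""
    with urllib.request.urlopen(export_url(doc_id)) as resp:
        html = resp.read().decode("utf-8")  # assumption: the export is UTF-8
    with open(name + ".tex", "w", encoding="latin-1") as out:
        out.write(html_to_latex(html))


def build(name, sources):
    """sources maps output names to document IDs; name is the main .tex file."""
    for target, doc_id in sources.items():
        fetch(target, doc_id)
    for cmd in ("pdflatex", "bibtex", "pdflatex", "pdflatex"):
        subprocess.check_call([cmd, name])
```

Calling `build("paper", {"paper": "THE_LINK_CODE", "ending": "ANOTHER_CODE"})` would then mirror what gen.sh does, with the placeholders replaced by real document IDs.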
