Paul Gowder paultopia

@paultopia
paultopia / addcss.py
Created October 8, 2015 22:00
Easy way to add CSS (or whatever) to the header of a bunch of HTML files at once.
# USAGE:
#
# To add code to the end of every <head> tag (like a css link, a font link, etc.) to quick-format an entire website:
# 1. Start in the top-level directory of the site. Put this file there.
# 2. Add your formatting for the <head> tag to the formatme variable.
# EXAMPLE: mine was '<link href="https://fonts.googleapis.com/css?family=Halant:300" rel="stylesheet" type="text/css"><link rel="stylesheet" href="http://paul-gowder.com/conlawII/prettify.css">'
# be sure either to escape quotes or to use single quotes to delimit the string and double quotes in the HTML (or vice versa)
# 3. Run this script.
# 4. Bam. Every html page in the top-level directory and all its subdirectories now has the formatting you want.
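A minimal sketch of the approach those steps describe (not the gist's actual code: plain string splicing over os.walk; the real script may do it differently, e.g. with an HTML parser):

import os

formatme = '<link rel="stylesheet" href="style.css">'  # hypothetical example value

for root, dirs, files in os.walk('.'):
    for name in files:
        if name.endswith('.html'):
            path = os.path.join(root, name)
            with open(path) as f:
                html = f.read()
            if formatme not in html:  # don't splice the same snippet in twice
                with open(path, 'w') as f:
                    f.write(html.replace('</head>', formatme + '</head>', 1))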
@paultopia
paultopia / wordcount.py
Last active November 18, 2015 20:52
quick and dirty script to take a bunch of documents and output a printed list of word counts for each document as well as for the whole corpus
# assumes documents are provided in the form of a list of (docid, doctext) tuples named thedocslist. docid = int/string/float; doctext = string
import nltk
import string
from collections import Counter
# get rid of punctuation, numbers; make all lowercase. no stemming.
counterslist = []
for onedocument in thedocslist:
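    # (hedged continuation sketch: the gist preview is truncated at the loop header;
    # this is one plausible way to finish it, consistent with the comments above,
    # not necessarily the original code)
    docid, doctext = onedocument
    cleaned = ''.join(ch for ch in doctext.lower()
                      if ch not in string.punctuation and not ch.isdigit())
    counts = Counter(nltk.word_tokenize(cleaned))
    counterslist.append((docid, counts))
    print docid, sum(counts.values())  # per-document word count

wholecorpus = sum((c for _, c in counterslist), Counter())
print "whole corpus:", sum(wholecorpus.values())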
@paultopia
paultopia / toolchain.md
Last active November 30, 2015 20:16
The toolchain of a techy-ish political theorist/lawyer

My toolchain (on OSX)

I do all four of the following often:

  1. Write lengthy academic prose.

  2. Write code.

  3. Analyze data and do other math-y things.

@paultopia
paultopia / makeslide.py
Last active May 3, 2017 20:41 — forked from aaronwolen/slides.md
Pandoc template to generate reveal.js slideshows (corrected from the original to be compatible with the 3.2.0 release of reveal.js); also added a quick Python script to generate slides with less command-line ugliness.
import argparse
# this first bit is to enable multiline help text. apparently this is a known problem with argparse.
# Solution jacked from http://stackoverflow.com/questions/3853722/python-argparse-how-to-insert-newline-in-the-help-text
import textwrap as _textwrap
class MultilineFormatter(argparse.HelpFormatter):
    def _fill_text(self, text, width, indent):
        text = self._whitespace_matcher.sub(' ', text).strip()
        paragraphs = text.split('|n ')
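        # (hedged continuation -- the preview is cut off here; the rest of the
        # StackOverflow recipe linked above looks like this, and the gist presumably follows it)
        multiline_text = ''
        for paragraph in paragraphs:
            multiline_text += _textwrap.fill(
                paragraph, width,
                initial_indent=indent,
                subsequent_indent=indent) + '\n\n'
        return multiline_text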
@paultopia
paultopia / pgmd.py
Created December 1, 2015 05:42
Quick and easy command-line wrapper for pandoc conversions to html, pdf, and docx
# The point of this script is that pandoc's command-line syntax is painful and hard to remember.
# I really only produce html, pdf, and docx, and I only ever use the defaults. Ergo, a script
# (subsequently put on $PATH, with the path to python added at the top as a shebang, so it can
# be run trivially) to make it simple.
#
# usage: python pgmd.py INPUTFILE FORMAT[html/pdf/word]
# that's it. easy.
#
# there are a handful of other options (output file, overwrite output file, append scripts and
# css and such to html headers), details are in the commandline help via -h flag
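A minimal sketch of the kind of wrapper the comments describe (hypothetical names; the actual gist uses argparse and supports the extra options mentioned above):

import subprocess
import sys

EXTENSIONS = {"html": "html", "pdf": "pdf", "word": "docx"}

def convert(inputfile, fmt):
    # pandoc picks the output format from the extension of the -o file
    outputfile = inputfile.rsplit(".", 1)[0] + "." + EXTENSIONS[fmt]
    subprocess.call(["pandoc", inputfile, "-o", outputfile])
    return outputfile

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])  # usage: python pgmd.py INPUTFILE FORMAT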
@paultopia
paultopia / spideyscrape.py
# OBSOLETE.
# GO HERE INSTEAD: https://github.com/paultopia/spideyscrape
# very basic scraper-spider for those html books where there's a table of contents page that links to a
# bunch of sub-pages with actual content. (Like the documentation for a bunch of libraries.)
# WARNING: has no validation, assumes pages contain relative links and are all on the same site.
# (this is an easy tweak but I don't have time today)
# also assumes all content is vanilla html or at least can be accessed through vanilla html.
#
# pass ToC page through raw_input. This script scrapes every unique page linked from ToC and
# EDIT: this has now been upgraded to a full-fledged repo and is accepting PRs. This gist is no longer updating.
# go here: https://github.com/paultopia/spideyscrape
# This is a very basic scraper-spider for those html books where there's a table of contents page that links to a
# bunch of sub-pages with actual content (like the documentation for a bunch of libraries).
#
# Dependencies: Beautiful Soup 4 on Python 2.7.
#
# It assumes all content is vanilla html or at least can be accessed through vanilla html.
#
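A minimal sketch of the approach described (Python 2.7 + BeautifulSoup 4, per the stated dependencies; names and signatures are illustrative guesses, not the gist's actual API -- see the repo linked above for the real thing):

import urllib2
from urlparse import urljoin
from bs4 import BeautifulSoup

def scrape(toc_url):
    soup = BeautifulSoup(urllib2.urlopen(toc_url).read(), "html.parser")
    seen, pages = set(), []
    for a in soup.find_all("a", href=True):
        url = urljoin(toc_url, a["href"])  # assumes relative links on the same site
        if url not in seen:
            seen.add(url)
            pages.append(urllib2.urlopen(url).read())
    return "\n".join(pages)

def savePage(html, filename="scraped.html"):  # signature guessed from scrapewrap.py below
    with open(filename, "w") as f:
        f.write(html)
    return filename

if __name__ == "__main__":
    savePage(scrape(raw_input("ToC page URL: ")))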
@paultopia
paultopia / scrapewrap.py
Last active December 29, 2015 17:39
scrapewrap.py
import sys
import spideyscrape
import console
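# console is assumed here to be Pythonista's iOS console module; its open_in() hands the saved file to the "Open in..." share dialog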
import os
args = sys.argv[1:] # see if the user gave us a command line argument
start = args[0] if args else raw_input('URL to crawl: ')
html = spideyscrape.scrape(start)
filename = spideyscrape.savePage(html)
console.open_in(filename)
@paultopia
paultopia / worst_python_ever.py
Last active December 30, 2015 03:44
worst_python_ever.py
# I think I've discovered a bit of Python code even more dangerous than https://github.com/ajalt/fuckitpy
class string(str):
    def __call__(self):
        try:
            exec self
        except Exception:
            pass
evil = string('print "EVIL"')
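# quick usage note (illustrative; not necessarily in the original gist):
evil()  # calling the "string" exec's its own contents and prints EVIL (Python 2)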
@paultopia
paultopia / never_do_this.py
Last active June 3, 2016 22:10
never_do_this.py
# NEVER DO THIS EXCEPT AS A PRANK ON YOUR WORST ENEMY
class foo(str):
    def __call__(self):
        try:
            exec self
        except Exception:
            pass
str = foo
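# After this rebinding, anything built with str(...) is a silently executable object (Python 2), e.g.:
# str('print "gotcha"')()   # prints gotcha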