Kenneth C. Arnold kcarnold

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

def get_vectors(vocab_size=5000):
    # Bag-of-words counts for the 20 Newsgroups training split.
    newsgroups_train = fetch_20newsgroups(subset='train')
    vectorizer = CountVectorizer(max_df=.9, max_features=vocab_size)
    vecs = vectorizer.fit_transform(newsgroups_train.data)
    # The fitted term -> column-index mapping is stored in `vocabulary_`.
    vocabulary = vectorizer.vocabulary_
    terms = np.array(list(vocabulary.keys()))
kcarnold / README
Created November 3, 2011 19:46
Random restore-to-inbox
Google Apps Script is clunky. You have to make a spreadsheet in Google Docs, then open Tools -> Script Editor and paste this code into a new project. Run it once to make sure it works, then under Triggers select moveRandomSnoozes, time-based, and whatever frequency you want. (I had it running daily, but just changed the probabilities and set it to hourly; we'll see how I like that.)
This was partly based on the Gmail Snooze script that was making the rounds a few months ago.
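The idea described above (on each triggered run, move each snoozed thread back to the inbox with some probability) can be sketched in Python. This is not the gist's Apps Script code: `pick_random_snoozes`, `per_run_probability`, and the thread IDs are hypothetical names used only for illustration. The second helper shows one way the probabilities might be adjusted when switching from a daily to an hourly trigger.

```python
import random

def per_run_probability(daily_probability, runs_per_day):
    """Per-run chance that gives the same overall chance across one day of runs."""
    # Solve 1 - (1 - p) ** runs_per_day == daily_probability for p.
    return 1.0 - (1.0 - daily_probability) ** (1.0 / runs_per_day)

def pick_random_snoozes(thread_ids, restore_probability, rng=random):
    """Return the threads to move back to the inbox on this run."""
    # Each thread is restored independently with the given probability.
    return [t for t in thread_ids if rng.random() < restore_probability]

if __name__ == "__main__":
    hourly = per_run_probability(0.5, 24)  # same 50% daily chance, checked hourly
    print(pick_random_snoozes(["thread-1", "thread-2", "thread-3"], hourly))
```

With an independent draw per run, the expected wait before a thread resurfaces is 1/p runs, so a smaller hourly probability spreads restores across the day instead of delivering them in one daily batch.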
kcarnold / dpgmm.py
Created November 17, 2011 16:25
DPGMM sampler
# Dirichlet process Gaussian mixture model
import numpy as np
from scipy.special import gammaln
from scipy.linalg import cholesky
from sliceSample import sliceSample

def multinomialDraw(dist):
    """Returns a single draw from the given multinomial distribution."""
    return np.random.multinomial(1, dist).argmax()
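For comparison, a single categorical draw like `multinomialDraw` can also be done without NumPy by inverting the cumulative distribution. The sketch below is an illustrative alternative, not part of the gist; the name `categorical_draw` is an assumption, and unlike `np.random.multinomial` it tolerates unnormalized weights.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def categorical_draw(probs, rng=random):
    """Return index i with probability proportional to probs[i]."""
    cumulative = list(accumulate(probs))
    # Uniform point in [0, total); the first cumulative value above it wins.
    u = rng.random() * cumulative[-1]
    # The min() guards against floating-point rounding at the upper edge.
    return min(bisect_right(cumulative, u), len(probs) - 1)

if __name__ == "__main__":
    counts = [0, 0, 0]
    for _ in range(10000):
        counts[categorical_draw([0.2, 0.5, 0.3])] += 1
    print(counts)
```

Using `bisect_right` means an entry with zero weight is never selected, because its cumulative value equals its predecessor's.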
kcarnold / gp.m
Created November 29, 2011 20:28
Gaussian Processes demo
% All based on Rasmussen and Williams.
N = 1000;
lo=0; hi=5;
x = linspace(lo, hi, N);
%xn = [2, 2.5, 3]';
%yn = [-1.9, -2, -1.9]';
xn = ([-4, -3, -1, 0, 2]'+5)/2;
yn = [-2, 0, 1, 2, -1]';
kcarnold / event.js
Created July 16, 2012 21:44
Trivial JavaScript event object system
var Event = (function() {
    function Event(name) {
        this.name = name;
        this.listeners = [];
    }
    Event.prototype.when = function(context, callback) {
        if (arguments.length === 1) {
            callback = context;
            context = null;
        }
kcarnold / browserStats.js
Last active February 14, 2016 22:13
Logging
function logBrowserStats() {
    function getAsObject(objectLike) {
        var obj = {}, key;
        for (key in objectLike) {
            var val = objectLike[key];
            if (typeof val === 'number' || typeof val === 'string') {
                obj[key] = val;
            }
        }
        return obj;
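import sys

# Map typographic quotes to LaTeX-style ASCII quotes.
tr = [
    (u'\u2018', "`"),
    (u'\u2019', "'"),
    (u'\u201c', "``"),
    (u'\u201d', "''")
]
# Read raw bytes and decode explicitly so the input is treated as UTF-8 (Python 3).
s = sys.stdin.buffer.read().decode('utf8')
for a, b in tr:
    s = s.replace(a, b)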
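The loop above makes one pass over the text per quote character. As a sketch of an equivalent single-pass approach in Python 3, the same mapping can be expressed with `str.maketrans` and `str.translate`; the names `QUOTE_MAP` and `asciify_quotes` are hypothetical and not from the gist.

```python
# Translation table: each curly quote maps to its LaTeX-style ASCII form.
QUOTE_MAP = str.maketrans({
    "\u2018": "`",
    "\u2019": "'",
    "\u201c": "``",
    "\u201d": "''",
})

def asciify_quotes(text):
    """Replace typographic quotes with LaTeX-style ASCII quotes in one pass."""
    return text.translate(QUOTE_MAP)

if __name__ == "__main__":
    print(asciify_quotes("\u201cIt\u2019s fine,\u201d she said."))
```

For four replacements the difference is negligible; the table form mainly keeps the mapping in one place as more characters are added.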
kcarnold / dehtml.py
Created February 1, 2013 19:52
Download LaTeX source code from Google Drive, strip it to plain text without comments, and compile. Just set the Google Doc sharing mode to "Anyone with the link", and paste the unique part of that link into the appropriate place in gen.sh
import lxml.html
import sys
doc = lxml.html.fromstring(sys.stdin.read())
for elt in doc.cssselect('a, div, style, title'):
    elt.getparent().remove(elt)
s = u'\n'.join((elt.text_content() for elt in doc.cssselect('p, h1, h2, h3, h4, h5, h6'))) # add other nodes if I forgot any
kcarnold / .bash_profile
Last active December 15, 2015 12:29
Dreamhost + git bashrc
# ~/.bash_profile: executed by bash(1) for login shells.
source ~/.bashrc
kcarnold / .bashrc
Created March 29, 2013 17:06
Shell history is valuable. Disk space is cheap. Store more history than bash's default.
# don't put duplicate lines in the history. See bash(1) for more options
HISTCONTROL=ignoredups:ignorespace
HISTIGNORE=ls:fg
# append to the history file, don't overwrite it
shopt -s histappend
# for setting history length, see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=10000
HISTFILESIZE=20000