import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer


def get_vectors(vocab_size=5000):
    # Bag-of-words counts for the 20 Newsgroups training split
    newsgroups_train = fetch_20newsgroups(subset='train')
    vectorizer = CountVectorizer(max_df=.9, max_features=vocab_size)
    vecs = vectorizer.fit_transform(newsgroups_train.data)
    # The fitted term -> column-index mapping is stored in `vocabulary_`
    vocabulary = vectorizer.vocabulary_
    # Wrap keys() in list() so NumPy builds a 1-D array under Python 3
    terms = np.array(list(vocabulary.keys()))
mdengler / test.py
Created May 14, 2014 23:23
test case for nose #711
#!/usr/bin/python
# to use:
# 1) git clone git@github.com:jszakmeister/nose.git nose-711
# 2) cd nose-711
# 3) python setup.py build
# 4) cd build/lib
# 5) put this file there as test.py
# 6) PYTHONPATH=$PWD python -m nose ./test.py
# # observe the error:
mdengler / README.md
Last active August 29, 2015 14:02 — forked from mbostock/.block
Wilson's algorithm by mbostock, with quicker-completion starting conditions

Wilson’s algorithm uses loop-erased random walks to generate a uniform spanning tree — an unbiased sample of all possible spanning trees. Most other maze generation algorithms, such as Prim’s, random traversal and randomized depth-first traversal, do not have this beautiful property.

The algorithm initializes the maze with eight arbitrary starting cells. Then, a new cell is added to the maze, initiating a random walk (shown in magenta). The random walk continues until it reconnects with the existing maze (shown in white). However, if the random walk intersects itself, the resulting loop is erased before the random walk continues.
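
The loop-erased random walk described above can be sketched in Python as follows. This is a minimal illustration on a rectangular grid, not the code from the forked block: the function name `wilson_maze`, its parameters, and the edge-set representation are all assumptions made for the example.

```python
import random


def wilson_maze(width, height, seeds=None, rng=None):
    """Return the set of edges chosen by loop-erased random walks.

    Hypothetical helper for illustration. `seeds` are the cells that
    start inside the maze; by default a single random cell is used.
    """
    if rng is None:
        rng = random.Random()
    cells = [(x, y) for y in range(height) for x in range(width)]
    in_maze = set(seeds) if seeds else {rng.choice(cells)}
    edges = set()

    def neighbors(cell):
        x, y = cell
        candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        return [(cx, cy) for cx, cy in candidates
                if 0 <= cx < width and 0 <= cy < height]

    for start in cells:
        if start in in_maze:
            continue
        path = [start]          # current random walk
        index = {start: 0}      # position of each walk cell in `path`
        while path[-1] not in in_maze:
            step = rng.choice(neighbors(path[-1]))
            if step in index:
                # The walk crossed itself: erase the loop it just closed.
                cut = index[step]
                for erased in path[cut + 1:]:
                    del index[erased]
                del path[cut + 1:]
            else:
                index[step] = len(path)
                path.append(step)
        # The walk reached the maze: commit its cells and edges.
        for a, b in zip(path, path[1:]):
            edges.add(frozenset((a, b)))
        in_maze.update(path)
    return edges
```

With a single seed, `wilson_maze(6, 4)` returns `6 * 4 - 1` edges forming one spanning tree. With several disconnected seeds, each seed roots its own tree, so the result has one edge fewer per extra seed.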

The global structure of the maze can be more easily seen by flooding it with color.

To play with this yourself, [instructions from Uehreka on http://news.yco

### Keybase proof
I hereby claim:
* I am mdengler on github.
* I am mdengler (https://keybase.io/mdengler) on keybase.
* I have a public key whose fingerprint is 565C 8F33 ABF7 2DAA 33DE B9CB 81B4 44DD 75C7 D2F8
To claim this, I am signing this object:
mdengler / garch.py
Created April 18, 2012 04:38
garch in python, from Peter von Tessin
#!/usr/bin/env python
# Trivial GARCH implementation in python
#
# From Peter Tessin's http://www.petertessin.com/TimeSeries.pdf
#
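
Since the preview stops at the header comment, here is a minimal sketch of the conditional-variance recursion that a GARCH(1,1) model uses, σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁. It is not the code from the gist: the function name `garch11_variance`, the parameter names, and the choice to start the recursion from the sample mean of squared returns are assumptions made for illustration.

```python
def garch11_variance(returns, omega, alpha, beta):
    """Conditional variances for a zero-mean GARCH(1,1) model.

    Hypothetical helper: `returns` is a list of residuals, and the
    recursion is seeded with the sample mean of squared returns.
    """
    if not returns:
        return []
    initial = sum(r * r for r in returns) / len(returns)
    sigma2 = [initial]
    for r in returns[:-1]:
        # sigma2[t] = omega + alpha * eps[t-1]^2 + beta * sigma2[t-1]
        sigma2.append(omega + alpha * r * r + beta * sigma2[-1])
    return sigma2
```

The function only evaluates the recursion for given parameters. Estimating `omega`, `alpha` and `beta` from data (typically by maximum likelihood) is a separate step not shown here.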
#!/usr/bin/env python
# See https://github.com/Elleo/gst-opencv/blob/master/examples/python/facedetect.py
# See also http://blog.mikeasoft.com/2010/06/17/gstreamer-opencv-plugins-on-the-nokia-n900/
import pygst
pygst.require("0.10")
import gst
import gtk
mdengler / csvvert.py
Created June 11, 2012 03:12
CSV -> vertical display
#!/usr/bin/env python
"""
Displays a CSV file's contents vertically.
Example:
$ cat | ~/bin/csvvert.py
Year,Make,Model,Length
1981,Ford,Capri Ghia,2.34
mdengler / html2csv.py
Created June 21, 2012 04:45
html2csv.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Examples:
%(progname)s http://en.wikipedia.org/wiki/List_of_Olympic_records_in_athletics
This is essentially this logic, done up
#!/usr/bin/env python
"""
Proof of concept scraper for pinnacle sports
"""
FEED = "http://xml.pinnaclesports.com/pinnacleFeed.aspx"