@tianhuil
tianhuil / Timer.py
Last active December 30, 2015 12:29
Python's timeit module is built around statements passed as strings. Here's a timer class that you can use with a "with" statement to time even multi-line statements.
import datetime
class Timer:
    def __init__(self, string=None):
        self.string = string
    def __enter__(self):
        self.t1 = datetime.datetime.now()
    def __exit__(self, exc_type, exc_value, traceback):
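The preview above stops at the `__exit__` signature. A minimal sketch of how such a timer could be completed follows; the `__exit__` body, the `elapsed` attribute, the default label, and the output format are all assumptions, not the gist's actual code.

```python
import datetime


class Timer:
    """Context manager that reports how long its body took to run."""

    def __init__(self, string=None):
        self.string = string  # optional label for the report

    def __enter__(self):
        self.t1 = datetime.datetime.now()
        return self  # lets callers write `with Timer() as t:`

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumed body: record the end time and print the difference.
        self.t2 = datetime.datetime.now()
        self.elapsed = self.t2 - self.t1  # a datetime.timedelta
        label = self.string if self.string is not None else "Block"
        print("{}: {:.6f} s".format(label, self.elapsed.total_seconds()))
        return False  # do not swallow exceptions raised in the body


# Usage: time a multi-line block.
with Timer("sum of squares") as t:
    total = 0
    for i in range(100000):
        total += i * i
```

For short intervals, `time.perf_counter()` is usually a better clock than `datetime.datetime.now()`, since it is monotonic and has higher resolution.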
tianhuil / GroupedHorizontalBarGraph.html
Last active July 26, 2016 21:19
Grouped Horizontal Bar Graph in d3. This simply swaps the x and y axes of http://bl.ocks.org/mbostock/3887051. This version separates the data and graphics layers so that it is easy to reuse.
<!DOCTYPE html>
<meta charset="utf-8">
<style>
body {
  font: 10px sans-serif;
}
.axis path,
.axis line {
tianhuil / GroupVerticalBarGraph.html
Created December 18, 2013 19:41
Based on http://bl.ocks.org/mbostock/3887051 but more modular so that it's more reusable.
<!DOCTYPE html>
<meta charset="utf-8">
<style>
body {
  font: 10px sans-serif;
}
.axis path,
.axis line {
tianhuil / DecisionTreeVersusLinearRegression.py
Created January 1, 2014 16:50
Decision tree versus linear regression comparison plot
import numpy as np
# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y += .2 * (0.5 - rng.rand(80))
# Fit regression model
from sklearn.tree import DecisionTreeRegressor
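The preview ends at the import, so the fitting and comparison steps are not shown. One way the comparison could continue is sketched below, assuming a tree of `max_depth=3` and scikit-learn's `LinearRegression`; the depth, the evaluation grid `X_test`, and the MSE summary are illustrative choices, not taken from the gist.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Same noisy sine dataset as in the snippet above.
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y += .2 * (0.5 - rng.rand(80))

# Fit both models (max_depth=3 is an assumed setting).
tree = DecisionTreeRegressor(max_depth=3)
linear = LinearRegression()
tree.fit(X, y)
linear.fit(X, y)

# Predict on a dense grid; these arrays are what one would plot.
X_test = np.linspace(0.0, 5.0, 500)[:, np.newaxis]
y_tree = tree.predict(X_test)      # piecewise-constant curve
y_linear = linear.predict(X_test)  # straight line

# Training error gives a quick numeric comparison of the two fits.
mse_tree = np.mean((tree.predict(X) - y) ** 2)
mse_linear = np.mean((linear.predict(X) - y) ** 2)
print("tree MSE: %.4f, linear MSE: %.4f" % (mse_tree, mse_linear))
```

Plotting `y_tree` and `y_linear` against `X_test`, with the data as a scatter, shows the step-shaped tree prediction next to the single straight line.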
tianhuil / LearningCurve.py
Created January 1, 2014 21:06
Learning Curves for trees of different depths
import numpy as np
from collections import namedtuple
# Create a random dataset
rng = np.random.RandomState(42)
N_points = 100
X = np.sort(5 * rng.rand(N_points, 1), axis=0)
y = np.sin(X).ravel()
y += .4 * (0.5 - rng.rand(N_points))
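The rest of the script is cut off in the preview. A rough sketch of how learning curves for several depths might be computed on this dataset is given below; the 70/30 split, the depths `(1, 3, 10)`, the `CurvePoint` tuple, and the helper `tree_learning_curve` are all hypothetical names and settings chosen for illustration.

```python
from collections import namedtuple

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same dataset as in the snippet above.
rng = np.random.RandomState(42)
N_points = 100
X = np.sort(5 * rng.rand(N_points, 1), axis=0)
y = np.sin(X).ravel()
y += .4 * (0.5 - rng.rand(N_points))

# Shuffle, then hold out 30 points for testing (assumed split).
order = rng.permutation(N_points)
train_idx, test_idx = order[:70], order[70:]

CurvePoint = namedtuple("CurvePoint", ["n_train", "train_mse", "test_mse"])


def tree_learning_curve(max_depth, sizes):
    """Train on growing subsets and record train/test error for each size."""
    points = []
    for n in sizes:
        idx = train_idx[:n]
        model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        model.fit(X[idx], y[idx])
        train_mse = np.mean((model.predict(X[idx]) - y[idx]) ** 2)
        test_mse = np.mean((model.predict(X[test_idx]) - y[test_idx]) ** 2)
        points.append(CurvePoint(n, train_mse, test_mse))
    return points


sizes = range(10, 71, 10)
curves = {depth: tree_learning_curve(depth, sizes) for depth in (1, 3, 10)}
```

Each entry of `curves` is a list of `CurvePoint` tuples, so plotting `train_mse` and `test_mse` against `n_train` for each depth gives one learning curve per tree.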
tianhuil / VarianceBiasPlot.py
Last active January 1, 2016 22:39
Variance-Bias Plots
import numpy as np
from collections import namedtuple
# Create a random dataset
rng = np.random.RandomState(42)
N_points = 10000
X = np.sort(5 * rng.rand(N_points, 1), axis=0)
y = np.sin(X).ravel()
y += .4 * (0.5 - rng.rand(N_points))
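The preview does not show which model the plot uses or how bias and variance are estimated. The sketch below is one possible approach under stated assumptions: decision trees of several depths, repeated training on random subsets of the pool, and squared bias measured against the noise-free `sin` curve. The helper `bias_variance`, the grid `X_grid`, and the values `n_train=100` and `n_rounds=50` are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same dataset as in the snippet above; the noise term has zero mean,
# so sin(x) is the true regression function.
rng = np.random.RandomState(42)
N_points = 10000
X = np.sort(5 * rng.rand(N_points, 1), axis=0)
y = np.sin(X).ravel()
y += .4 * (0.5 - rng.rand(N_points))

# Fixed evaluation grid and the noise-free target on it.
X_grid = np.linspace(0.5, 4.5, 50)[:, np.newaxis]
f_true = np.sin(X_grid).ravel()


def bias_variance(max_depth, n_train=100, n_rounds=50):
    """Estimate squared bias and variance of a tree of the given depth."""
    preds = np.empty((n_rounds, len(X_grid)))
    for r in range(n_rounds):
        # Each round trains on a fresh random subset of the pool.
        idx = rng.choice(N_points, size=n_train, replace=False)
        model = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        model.fit(X[idx], y[idx])
        preds[r] = model.predict(X_grid)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_true) ** 2)  # error of the average fit
    variance = np.mean(preds.var(axis=0))         # spread across fits
    return bias_sq, variance


results = {depth: bias_variance(depth) for depth in (1, 2, 4, 8)}
for depth, (bias_sq, variance) in sorted(results.items()):
    print("depth %d: bias^2 = %.4f, variance = %.4f" % (depth, bias_sq, variance))
```

Plotting the two columns of `results` against depth gives the kind of variance-bias picture the title refers to.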
tianhuil / col2tsv.py
Created January 9, 2014 04:28
col2tsv: converts a single column of input into multiple columns, with the number of rows given as the first argument
#!/usr/bin/env python
import sys
n_rows = int(sys.argv[1])
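# One list per output row; works on both Python 2 and Python 3
data = [[] for _ in range(n_rows)]
for k, line in enumerate(sys.stdin):
    # Line k goes to row k % n_rows, so the input fills the table column by column
    data[k % n_rows].append(line.rstrip())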
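The output step is cut off in the preview. A self-contained Python 3 sketch of the whole transformation follows; the function name `col2tsv` and the tab-joined output are assumptions based on the gist's title, and `io.StringIO` stands in for `sys.stdin`.

```python
import io


def col2tsv(lines, n_rows):
    """Spread a single column of lines over n_rows rows, column by column."""
    data = [[] for _ in range(n_rows)]
    for k, line in enumerate(lines):
        data[k % n_rows].append(line.rstrip())
    # Assumed output format: one tab-separated string per row.
    return ["\t".join(row) for row in data]


# Six input lines with n_rows=3 give a 3x2 table.
sample = io.StringIO("a\nb\nc\nd\ne\nf\n")
rows = col2tsv(sample, 3)
for row in rows:
    print(row)
```

With this input the first three lines form the first column and the next three form the second, so the rows are `a d`, `b e`, and `c f` (tab-separated).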
tianhuil / setup_digitalocean.sh
Last active March 16, 2021 10:04
Setup Digital Ocean Ubuntu Droplet with scientific python and mysql
# To run this command:
# curl https://gist.githubusercontent.com/tianhuil/0aa9b265f55413dc7198/raw > setup_digitalocean.sh
# . setup_digitalocean.sh
# Update the apt package index
sudo apt-get update
# Installing scientific Python
sudo apt-get -y install --fix-missing build-essential python-dev python-numpy python-setuptools python-scipy libatlas-dev
sudo apt-get -y install --fix-missing build-essential python-sklearn
tianhuil / setup_venv.sh
Created May 9, 2014 20:17
Setting up virtual environments
# if you need to install virtualenv
# pip install virtualenv
# in the folder
virtualenv venv
. venv/bin/activate
# to deactivate
# deactivate
tianhuil / install_hadoop.sh
Last active August 29, 2015 14:01
Installing Hadoop on Ubuntu
# Installing Hadoop on Ubuntu
# http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_2.html
# While we're at it, let's install the JDK ...
sudo apt-get -y install openjdk-6-jdk
# ... and Yelp's mrjob
pip install mrjob
# Install maven