Michael Li tianhuil

tianhuil /
Last active Nov 12, 2019
Data Sharding (useful preprocessing for dask)
import gzip
import os
from itertools import islice
import argparse
# from
def grouper(iterable, n):
    iterator = iter(iterable)
    while True:
        group = tuple(islice(iterator, n))
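The preview above is cut off inside the loop. A minimal sketch of how such a chunking helper might be completed and used to write gzip shards follows. The stopping condition, the `write_shards` helper, and the `shard-NNNNN.txt.gz` naming scheme are assumptions made for illustration, not the gist's actual code.

```python
import gzip
import os
from itertools import islice


def grouper(iterable, n):
    """Yield successive tuples of up to n items; the last one may be shorter."""
    iterator = iter(iterable)
    while True:
        group = tuple(islice(iterator, n))
        if not group:  # assumed stopping condition: iterator is exhausted
            return
        yield group


def write_shards(lines, out_dir, lines_per_shard):
    """Write lines into numbered gzip files (hypothetical naming scheme)."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, group in enumerate(grouper(lines, lines_per_shard)):
        path = os.path.join(out_dir, f"shard-{i:05d}.txt.gz")
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in group)
        paths.append(path)
    return paths
```

Because `grouper` pulls items lazily, only one shard's worth of lines is held in memory at a time.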
tianhuil /
Created Nov 9, 2019
Turns classifier's `predict_proba` into a transform
from sklearn.base import BaseEstimator, TransformerMixin
class ClassifierTransform(BaseEstimator, TransformerMixin):
    def __init__(self, clf):
        self.clf = clf
    def fit(self, X, y=None):
        self.clf.fit(X, y)
        return self
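The preview stops before any `transform` method. Going by the gist title, a plausible shape for the wrapper is sketched below. It is dependency-free on purpose: the scikit-learn base classes are left out, and `MajorityClassifier` is a hypothetical stand-in for any object exposing `fit` and `predict_proba`. The `transform` body is an assumption.

```python
class ClassifierTransform:
    """Wrap a classifier so that transform() returns its class probabilities."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y=None):
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        # Assumed behaviour: expose predict_proba as the transformed features.
        return self.clf.predict_proba(X)


class MajorityClassifier:
    """Hypothetical toy classifier: always predicts the training class priors."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        total = len(y)
        self.priors_ = [
            sum(1 for label in y if label == c) / total for c in self.classes_
        ]
        return self

    def predict_proba(self, X):
        return [list(self.priors_) for _ in X]
```

Wrapped this way, a classifier's probabilities can be used as input features for a later stage.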
tianhuil / Residua
Last active Nov 8, 2019
This is a Residual Regressor and Residual Classifier
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin
class _ResidualEstimator(BaseEstimator):
    def __init__(self, base, residual):
        self.base = base
        self.residual = residual
    def fit(self, X, y):
        self.base.fit(X, y)
        self.residual.fit(X, y - self.base.predict(X))
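The residual idea can be illustrated with plain lists. A base model is fit on the targets, a second model is fit on what the base model gets wrong, and the prediction is the sum of the two. The `predict` method and both toy models (`MeanRegressor`, `GroupMeanRegressor`) are hypothetical and are not taken from the gist.

```python
class ResidualRegressor:
    """Fit `residual` on the errors left over by `base`; predict their sum."""

    def __init__(self, base, residual):
        self.base = base
        self.residual = residual

    def fit(self, X, y):
        self.base.fit(X, y)
        base_pred = self.base.predict(X)
        self.residual.fit(X, [yi - pi for yi, pi in zip(y, base_pred)])
        return self

    def predict(self, X):
        # Assumed combination rule: base prediction plus predicted residual.
        return [b + r for b, r in zip(self.base.predict(X), self.residual.predict(X))]


class MeanRegressor:
    """Hypothetical base model: predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class GroupMeanRegressor:
    """Hypothetical residual model: mean target per distinct x, 0.0 if unseen."""

    def fit(self, X, y):
        groups = {}
        for x, target in zip(X, y):
            groups.setdefault(x, []).append(target)
        self.means_ = {x: sum(v) / len(v) for x, v in groups.items()}
        return self

    def predict(self, X):
        return [self.means_.get(x, 0.0) for x in X]
```

For an input the residual model has never seen, the prediction falls back to the base model's output.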
tianhuil / update_branches.ipy
Last active Dec 3, 2017
Finds branches that can be safely merged
# to use this run these two steps
# curl
# ipython merged_branches.ipy
!git checkout master
branches = !git branch
other_branches = [b.lstrip() for b in branches if b != '* master']
results = {}
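Outside IPython, the branch-listing step above could be approximated with `subprocess`. This sketch covers only the parsing of `git branch` output. Both function names are hypothetical. Unlike the original comparison against `'* master'`, it filters by branch name, so it does not depend on `master` being checked out.

```python
import subprocess


def list_other_branches(branch_lines, current="master"):
    """Return branch names other than `current` from `git branch` output lines."""
    names = []
    for line in branch_lines:
        # `git branch` prefixes the checked-out branch with "* " and others with spaces.
        name = line.lstrip("* ").strip()
        if name and name != current:
            names.append(name)
    return names


def git_branch_lines(repo_dir="."):
    """Run `git branch` in repo_dir and return its output lines."""
    result = subprocess.run(
        ["git", "branch"], cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

A call such as `list_other_branches(git_branch_lines())` would then give the list to iterate over.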
# cd into a new folder and observe the error in the last line.
yes | rm*
echo "from email.utils import formatdate; print 'OK'" >
PYTHONPATH="" python
PYTHONPATH="" python
tianhuil /
Created Jun 20, 2015
A script to choose high-entropy but memorable passphrases
Script to generate passphrases according to the vocabulary in:
To see why this might be a better algorithm for choosing a password:
Usage: ./ n m
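As a rough illustration of why word-based passphrases can carry high entropy: choosing each word uniformly at random from a vocabulary of size V gives log2(V) bits per word. The sketch below assumes that scheme. The function names and the idea that the script takes a word count and a vocabulary size are guesses, since the usage line above does not say what `n` and `m` mean.

```python
import math
import secrets


def make_passphrase(vocabulary, n_words):
    """Pick n_words uniformly at random (with replacement) using a CSPRNG."""
    return " ".join(secrets.choice(vocabulary) for _ in range(n_words))


def entropy_bits(vocabulary_size, n_words):
    """Entropy in bits of a passphrase drawn uniformly from the vocabulary."""
    return n_words * math.log2(vocabulary_size)
```

For example, four words drawn from a 2048-word list give 4 × 11 = 44 bits.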
tianhuil /
Created Oct 10, 2014
When you encounter pickle errors in multiprocessing
# from
from multiprocessing import Pool, cpu_count
from multiprocessing.pool import ApplyResult
# --------- see Steven's solution above -------------
from copy_reg import pickle
from types import MethodType
def _pickle_method(method):
tianhuil /
Last active Aug 29, 2015
Installing Hadoop on Ubuntu
# Installing Ubuntu
# While we're at it, let's install the JDK ...
sudo apt-get install -y openjdk-6-jdk
# ... and Yelp's mrjob
pip install mrjob
# Install maven
tianhuil /
Created May 9, 2014
Setting up virtual environments
# if you need to install virtualenv
# pip install virtualenv
# in the folder
virtualenv venv
. venv/bin/activate
# to deactivate
# deactivate
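On Python 3, the standard-library `venv` module can create a comparable environment without installing `virtualenv`. The sketch below is one way to do that from Python. The `create_env` helper is hypothetical, and `with_pip=False` is used here only to keep creation fast and offline.

```python
import os
import venv


def create_env(path="venv"):
    """Create a virtual environment at `path` and return its pyvenv.cfg path."""
    venv.create(path, with_pip=False)
    return os.path.join(path, "pyvenv.cfg")
```

Activation works the same way as above: `. venv/bin/activate`.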