Michael Li tianhuil
@tianhuil
tianhuil / shard_data.py
Last active Nov 12, 2019
Data Sharding (useful preprocessing for dask)
View shard_data.py
import gzip
import os
from itertools import islice
import argparse
# from https://stackoverflow.com/a/41333436/8930600
def grouper(iterable, n):
    iterator = iter(iterable)
    while True:
        group = tuple(islice(iterator, n))
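The preview above stops partway through `grouper`. As a minimal sketch of how the sharding idea could be finished, assuming the goal is to split a stream of lines into gzip files of a fixed maximum size: the `shard_lines` helper, its parameters, and the output file naming are hypothetical and do not come from the gist.

```python
import gzip
import os
from itertools import islice


def grouper(iterable, n):
    """Yield successive tuples of up to n items from iterable."""
    iterator = iter(iterable)
    while True:
        group = tuple(islice(iterator, n))
        if not group:
            # The iterator is exhausted, so stop the generator.
            return
        yield group


def shard_lines(lines, shard_size, out_dir, prefix="shard"):
    """Write lines into gzip files holding at most shard_size lines each.

    Hypothetical helper: the name, signature, and file naming are assumptions.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, group in enumerate(grouper(lines, shard_size)):
        path = os.path.join(out_dir, f"{prefix}-{index:05d}.txt.gz")
        with gzip.open(path, "wt", encoding="utf-8") as handle:
            handle.writelines(group)
        paths.append(path)
    return paths
```

Shards written this way can be read back one at a time, which is what makes them convenient input for a parallel loader.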
@tianhuil
tianhuil / classifier_transform.py
Created Nov 9, 2019
Turns classifier's `predict_proba` into a transform
View classifier_transform.py
from sklearn.base import BaseEstimator, TransformerMixin
class ClassifierTransform(BaseEstimator, TransformerMixin):
    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y=None):
        self.clf.fit(X, y)
        return self
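The preview ends before the `transform` method. Going by the gist's description, a plausible completion returns the wrapped classifier's `predict_proba` output. The sketch below assumes that behavior; the `LogisticRegression` model and the toy data are illustrative only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression


class ClassifierTransform(BaseEstimator, TransformerMixin):
    """Expose a classifier's class probabilities as transformed features."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y=None):
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        # Assumption: the truncated gist returns predict_proba here.
        return self.clf.predict_proba(X)


# Toy data, for illustration only.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
features = ClassifierTransform(LogisticRegression()).fit(X, y).transform(X)
```

Because the wrapper has `fit` and `transform`, it can sit in the middle of a scikit-learn `Pipeline`, feeding probabilities into a later estimator.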
@tianhuil
tianhuil / Residua Estimator.py
Last active Nov 8, 2019
This is a Residual Regressor and Residual Classifier
View Residua Estimator.py
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin
class _ResidualEstimator(BaseEstimator):
    def __init__(self, base, residual):
        self.base = base
        self.residual = residual

    def fit(self, X, y):
        self.base.fit(X, y)
        self.residual.fit(X, y - self.base.predict(X))
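The `fit` method above trains the second model on whatever the first model leaves unexplained. A minimal sketch of the regression case follows, assuming that prediction sums the two models' outputs. The `ResidualRegressor` name, its `predict` method, and the choice of estimators are assumptions, since the preview is cut off before that point.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class ResidualRegressor(RegressorMixin, BaseEstimator):
    """Fit `base` on y, then fit `residual` on the errors of `base`."""

    def __init__(self, base, residual):
        self.base = base
        self.residual = residual

    def fit(self, X, y):
        self.base.fit(X, y)
        self.residual.fit(X, y - self.base.predict(X))
        return self

    def predict(self, X):
        # Assumption: the final prediction is the base prediction plus
        # the predicted residual.
        return self.base.predict(X) + self.residual.predict(X)


# Toy data: a linear trend plus a nonlinear wiggle.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + np.sin(X.ravel())
model = ResidualRegressor(
    LinearRegression(), DecisionTreeRegressor(random_state=0)
).fit(X, y)
predictions = model.predict(X)
```

Here the linear model captures the trend and the tree picks up the remaining structure. The classifier variant mentioned in the description is not sketched, because the preview does not show how it defines a residual.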
@tianhuil
tianhuil / update_branches.ipy
Last active Dec 3, 2017
Finds branches that can be safely merged
View update_branches.ipy
#!ipython
# To use this, run these two steps:
# curl -O https://gist.githubusercontent.com/tianhuil/a6675835a7a0c157fbcb296a743f52d4/raw/704a20201dd4362928f6e39ab0ab0bc0784b2af9/update_branches.ipy
# ipython update_branches.ipy
!git checkout master
branches = !git branch
other_branches = [b.lstrip() for b in branches if b != '* master']
results = {}
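The `!git branch` lines rely on IPython's shell capture. For readers without IPython, a rough plain-Python equivalent of the branch-listing step could look like the sketch below. The function names are hypothetical, and the merge check itself is not reproduced because the preview does not show it.

```python
import subprocess


def other_branches(branch_lines, current="master"):
    """Return branch names from `git branch` output, excluding `current`."""
    names = []
    for line in branch_lines:
        # `git branch` marks the checked-out branch with a leading "* ".
        name = line.lstrip("* ").strip()
        if name and name != current:
            names.append(name)
    return names


def list_branches():
    """Run `git branch` in the working directory and return its output lines."""
    result = subprocess.run(
        ["git", "branch"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

Inside a repository, `other_branches(list_branches())` would give the candidate branches to test against `master`.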
View conflicting_packages.sh
# cd into a new folder and observe the error in the last line.
yes | rm email.py*
echo "from email.utils import formatdate; print 'OK'" > foo.py
PYTHONPATH="" python foo.py
touch email.py
PYTHONPATH="" python foo.py
View Learning_React.md
@tianhuil
tianhuil / dicewords.py
Created Jun 20, 2015
A script to choose high-entropy but memorable passphrases
View dicewords.py
#!/usr/bin/python
"""
Script to generate passphrases according to the vocabulary in:
http://world.std.com/~reinhold/diceware.wordlist.asc
To see why this might be a better algorithm for choosing a password:
https://blog.agilebits.com/2011/06/21/toward-better-master-passwords/
Usage: ./dicewords.py n m
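The preview ends before the script body, and the meaning of `n` and `m` is not shown. As a minimal sketch of the general Diceware idea: draw words uniformly at random from a fixed list using a cryptographically secure generator. The sketch assumes each word-list entry is a five-digit dice key followed by a word, and all function names are hypothetical.

```python
import math
import secrets


def parse_wordlist(text):
    """Extract words from lines shaped like '11111<whitespace>word'."""
    words = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only entries keyed by five dice rolls; skip any other lines.
        if len(parts) == 2 and len(parts[0]) == 5 and set(parts[0]) <= set("123456"):
            words.append(parts[1])
    return words


def passphrase(words, n_words):
    """Join n_words words chosen uniformly with a secure random source."""
    return " ".join(secrets.choice(words) for _ in range(n_words))


def entropy_bits(n_words, list_size=7776):
    """Entropy of a passphrase of n_words drawn from list_size words."""
    return n_words * math.log2(list_size)
```

Five six-sided dice give 6^5 = 7776 possible keys, so each word adds about 12.9 bits of entropy when the full list is used.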
@tianhuil
tianhuil / multiprocessing_pickle_hack.py
Created Oct 10, 2014
When you encounter pickle errors in multiprocessing
View multiprocessing_pickle_hack.py
# from http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
from multiprocessing import Pool, cpu_count
from multiprocessing.pool import ApplyResult
# --------- see Steven's solution above -------------
from copy_reg import pickle
from types import MethodType
def _pickle_method(method):
@tianhuil
tianhuil / install_hadoop.sh
Last active Aug 29, 2015
Installing Hadoop on Ubuntu
View install_hadoop.sh
# Installing Hadoop on Ubuntu
# http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Quick-Start/cdh4qs_topic_3_2.html
# While we're at it, let's install the JDK ...
echo "y" | sudo apt-get install openjdk-6-jdk
# ... and Yelp's mrjob
pip install mrjob
# Install maven
@tianhuil
tianhuil / setup_venv.sh
Created May 9, 2014
Setting up virtual environments
View setup_venv.sh
# if you need to install virtualenv
# pip install virtualenv
# in the folder
virtualenv venv
. venv/bin/activate
# to deactivate
# deactivate