@tianhuil
tianhuil / multiprocessing_pickle_hack.py
Created October 10, 2014 20:02
When you encounter pickle errors in multiprocessing
# from http://stackoverflow.com/questions/1816958/cant-pickle-type-instancemethod-when-using-pythons-multiprocessing-pool-ma
from multiprocessing import Pool, cpu_count
from multiprocessing.pool import ApplyResult
# --------- see Steven's solution above -------------
from copy_reg import pickle  # Python 2 module; it is named copyreg in Python 3
from types import MethodType
def _pickle_method(method):
@tianhuil
tianhuil / dicewords.py
Created June 20, 2015 15:00
A script to choose high-entropy but memorable passphrases
#!/usr/bin/python
"""
Script to generate passphrases according to the vocabulary in:
http://world.std.com/~reinhold/diceware.wordlist.asc
To see why this might be a better algorithm for choosing a password:
https://blog.agilebits.com/2011/06/21/toward-better-master-passwords/
Usage: ./dicewords.py n m
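The diceware idea can be sketched with the standard `secrets` module. Assumptions: each wordlist entry is a five-digit dice roll (digits 1 to 6) followed by a word, and `num_words` is a parameter of this sketch. It is not necessarily what the script's `n` and `m` mean, since the preview does not say. Each word is one of 6^5 = 7776 equally likely choices, so a passphrase carries about 12.9 bits of entropy per word.

```python
import math
import secrets

DICE_FACES = "123456"


def parse_wordlist(text):
    """Map five-digit dice keys such as '11111' to words.

    Lines that are not '<5 dice digits> <word>' (for example a PGP
    signature wrapper around the list) are skipped.
    """
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 5 and set(parts[0]) <= set(DICE_FACES):
            table[parts[0]] = parts[1]
    return table


def roll_key():
    """Simulate five fair dice with a cryptographically secure RNG."""
    return "".join(secrets.choice(DICE_FACES) for _ in range(5))


def passphrase(table, num_words=6):
    """Join num_words independently chosen words (table must hold all 7776 keys)."""
    return " ".join(table[roll_key()] for _ in range(num_words))


def entropy_bits(num_words):
    """Each word is one of 6**5 = 7776 equally likely choices."""
    return num_words * math.log2(6 ** 5)


print(round(entropy_bits(6), 1))  # 77.5
```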
@tianhuil
tianhuil / Learning_React.md
Created December 27, 2016 03:56
Learning React
# cd into a new folder and observe the error in the last line.
rm -f email.py*
echo "from email.utils import formatdate; print('OK')" > foo.py
PYTHONPATH="" python foo.py   # prints OK
touch email.py
PYTHONPATH="" python foo.py   # ImportError: the empty local email.py shadows the stdlib email package
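The second run fails because the script's own directory comes first on `sys.path`, so the empty local `email.py` is imported instead of the standard-library `email` package. Below is a minimal sketch for detecting this, assuming you only call it on standard-library module names. `shadowing_origin` is a hypothetical helper. It relies on `importlib.util.find_spec` and `sysconfig.get_paths()["stdlib"]`.

```python
import importlib.util
import sysconfig
from pathlib import Path


def shadowing_origin(module_name):
    """Return the file module_name would load from if it lies outside the
    standard-library directory, else None.

    Note: find_spec returns the cached spec when the module is already in
    sys.modules, so call this before the module has been imported.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        return None
    stdlib = Path(sysconfig.get_paths()["stdlib"]).resolve()
    origin = Path(spec.origin).resolve()
    # A stdlib name resolving outside the stdlib tree means something shadows it.
    return None if stdlib in origin.parents else str(origin)


print(shadowing_origin("email"))  # None unless a local email.py wins
```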
@tianhuil
tianhuil / update_branches.ipy
Last active December 3, 2017 18:25
Finds branches that can be safely merged
#!ipython
# to use this, run these two steps:
# curl -O https://gist.githubusercontent.com/tianhuil/a6675835a7a0c157fbcb296a743f52d4/raw/704a20201dd4362928f6e39ab0ab0bc0784b2af9/update_branches.ipy
# ipython update_branches.ipy
!git checkout master
branches = !git branch
other_branches = [b.lstrip() for b in branches if b != '* master']
results = {}
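The preview stops at `results = {}`, so the per-branch check itself is not shown. Below is one related check in plain Python, assuming "merged" means the branch tip is already contained in `master`. It wraps `git branch --merged master` with `subprocess` and adds a parser that does the same job as the list comprehension above. The function names are hypothetical and this is not the gist's logic.

```python
import subprocess


def parse_branch_list(output, exclude=("master",)):
    """Turn `git branch` output into bare branch names.

    Drops the '* ' marker on the current branch, the '+ ' marker for
    branches checked out in another worktree, detached-HEAD lines such as
    '* (HEAD detached at 1a2b3c)', and any names in `exclude`.
    """
    names = []
    for line in output.splitlines():
        name = line[2:].strip() if line[:2] in ("* ", "+ ") else line.strip()
        if name and not name.startswith("(") and name not in exclude:
            names.append(name)
    return names


def branches_merged_into(base="master", repo="."):
    """Local branches whose tips are already reachable from `base`."""
    out = subprocess.run(
        ["git", "branch", "--merged", base],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return parse_branch_list(out, exclude=(base,))
```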
@tianhuil
tianhuil / Residua Estimator.py
Last active November 8, 2019 22:54
This is a Residual Regressor and Residual Classifier
from sklearn.base import BaseEstimator, ClassifierMixin, RegressorMixin
class _ResidualEstimator(BaseEstimator):
    def __init__(self, base, residual):
        self.base = base
        self.residual = residual

    def fit(self, X, y):
        self.base.fit(X, y)
        self.residual.fit(X, y - self.base.predict(X))
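The preview ends before `predict` and the classifier variant. Below is a minimal sketch of the regressor side, assuming the final prediction is the base model's prediction plus the residual model's prediction. Unlike the preview, it clones its sub-estimators and stores the fitted copies in `base_` and `residual_`, which follows scikit-learn's convention of leaving `__init__` parameters untouched. `ResidualRegressor` is my name for it.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class ResidualRegressor(BaseEstimator, RegressorMixin):
    """Fit `base` to y, then fit `residual` to what `base` left unexplained."""

    def __init__(self, base, residual):
        self.base = base
        self.residual = residual

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        # Clone so the estimators passed to __init__ stay unfitted.
        self.base_ = clone(self.base).fit(X, y)
        self.residual_ = clone(self.residual).fit(X, y - self.base_.predict(X))
        return self

    def predict(self, X):
        # Final prediction = base prediction + predicted residual.
        return self.base_.predict(X) + self.residual_.predict(X)


# Usage: a linear trend plus an alternating offset a line cannot capture.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + (X.ravel() % 2)
model = ResidualRegressor(LinearRegression(), DecisionTreeRegressor(random_state=0)).fit(X, y)
```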
@tianhuil
tianhuil / classifier_transform.py
Created November 9, 2019 05:20
Turns classifier's `predict_proba` into a transform
from sklearn.base import BaseEstimator, TransformerMixin
class ClassifierTransform(BaseEstimator, TransformerMixin):
    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y=None):
        self.clf.fit(X, y)
        return self
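The preview stops before `transform`. Below is a minimal sketch of the idea in the description, assuming `transform` simply returns the fitted classifier's `predict_proba` output (one column per class). As in the residual sketch, it clones `clf` rather than fitting it in place. That is a scikit-learn convention, not something the preview shows.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LogisticRegression


class ClassifierTransform(BaseEstimator, TransformerMixin):
    """Expose a classifier's predict_proba output as transformed features."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y=None):
        self.clf_ = clone(self.clf).fit(X, y)
        return self

    def transform(self, X):
        # One column per class, ordered as in self.clf_.classes_.
        return self.clf_.predict_proba(X)


X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
features = ClassifierTransform(LogisticRegression()).fit_transform(X, y)
```

Because it is a transformer, it can sit inside a `Pipeline` or `FeatureUnion`, for example to stack a classifier's probabilities next to other features.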
@tianhuil
tianhuil / shard_data.py
Last active November 12, 2019 19:02
Data Sharding (useful preprocessing for dask)
import gzip
import os
from itertools import islice
import argparse
# from https://stackoverflow.com/a/41333436/8930600
def grouper(iterable, n):
    iterator = iter(iterable)
    while True:
        group = tuple(islice(iterator, n))
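The preview ends inside `grouper`. Below is a minimal completion of it plus a hypothetical `shard_lines` helper (my name, not the script's) that writes each group to its own gzip file. The command-line handling that the `argparse` import suggests is left out.

```python
import gzip
import os
from itertools import islice


def grouper(iterable, n):
    """Yield successive tuples of up to n items; the last may be shorter."""
    iterator = iter(iterable)
    while True:
        group = tuple(islice(iterator, n))
        if not group:  # iterator exhausted
            return
        yield group


def shard_lines(lines, out_dir, lines_per_shard, prefix="shard"):
    """Write lines into numbered gzip files of at most lines_per_shard lines each."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, group in enumerate(grouper(lines, lines_per_shard)):
        path = os.path.join(out_dir, f"{prefix}-{i:05d}.gz")
        with gzip.open(path, "wt") as f:  # text mode; lines keep their own "\n"
            f.writelines(group)
        paths.append(path)
    return paths
```

As I understand it, a gzip stream cannot be split, so dask reads each compressed file as a single partition (e.g. via `dask.bag.read_text` on a glob of the shards). That is why many moderate shards parallelize better than one large `.gz` file.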
@tianhuil
tianhuil / Solve.md
Last active June 1, 2020 19:45
Practically solving the shape

Suppose we wish to determine whether a given (x, y) point falls within one of a set of non-intersecting, arbitrary polygons. For example, we may want to know which Michigan minor civil division contains a given latitude/longitude. There are 1,500+ such divisions, and each one could be a polygon with hundreds or thousands of vertices. The following is a practical solution that is fast enough to answer a user's request interactively.

  1. Break Michigan into $N$ axis-aligned rectangular tiles of size WIDTH × HEIGHT. The tile containing a point is found by integer division: `(floor((x - x0) / WIDTH), floor((y - y0) / HEIGHT))`, where `(x0, y0)` is the lower-left corner of the grid. Choose $N$ so that most tiles (e.g. ~60%?) lie entirely inside a single polygon and the vast majority (e.g. 95%) intersect no more than 4 polygons. I expect this is achievable with $N$ under 100,000.
  2. Preprocess the polygon shapefiles using Shapely to determine a map
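The tiling steps above can be sketched in plain Python under a few stated assumptions. Polygons are given as lists of (x, y) vertices keyed by an id. The tile index is built from polygon bounding boxes, which over-approximates the tiles each polygon touches. The exact test uses even-odd ray casting in place of Shapely, so the sketch has no dependencies. It also omits the "tile fully inside one polygon" shortcut. All function names are hypothetical.

```python
import math
from collections import defaultdict


def tile_of(x, y, x0, y0, width, height):
    """(column, row) of the tile containing (x, y); (x0, y0) is the grid corner.

    math.floor (not int) gives points left of or below the origin negative
    indices instead of letting them collide with tile 0.
    """
    return (math.floor((x - x0) / width), math.floor((y - y0) / height))


def build_tile_index(polygons, x0, y0, width, height):
    """Map each tile to the ids of polygons whose bounding box overlaps it."""
    index = defaultdict(list)
    for pid, pts in polygons.items():
        xs = [px for px, _ in pts]
        ys = [py for _, py in pts]
        c0, r0 = tile_of(min(xs), min(ys), x0, y0, width, height)
        c1, r1 = tile_of(max(xs), max(ys), x0, y0, width, height)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                index[(c, r)].append(pid)
    return dict(index)


def point_in_polygon(x, y, pts):
    """Even-odd rule: count edge crossings of a ray going right from (x, y)."""
    inside = False
    for (xa, ya), (xb, yb) in zip(pts, pts[1:] + pts[:1]):
        if (ya > y) != (yb > y):  # edge straddles the horizontal line through y
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def locate(x, y, polygons, index, x0, y0, width, height):
    """Return the id of the polygon containing (x, y), or None."""
    for pid in index.get(tile_of(x, y, x0, y0, width, height), []):
        if point_in_polygon(x, y, polygons[pid]):
            return pid
    return None
```

Only the few candidate polygons in one tile are tested exactly, which is where the speedup comes from.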
@tianhuil
tianhuil / starbucks_us_locations.csv
Created June 10, 2020 01:16 — forked from dankohn/starbucks_us_locations.csv
8902 locations of US Starbucks with addresses, latitude, and longitude
-149.8935557,61.21759217,Starbucks - AK - Anchorage 00001,"601 West Street_601 West 5th Avenue_Anchorage, Alaska 99501_907-277-2477"
-149.9054948,61.19533942,Starbucks - AK - Anchorage 00002,"Carrs-Anchorage #1805_1650 W Northern Lights Blvd_Anchorage, Alaska 99503_907-339-0500"
-149.7522,61.2297,Starbucks - AK - Anchorage 00003,"Elmendorf AFB_Bldg 5800 Westover Avenue_Anchorage, Alaska 99506"
-149.8643361,61.19525062,Starbucks - AK - Anchorage 00004,"Fred Meyer - Anchorage #11_1000 E Northern Lights Blvd_Anchorage, Alaska 995084283_907-264-9600"
-149.8379726,61.13751355,Starbucks - AK - Anchorage 00005,"Fred Meyer - Anchorage #656_2300 Abbott Road_Anchorage, Alaska 99507_907-365-2000"
-149.9092788,61.13994658,Starbucks - AK - Anchorage 00006,"Fred Meyer - Anchorage (Dimond) #71_2000 W Dimond Blvd_Anchorage, Alaska 995151400_907-267-6700"
-149.7364877,61.19533265,Starbucks - AK - Anchorage 00007,"Safeway-Anchorage #1817_7731 E Northern Lights Blvd_Anchorage, Alaska 99504_907-331-1700"
-149.8211,61.2156
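The rows above appear to have no header and four columns: longitude, latitude, store name, and a quoted address whose parts (location name, street, city/state/ZIP, and sometimes a phone number) are joined by `_`. That column layout is inferred from the sample rows, not documented. Below is a minimal parsing sketch with the standard `csv` module. `Store` and `read_stores` are hypothetical names. Longitude comes first, so it plays the role of x in a point-in-polygon lookup.

```python
import csv
import io
from typing import List, NamedTuple


class Store(NamedTuple):
    lon: float
    lat: float
    name: str
    address_parts: List[str]


def read_stores(text):
    """Parse headerless rows of lon,lat,name,"part_part_..." into Store tuples."""
    stores = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:  # skip blank or truncated rows
            continue
        lon, lat, name, address = row
        stores.append(Store(float(lon), float(lat), name, address.split("_")))
    return stores


row = '-149.7522,61.2297,Starbucks - AK - Anchorage 00003,"Elmendorf AFB_Bldg 5800 Westover Avenue_Anchorage, Alaska 99506"'
print(read_stores(row)[0].address_parts)
# ['Elmendorf AFB', 'Bldg 5800 Westover Avenue', 'Anchorage, Alaska 99506']
```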