Al B albarrentine

@albarrentine
albarrentine / chunked_shuffle.sh
Created December 5, 2016 06:23
Randomly shuffle a newline-delimited file that's larger than main memory
set -e
if [ "$#" -lt 3 ]; then
    echo "Usage: chunked_shuffle filename parts outfile"
    exit 1
fi
filename=$1
parts=$2
outfile=$3
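The preview above stops at argument parsing, so the shuffling step itself is not shown. A common way to shuffle a file that does not fit in memory is to scatter lines at random into `parts` temporary files, shuffle each part in memory, and concatenate the results. The Python sketch below illustrates that idea under the assumption that each part fits in memory; it is not necessarily what the script does, and the function name and `seed` parameter are hypothetical.

```python
import os
import random
import tempfile


def chunked_shuffle(filename, parts, outfile, seed=None):
    """Shuffle the lines of `filename` into `outfile` using `parts` temp files."""
    rng = random.Random(seed)
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, "part%d" % i) for i in range(parts)]

    # Pass 1: send each line to a uniformly random part file.
    handles = [open(p, "w") for p in paths]
    try:
        with open(filename) as f:
            for line in f:
                if not line.endswith("\n"):
                    line += "\n"  # normalize a missing final newline
                handles[rng.randrange(parts)].write(line)
    finally:
        for h in handles:
            h.close()

    # Pass 2: shuffle each part in memory and append it to the output.
    with open(outfile, "w") as out:
        for p in paths:
            with open(p) as f:
                lines = f.readlines()
            rng.shuffle(lines)
            out.writelines(lines)
            os.remove(p)
    os.rmdir(tmpdir)
```

Only one part is held in memory at a time, so `parts` should be chosen so that roughly `file size / parts` fits comfortably in RAM.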
#!/usr/bin/env bash
./libpostal "12 Three-hundred and forty-fifth ave, ste. no 678" en
#12 345th avenue, suite number 678
./libpostal "C/ ocho P.I. cuatro" es
#calle 8 polígono industrial 4
./libpostal "V XX Settembre" it
#via 20 settembre
@albarrentine
albarrentine / libpostal_install.sh
Last active August 24, 2022 06:46
libpostal installation
# Pre-requisites: steps provided for Mac and Debian/Ubuntu
# 1. Install autotools if you don't have it already
# Mac: brew install autoconf automake libtool
# Ubuntu: apt-get install autotools-dev
# 2. Install snappy
# Mac: brew install snappy
# Ubuntu: apt-get install libsnappy-dev
git clone https://github.com/openvenues/libpostal
cd libpostal
@albarrentine
albarrentine / keybase.md
Created May 7, 2015 04:31
Keybase gist

Keybase proof

I hereby claim:

  • I am thatdatabaseguy on github.
  • I am albarrentine (https://keybase.io/albarrentine) on keybase.
  • I have a public key whose fingerprint is 1161 3CA5 F450 D731 F502 A2C3 7479 17BB 69EE EBB1

To claim this, I am signing this object:

@albarrentine
albarrentine / gist:2577114
Created May 2, 2012 14:47
numpy.fromiter: matrix from fixed-length arrays
# Even more fun
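def array_gen(some_strings):
    for s in some_strings:
        # numpy.int was removed in NumPy 1.24, and the binary mode of
        # numpy.fromstring is deprecated; frombuffer expects bytes
        yield numpy.frombuffer(s, dtype=int)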
# Let's say I know the length of all the arrays coming out of my generator
# and I want to build a matrix
K = 10
M = 5
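The preview ends before the matrix is built. One way to do it, sketched here as an assumption rather than the gist's actual code, is to flatten the generator with `itertools.chain.from_iterable`, let `numpy.fromiter` fill a 1-D array of known size, and reshape. `row_gen` is a hypothetical stand-in for a generator that yields M arrays of length K.

```python
import itertools

import numpy

K = 10  # length of every array coming out of the generator
M = 5   # number of arrays, i.e. rows of the matrix


def row_gen(num_rows, row_len):
    # Hypothetical generator: yields `num_rows` integer arrays of length `row_len`.
    for i in range(num_rows):
        yield numpy.arange(i * row_len, (i + 1) * row_len)


# fromiter consumes scalars, so chain the rows into one flat stream.
# Passing count lets numpy preallocate the output instead of growing it.
flat = numpy.fromiter(
    itertools.chain.from_iterable(row_gen(M, K)),
    dtype=numpy.int64,
    count=M * K,
)
matrix = flat.reshape(M, K)
```

This only works when every array really has length K; a shorter row would make `fromiter` raise because the iterator runs out before `count` items are read.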
@albarrentine
albarrentine / gist:2577076
Created May 2, 2012 14:44
numpy.fromiter using a custom generator and data type
import numpy
# Imagine these come from a db cursor or something
coordinates = [(1,2,3), (4,5,6), (7,8,9)]
def my_gen(some_tuple):
    for x, y, z in some_tuple:
        yield x, y, z
a = numpy.fromiter(my_gen(coordinates), dtype=[('x', 'l'), ('y', 'l'), ('z', 'l')])
"""
@albarrentine
albarrentine / gist:2577070
Created May 2, 2012 14:43
Numpy set APIs
import numpy
a = numpy.arange(10)
b = numpy.arange(5, 15)
numpy.setdiff1d(a,b) # array([0, 1, 2, 3, 4])
numpy.union1d(a,b) # array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
numpy.intersect1d(a,b) # array([5, 6, 7, 8, 9])
@albarrentine
albarrentine / gist:1326477
Created October 30, 2011 21:38
Flask-Celery lazy configuration + app factory
# In myapp.tasks __init__.py
from celery import Celery
celery = Celery()
Task = celery.create_task_cls()
class MyBaseTask(Task):
    abstract = True
    # ...
@albarrentine
albarrentine / gist:1033728
Created June 19, 2011 03:31
SQLAlchemy's tuple_ construct
"""
Assuming a model and table "Thingy" with columns type and id (I love generic association tables,
so you will see this a lot in schemas I write).
While this seems like a headache for multigets, there is a very easy way to do an IN operation on such a table using SQL tuples (for this to be fast, a composite index needs to cover both columns):
SELECT * FROM thingy WHERE (type, id) IN (('foo', 1), ('bar', 2));
And SQLAlchemy has a great construct for just this case...
"""
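The preview cuts off before the construct is shown. A minimal sketch of `tuple_` with `in_`, assuming SQLAlchemy 1.4 or later and a hypothetical `Thingy` model matching the description above (the column types and composite primary key are assumptions):

```python
from sqlalchemy import Column, Integer, String, select, tuple_
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Thingy(Base):
    # Hypothetical model: a composite key over (type, id) also provides the index.
    __tablename__ = "thingy"
    type = Column(String, primary_key=True)
    id = Column(Integer, primary_key=True)


pairs = [("foo", 1), ("bar", 2)]

# tuple_ groups the two columns so in_ compares them against (type, id) pairs.
stmt = select(Thingy).where(tuple_(Thingy.type, Thingy.id).in_(pairs))
sql = str(stmt)  # the generic rendering of the statement, useful for inspection
```

Running the statement requires a backend that supports row-value comparisons; PostgreSQL and MySQL do, as do recent SQLite versions.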
@albarrentine
albarrentine / gist:998045
Created May 29, 2011 19:09
webhelpers.misc.NotGiven, so much better than None default args
class webhelpers.misc.NotGiven
A default value for function args.
Use this when you need to distinguish between None and no value.
Example:
>>> def foo(arg=NotGiven):
...     print(arg is NotGiven)
...
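The same pattern works without the library. In the sketch below, `NotGiven` is a hypothetical local stand-in for `webhelpers.misc.NotGiven`, and `describe` is an illustrative function showing how the sentinel separates "no argument" from an explicit `None`:

```python
class NotGiven:
    """Local stand-in sentinel: used only as a marker, never instantiated."""


def describe(arg=NotGiven):
    # Identity comparison is the point: None is a legitimate value here.
    if arg is NotGiven:
        return "no value passed"
    if arg is None:
        return "None passed explicitly"
    return "value: %r" % (arg,)
```

With a plain `arg=None` default the first two cases would be indistinguishable, which matters for APIs where `None` means "clear this field".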