Erik Bernhardsson erikbern

View gist:9628609
import luigi
# Here we are importing our own tasks, provided they are
# arranged in a python module (folder) named "components"
from components.SomeTaskA import SomeTaskA
from components.SomeTaskB import SomeTaskB
from components.SomeTaskC import SomeTaskC
# ------------------------------------------
# DEFINE THE MAIN WORKFLOW DEPENDENCY GRAPH
View gist:9811483
import os, shutil
import luigi
import sparkey
import random

class SparkeyTarget(luigi.Target):
    def __init__(self, path=None, spi='data.spi', spl='data.spl', writer_cls=sparkey.HashWriter, reader_cls=sparkey.HashReader):
        self.path = path
        self.spi_path = spi
        self.spl_path = spl
erikbern / gist:fc05e8cccd64dccde630
Last active Aug 29, 2015
Generate Dirichlet distribution
View gist:fc05e8cccd64dccde630
import random, time
import pylab, numpy

def method1(n):
    s = 1.0  # length of the stick that is still unassigned
    r = []
    for i in range(n):
        t = s * (1 - random.random() ** (1.0 / (n - i)))
        s -= t
        r.append(t)
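For comparison, one well-known way to draw from the flat Dirichlet distribution (all concentration parameters equal to 1) is to normalize independent Exp(1) draws. The sketch below is not taken from the gist; `dirichlet_flat` is a hypothetical name chosen for illustration, and only the standard library is assumed.

```python
import random

def dirichlet_flat(n, rng=random):
    """Return n non-negative floats summing to 1, uniform on the simplex.

    Hypothetical helper: normalizes n independent Exp(1) draws, which is
    a standard construction for Dirichlet(1, ..., 1).
    """
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]

sample = dirichlet_flat(5)
```

Each call yields one point on the simplex; passing a seeded `random.Random` instance as `rng` makes the draws reproducible.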
View gist:8ddfbab33e1dae7014fe
def tabCounter() = {
  implicit def input = getInput()
  input.map(_.split('\t').size).reduce(_ + _)
}

val task = LuigiTask().requires(MyTsvJob(buildId)).output(HdfsTarget("output")).do(tabCounter)
val otherTask = LuigiTask().requires(task).output(HdfsTarget("output-2")).do(somethingElse)
otherTask.run() // schedule task and otherTask
View gist:89fe7e2c1a615084ee6d
if (indices.size() <= (size_t)_K) {
  for (size_t i = 0; i < indices.size(); i++)
    m->children[i] = indices[i];
}
erikbern / gist:ba3456f836ccc9c044e8
Last active Aug 29, 2015
simple javascript task framework to flip recursion inside out
View gist:ba3456f836ccc9c044e8
function serializeArgs(args) {
  return JSON.stringify(args);
}

function unroll(f) {
  if (f._cache == undefined)
    f._cache = {};
  var f_new = function() {
    var key = serializeArgs(arguments);
View gist:4cff437097067142eca7
import numpy as np
import matplotlib.pyplot as plt
mean = np.array([1, 1, 1])
cov = np.array([[1, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 0.5]])
people = np.random.multivariate_normal(mean, cov, 100000)
criterion = np.array([0, 0.2, 1.0])
scores = np.dot(people, criterion)
erikbern / gist:7cd899651aff9b656e9f
Last active Aug 29, 2015
Stupid graph visualizations
View gist:7cd899651aff9b656e9f
http://blog.revolutionanalytics.com/2015/07/the-network-structure-of-cran.html
http://radar.oreilly.com/2015/06/graphs-in-the-world-modeling-systems-as-networks.html
http://quid.com/insights/the-future-of-artificial-intelligence/?utm_content=bufferb9759&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
http://www.kpcb.com/blog/a-breakthrough-approach-to-making-data-useful
https://www.cbinsights.com/blog/e-commerce-investments-top-venture-capital-firms/
https://www.recordedfuture.com/two-shady-men-report/
http://www.galvanize.com/blog/2015/08/05/the-first-interactive-visualization-of-product-integrations/#.Vczc1RNVikq
View coin2dice.py
import random

def coin():
    # Endless stream of fair coin flips (0 or 1)
    while True:
        yield random.randint(0, 1)

def dice_optimal():
    x, y, z = 0, 1, 1
    for c in coin():
        x, y, z = x+c*z, y+c*z, 2*z
        a, b = x*6//z, y*6//z
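As a baseline for what an "optimal" version improves on, a die can also be simulated from coin flips by plain rejection sampling. The sketch below is a simpler technique and is not the gist's algorithm; `coin_flips` and `dice_rejection` are hypothetical names. It reads three flips as a number from 0 to 7 and discards 6 and 7, so each attempt succeeds with probability 6/8 and a roll costs four flips on average.

```python
import random

def coin_flips(rng=random):
    """Endless stream of fair coin flips (0 or 1)."""
    while True:
        yield rng.randint(0, 1)

def dice_rejection(flips):
    """Yield fair rolls in 0..5 from an endless iterator of fair bits.

    Hypothetical baseline: plain rejection sampling on three bits.
    """
    while True:
        value = 4 * next(flips) + 2 * next(flips) + next(flips)
        if value < 6:  # 6 and 7 are rejected and the three flips are wasted
            yield value

rolls = dice_rejection(coin_flips())
sample = [next(rolls) for _ in range(1000)]
```

The rolls are numbered from 0 to 5; add 1 for conventional die faces.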
View README.md

The cities of Beijing and Buenos Aires are almost antipodes, i.e. they sit on almost exactly opposite sides of the globe.

An interesting consequence is that you can travel between the two cities via any point on Earth and the trip will still be roughly the same distance. This D3 visualization demonstrates the principle: drag the map around to change perspective, and click the map to set a "midpoint" to travel through.
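The claim can be checked numerically. The sketch below is not part of the visualization's code: it assumes a perfectly spherical Earth with a mean radius of 6371 km, uses approximate city-centre coordinates, and the midpoints are arbitrary illustrative picks. On a sphere, a trip through any midpoint differs from half the circumference by at most the distance between Buenos Aires and Beijing's exact antipode.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth

# Assumed approximate coordinates (latitude, longitude) in degrees.
BEIJING = (39.9042, 116.4074)
BUENOS_AIRES = (-34.6037, -58.3816)

def great_circle_km(p, q):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def trip_via_km(midpoint):
    """Length of Beijing -> midpoint -> Buenos Aires along great circles."""
    return (great_circle_km(BEIJING, midpoint)
            + great_circle_km(midpoint, BUENOS_AIRES))

half_circumference = math.pi * EARTH_RADIUS_KM
direct = great_circle_km(BEIJING, BUENOS_AIRES)
slack = half_circumference - direct  # how far the cities are from exact antipodes

midpoints = {
    "North Pole": (90.0, 0.0),
    "Nairobi": (-1.29, 36.82),
    "Honolulu": (21.31, -157.86),
}
for name, midpoint in midpoints.items():
    print(f"via {name}: {trip_via_km(midpoint):.0f} km (direct: {direct:.0f} km)")
```

Every printed trip length falls within `slack` of `half_circumference`, which is why the choice of midpoint barely matters.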

Code for this visualization was stolen from all over the web without any deeper understanding of D3 – in particular this one and this one.

IIRC this is a theme in Wong Kar-wai's movie Happy Together.