Erik Bernhardsson erikbern

import luigi
# Here we import our own tasks, assuming they are
# arranged in a Python package (folder) named "components"
from components.SomeTaskA import SomeTaskA
from components.SomeTaskB import SomeTaskB
from components.SomeTaskC import SomeTaskC
# ------------------------------------------
# DEFINE THE MAIN WORKFLOW DEPENDENCY GRAPH
import os, shutil
import luigi
import sparkey
import random

class SparkeyTarget(luigi.Target):
    def __init__(self, path=None, spi='data.spi', spl='data.spl', writer_cls=sparkey.HashWriter, reader_cls=sparkey.HashReader):
        self.path = path
        self.spi_path = spi
        self.spl_path = spl
@erikbern
erikbern / gist:fc05e8cccd64dccde630
Last active August 29, 2015 14:03
Generate Dirichlet distribution
import random, time
import pylab, numpy

def method1(n):
    # Stick-breaking: each step breaks off a Beta(1, n - i) fraction of the remaining mass s
    s = 1.0
    r = []
    for i in range(n):
        t = s * (1 - random.random() ** (1.0 / (n - i)))
        s -= t
        r.append(t)
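Each step of `method1` draws `1 - U**(1/k)`, which is Beta(1, k) distributed because P(1 - U^(1/k) ≤ t) = 1 - (1 - t)^k, so the loop is a stick-breaking sampler for the flat Dirichlet case. For comparison, a common alternative normalizes independent Gamma draws: if g_i ~ Gamma(α_i, 1), then g / sum(g) ~ Dirichlet(α). The sketch below uses only the standard library's `random.gammavariate`; the helper name `dirichlet` is hypothetical and not part of the original gist.

```python
import random

def dirichlet(alphas, rng=random):
    """Sample one point from Dirichlet(alphas) by normalizing Gamma(alpha_i, 1) draws.

    Hypothetical helper for comparison with method1; not from the original gist.
    """
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]

rng = random.Random(0)
point = dirichlet([1.0] * 5, rng)  # all alphas = 1: uniform point on the simplex
samples = [dirichlet([2.0, 1.0, 1.0], rng) for _ in range(20000)]
# The mean of component i is alpha_i / sum(alphas), here 2 / 4 = 0.5
mean_first = sum(s[0] for s in samples) / len(samples)
```

With all α_i equal to 1 this targets the same flat distribution as the stick-breaking loop, but every component is drawn independently before normalizing, so no remaining-mass bookkeeping is needed.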
def tabCounter() = {
  implicit def input = getInput()
  input.map(_.split('\t').size).reduce(_ + _)
}
val task = LuigiTask().requires(MyTsvJob(buildId)).output(HdfsTarget("output")).do(tabCounter)
val otherTask = LuigiTask().requires(task).output(HdfsTarget("output-2")).do(somethingElse)
otherTask.run() // schedule task and otherTask
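The Scala snippet sketches a fluent API in which calling `run()` on the last task also schedules everything it `requires`. Below is a minimal Python sketch of that behaviour, assuming in-memory outputs: `Task`, the `store` dict and the string keys are hypothetical stand-ins for `LuigiTask` and `HdfsTarget`, the doubling step stands in for `somethingElse`, and none of this is Luigi's real API. A task whose output key already exists is skipped, which mirrors Luigi's default rule that a task is complete when its output exists.

```python
class Task:
    """Hypothetical fluent task; mirrors the Scala sketch, not Luigi's real API."""

    def __init__(self):
        self._deps = []
        self._output = None
        self._fn = None

    def requires(self, *tasks):
        self._deps.extend(tasks)
        return self

    def output(self, key):
        self._output = key
        return self

    def do(self, fn):
        self._fn = fn
        return self

    def run(self, store, log=None):
        """Run dependencies depth-first, then this task; skip any task whose
        output key is already in `store` (an in-memory stand-in for a target)."""
        if self._output in store:
            return store[self._output]
        inputs = [dep.run(store, log) for dep in self._deps]
        result = self._fn(*inputs)
        store[self._output] = result
        if log is not None:
            log.append(self._output)
        return result


def tab_counter(lines):
    # Same idea as tabCounter: total number of tab-separated fields
    return sum(len(line.split("\t")) for line in lines)


rows = ["a\tb\tc", "d\te"]  # hypothetical stand-in for MyTsvJob's output
source = Task().output("tsv").do(lambda: rows)
task = Task().requires(source).output("output").do(tab_counter)
other_task = Task().requires(task).output("output-2").do(lambda n: n * 2)

store, log = {}, []
result = other_task.run(store, log)  # runs source, task, other_task in that order
rerun = other_task.run(store, log)   # output already exists: nothing runs again
```

Returning `self` from `requires`, `output` and `do` is what makes the chained style in the Scala version possible.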
if (indices.size() <= (size_t)_K) {
  for (size_t i = 0; i < indices.size(); i++)
    m->children[i] = indices[i];
}
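The C++ fragment looks like the leaf case of a recursive space-partitioning tree (it resembles the node-building code in Annoy): once a node holds at most `_K` items, their indices are stored directly as the node's children instead of splitting further. Below is a minimal Python sketch of the whole recursion under stated assumptions. The split rule (the hyperplane halfway between two randomly chosen items) and the names `build_tree` and `leaves` are hypothetical, and `K` must be at least 1.

```python
import random

def build_tree(points, indices, K, rng):
    """Recursively split `indices` until each leaf holds at most K of them."""
    if len(indices) <= K:
        # Leaf case, as in the C++ fragment: keep the item indices directly
        return {"leaf": list(indices)}
    # Assumed split rule: hyperplane halfway between two random items
    i, j = rng.sample(indices, 2)
    p, q = points[i], points[j]
    normal = [a - b for a, b in zip(p, q)]
    mid = [(a + b) / 2 for a, b in zip(p, q)]
    left, right = [], []
    for idx in indices:
        side = sum(n * (x - m) for n, x, m in zip(normal, points[idx], mid))
        (left if side > 0 else right).append(idx)
    if not left or not right:
        # Degenerate split (e.g. duplicate points): fall back to two random halves
        shuffled = list(indices)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        left, right = shuffled[:half], shuffled[half:]
    return {"normal": normal, "mid": mid,
            "left": build_tree(points, left, K, rng),
            "right": build_tree(points, right, K, rng)}

def leaves(node):
    """Yield the index list of every leaf under `node`."""
    if "leaf" in node:
        yield node["leaf"]
    else:
        yield from leaves(node["left"])
        yield from leaves(node["right"])

rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(1000)]
tree = build_tree(points, list(range(len(points))), K=10, rng=rng)
```

Storing indices in the leaf, rather than splitting down to single items, keeps the tree shallow and lets a query scan a small bucket of candidates at the end.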
erikbern / gist:ba3456f836ccc9c044e8
Last active August 29, 2015 14:18
simple javascript task framework to flip recursion inside out
function serializeArgs(args) {
  return JSON.stringify(args);
}

function unroll(f) {
  if (f._cache == undefined)
    f._cache = {};
  var f_new = function() {
    var key = serializeArgs(arguments);
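The preview stops early, but the visible pieces (a per-function `_cache` keyed by `JSON.stringify` of the arguments) suggest memoization used to turn recursion into an explicit work list. Below is a minimal Python sketch of one way to do that. It assumes the function is pure and asks for sub-results through a `rec` callback; `unroll`, `rec` and `_NeedsValue` are hypothetical names here, and this is not the gist's implementation. When `rec` hits a missing sub-result it raises, the driver pushes that sub-problem onto a stack, and the caller is re-run once the sub-result is cached. As a result the Python call stack stays shallow even for something like `fib(5000)`, which would exceed CPython's default recursion limit of 1000 if written recursively.

```python
import json


class _NeedsValue(Exception):
    """Signals that a sub-result is not cached yet."""

    def __init__(self, args):
        super().__init__(args)
        self.needed = args


def unroll(f):
    """Evaluate f(rec, *args) with an explicit stack instead of recursion.

    f must be pure: it is re-run from the start whenever one of the
    sub-results it asked for via rec(...) was still missing.
    """
    cache = {}

    def key(args):
        # Same role as serializeArgs in the gist: a hashable cache key
        return json.dumps(list(args))

    def rec(*args):
        k = key(args)
        if k in cache:
            return cache[k]
        raise _NeedsValue(args)

    def run(*args):
        stack = [args]
        while stack:
            top = stack[-1]
            k = key(top)
            if k in cache:
                stack.pop()
                continue
            try:
                cache[k] = f(rec, *top)
                stack.pop()
            except _NeedsValue as missing:
                stack.append(missing.needed)  # compute the sub-problem first
        return cache[key(args)]

    return run


def fib(rec, n):
    return n if n < 2 else rec(n - 1) + rec(n - 2)


fast_fib = unroll(fib)
```

Re-running `f` from the top costs some repeated work, but it avoids needing continuations or generators, which keeps the wrapped function written in plain recursive style.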
import numpy as np
import matplotlib.pyplot as plt
mean = np.array([1, 1, 1])
cov = np.array([[1, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 0.5]])
people = np.random.multivariate_normal(mean, cov, 100000)
criterion = np.array([0, 0.2, 1.0])
scores = np.dot(people, criterion)
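Because `scores` is a fixed linear combination of jointly normal traits, its distribution is known in closed form: with weight vector w = `criterion`, the score is normal with mean w·μ and variance w^T Σ w. Below is a short check of that against the simulation. It assumes NumPy's `Generator` API (`np.random.default_rng`) in place of the legacy `np.random.multivariate_normal` so that the run is reproducible.

```python
import numpy as np

mean = np.array([1, 1, 1])
cov = np.array([[1, 0.5, 0.5], [0.5, 1, 0.5], [0.5, 0.5, 0.5]])
criterion = np.array([0, 0.2, 1.0])

# Closed form for a linear combination w @ X of X ~ Normal(mean, cov)
expected_mean = criterion @ mean            # 0*1 + 0.2*1 + 1.0*1 = 1.2
expected_var = criterion @ cov @ criterion  # cov @ w = (0.6, 0.7, 0.6); w @ that = 0.74

rng = np.random.default_rng(0)
people = rng.multivariate_normal(mean, cov, 100000)
scores = people @ criterion

print(expected_mean, scores.mean())
print(expected_var, scores.var())
```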
erikbern / gist:7cd899651aff9b656e9f
Last active August 29, 2015 14:24
Stupid graph visualizations
http://blog.revolutionanalytics.com/2015/07/the-network-structure-of-cran.html
http://radar.oreilly.com/2015/06/graphs-in-the-world-modeling-systems-as-networks.html
http://quid.com/insights/the-future-of-artificial-intelligence/?utm_content=bufferb9759&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
http://www.kpcb.com/blog/a-breakthrough-approach-to-making-data-useful
https://www.cbinsights.com/blog/e-commerce-investments-top-venture-capital-firms/
https://www.recordedfuture.com/two-shady-men-report/
http://www.galvanize.com/blog/2015/08/05/the-first-interactive-visualization-of-product-integrations/#.Vczc1RNVikq
import random

def coin():
    # Endless stream of fair coin flips (0 or 1)
    while True:
        yield random.randint(0, 1)

def dice_optimal():
    x, y, z = 0, 1, 1
    for c in coin():
        x, y, z = x+c*z, y+c*z, 2*z
        a, b = x*6//z, y*6//z
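The preview stops before `dice_optimal` returns, so its exact stopping rule isn't shown. Below is a self-contained sketch of the general interval-refinement technique the variables suggest, under stated assumptions. The coin flips are treated as the binary digits of a uniform real U in [0, 1), the sketch keeps the dyadic bracket [x/z, (x+1)/z) around U, and it stops as soon as the whole bracket lies inside one sixth [k/6, (k+1)/6), returning k = floor(6U). Each flip is appended as the next, least significant digit via x → 2x + c, so every bracket is nested inside the previous one. `roll_die` is a hypothetical name.

```python
import random
from collections import Counter

def roll_die(flip):
    """Return (face in 0..5, number of flips used); flip() returns a fair 0 or 1."""
    x, z, flips = 0, 1, 0
    while True:
        c = flip()
        flips += 1
        x, z = 2 * x + c, 2 * z      # bracket [x/z, (x+1)/z) halves around U
        lo = (6 * x) // z            # sixth containing the left end
        hi = (6 * (x + 1) - 1) // z  # sixth containing points just below the right end
        if lo == hi:
            return lo, flips

rng = random.Random(42)
results = [roll_die(lambda: rng.randint(0, 1)) for _ in range(60000)]
counts = Counter(face for face, _ in results)
```

Because the returned face is exactly floor(6U) for a uniform U, every face has probability 1/6. At least three flips are always needed, since a bracket of width 1/4 cannot fit inside a sixth.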
erikbern / README.md
Last active September 8, 2015 03:29
Antipodes

Beijing and Buenos Aires are almost antipodes, i.e. they sit almost exactly opposite each other on the globe.

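A curious consequence is that you can travel between the two cities via any point on Earth and the trip stays roughly the same length. For two exact antipodes, the great-circle distance from the first city to any midpoint plus the distance from that midpoint to the second city always equals half the Earth's circumference (about 20,000 km). This D3 visualization demonstrates the principle: drag the map to change the perspective, and click it to set a "midpoint" to travel through.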
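The principle can be checked numerically. On a spherical Earth, if A and B are exact antipodes and M is any point, the angle from M to B is π minus the angle from M to A, so the two legs always add up to π·R. Below is a small sketch that assumes a sphere of mean radius 6371 km and uses an arbitrary example point rather than the real city coordinates. The helper names are hypothetical.

```python
import math

R_EARTH_KM = 6371.0  # assumed mean Earth radius (spherical model)

def to_xyz(lat_deg, lon_deg):
    """Unit vector for a latitude/longitude given in degrees."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

def great_circle_km(p, q):
    """Great-circle distance between two (lat, lon) points on the sphere."""
    a, b = to_xyz(*p), to_xyz(*q)
    dot = sum(x * y for x, y in zip(a, b))
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    # atan2(|a x b|, a . b) is numerically stable for all angles
    return R_EARTH_KM * math.atan2(math.sqrt(sum(c * c for c in cross)), dot)

def antipode(lat, lon):
    """Point diametrically opposite (lat, lon), longitude kept in [-180, 180]."""
    return (-lat, lon - 180.0 if lon > 0 else lon + 180.0)

a = (40.0, 116.0)  # arbitrary example point
b = antipode(*a)
midpoints = [(0.0, 0.0), (51.5, -0.1), (-33.9, 151.2), (89.0, 45.0)]
totals = [great_circle_km(a, m) + great_circle_km(m, b) for m in midpoints]
print([round(t, 3) for t in totals], round(math.pi * R_EARTH_KM, 3))
```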

Code for this visualization was stolen from all over the web without any deeper understanding of D3 – in particular this one and this one.

IIRC this is a theme in Wong Kar-Wai's movie Happy Together.