Diederik van Liere drdee

drdee / sql_style_guide.md
Last active January 19, 2019 20:32 — forked from fredbenenson/kickstarter_sql_style_guide.md
SQL Style Guide
layout: default
title: SQL Style Guide
description: A guide to writing clean, clear, and consistent SQL.
tags: data, process

Purpose

drdee / gist:32b880a24bd2dd2d8d46
Created July 30, 2014 16:52
Cardinality estimator
import math
import mmh3
from functools import partial
from itertools import izip

def estimateCardinality(self, significant_bits):
    '''
    Taken and slightly adapted from http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinality-Estimation
    Estimates the number of unique elements in the input set values.
    significant_bits: The number of bits of hash to use as a bucket number; there will be 2**significant_bits buckets.
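The preview above is cut off before the function body. As a rough illustration of the bucketed approach the docstring describes, here is a minimal Python 3 sketch of a LogLog-style estimator. It is not the gist's code. It swaps `mmh3` for the standard library's `hashlib.md5` so it runs without extra packages, the helper names `hash32` and `trailing_zeroes` are hypothetical, and the correction constant 0.79402 is the one used in the linked blog post.

```python
import hashlib


def hash32(value):
    # Hypothetical helper: derive a 32-bit hash from MD5 (stands in for mmh3).
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def trailing_zeroes(num, width):
    # Count trailing zero bits; an all-zero input counts as `width` zeroes.
    if num == 0:
        return width
    count = 0
    while (num >> count) & 1 == 0:
        count += 1
    return count


def estimate_cardinality(values, significant_bits):
    # The low `significant_bits` bits of the hash select one of
    # 2**significant_bits buckets; the remaining bits feed the zero count.
    num_buckets = 2 ** significant_bits
    max_zeroes = [0] * num_buckets
    for value in values:
        h = hash32(value)
        bucket = h & (num_buckets - 1)
        remainder = h >> significant_bits
        zeroes = trailing_zeroes(remainder, 32 - significant_bits)
        max_zeroes[bucket] = max(max_zeroes[bucket], zeroes)
    # Average the per-bucket maxima, then apply the bias-correction constant.
    mean_zeroes = sum(max_zeroes) / num_buckets
    return 2 ** mean_zeroes * num_buckets * 0.79402
```

Because each bucket stores only a maximum, feeding the same value again never changes the estimate, so duplicates are ignored without keeping the values themselves.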
drdee / gist:d68eaf0208184d72cbff
Created July 29, 2014 15:58
PySpark countApproxDistinct
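def error(estimate, size):
    # Relative error of the estimate against the true distinct count.
    return abs(estimate - size) / float(size)

def uniform():
    # 100,000 values drawn from only 100 distinct integers (0..99).
    for x in xrange(100000):
        yield x % 100

# Assumes `sc` is an existing SparkContext, e.g. the one provided by the PySpark shell.
rdd = sc.parallelize([x for x in uniform()])
assert(error(rdd._jrdd.rdd().countApproxDistinct(4, 0), 100) < 0.4)
assert(error(rdd._jrdd.rdd().countApproxDistinct(8, 0), 100) < 0.1)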
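To see why thresholds of 0.4 and 0.1 are plausible for precisions 4 and 8, the sketch below ports the two helpers to Python 3 and adds a hypothetical `expected_relative_accuracy` function. It assumes the approximation stated in the Spark API documentation for `countApproxDistinct(p, sp)`, a relative accuracy of about 1.054 / sqrt(2^p), applies to the Spark version in use. It does not need a Spark installation.

```python
import math


def error(estimate, size):
    # Relative error of the estimate against the true distinct count.
    return abs(estimate - size) / size


def expected_relative_accuracy(p):
    # Hypothetical helper; assumes the documented 1.054 / sqrt(2**p) approximation.
    return 1.054 / math.sqrt(2 ** p)


def uniform():
    # 100,000 values drawn from only 100 distinct integers (0..99).
    for x in range(100000):
        yield x % 100


for p, threshold in [(4, 0.4), (8, 0.1)]:
    print(p, round(expected_relative_accuracy(p), 4), threshold)
```

Under that assumption, the expected accuracy for both precisions sits below the asserted threshold, which leaves the assertions some headroom rather than testing the exact bound.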
drdee / gist:5580023
Last active December 17, 2015 08:28
| Java library | Debian package | Platform | Exact version match |
| --- | --- | --- | --- |
| metrics-core-2.2.0.jar | Could not find it | | |
| metrics-annotation-2.2.0.jar | Could not find it | | |
| zkclient-0.2.jar | Seems to have been deprecated at V0.1 | | |
| jopt-simple-3.2.jar | libjoptsimple-java 3.1-3 | precise/universe | No |
| scala-compiler.jar | scala | | Possibly |
| slf4j-api-1.7.2.jar | libslf4j-java 1.6.4-1 | precise/universe | No |
| snappy-java-1.0.4.1.jar | libsnappy-java (1.0.4.1~dfsg-1) | | Yes |