Ted Dunning tdunning

@tdunning
tdunning / viewpoints.r
Last active July 12, 2019 20:31
How different definitions of distance change our view of clustering
# you can run this script with the following R command:
# source('https://gist.githubusercontent.com/tdunning/badb88043d41d916a3148c669f2fb0cd/raw/8d3289fdbf2a7999bd5d9687002488b904e1d82f/viewpoints.r')
set.seed(1)
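# 2000 rows of 8-dimensional standard normal noise
noise = matrix(nrow=2000, ncol=8, data=rnorm(2000*8))
# signs of the cluster centers: rows 1-500 (-1,-1), 501-1000 (-1,1), 1001-1500 (1,-1), 1501-2000 (1,1)
offsets = matrix(
c(rep(-1,1000), rep(1,1000),
rep(-1, 500), rep(1, 500), rep(-1, 500), rep(1, 500)),
ncol=2)
# four Gaussian clusters centered at (±8, ±8)
xy = matrix(nrow=2000, ncol=2, data=rnorm(2*2000)) + offsets * 8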
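The R code above places four groups of 500 points at (±8, ±8) in two coordinates and generates eight further columns of pure noise. A minimal Python sketch of the same construction, assuming that is the intent; the function name `make_viewpoint_data` and its parameters are hypothetical, and Python's generator will not reproduce R's numbers for `set.seed(1)`.

```python
import random

def make_viewpoint_data(n_per_cluster=500, noise_dims=8, spread=8.0, seed=1):
    """Four Gaussian blobs at (+/-spread, +/-spread) plus independent pure-noise columns."""
    rng = random.Random(seed)
    # Same cluster order as the rows of the R offsets matrix.
    signs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    xy, noise, labels = [], [], []
    for label, (sx, sy) in enumerate(signs):
        for _ in range(n_per_cluster):
            xy.append((rng.gauss(0, 1) + sx * spread, rng.gauss(0, 1) + sy * spread))
            noise.append([rng.gauss(0, 1) for _ in range(noise_dims)])
            labels.append(label)
    return xy, noise, labels

xy, noise, labels = make_viewpoint_data()
for label in range(4):
    pts = [p for p, lab in zip(xy, labels) if lab == label]
    print(label, round(sum(p[0] for p in pts) / len(pts), 1),
          round(sum(p[1] for p in pts) / len(pts), 1))
```

Each cluster mean lands close to its center, while the noise columns carry no cluster structure at all.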
tdunning / Summarizer.java
Created April 12, 2019 23:18
Demonstrates the summarization of database fields using t-digest
package com.tdunning.tdigest.quality;
import com.google.common.collect.ImmutableList;
import com.google.common.io.Resources;
import com.tdunning.math.stats.MergingDigest;
import com.tdunning.math.stats.TDigest;
import org.junit.Test;
import java.io.File;
import java.io.IOException;
tdunning / MomentSketchOffsetTest.java
Created March 25, 2019 22:12
Test of moment sketches versus an offset distribution
public class MomentSketchOffsetTest {
@Test
public void testOffsetUniform() throws Exception {
MomentSketch ms = new MomentSketch(1e-10);
ms.setSizeParam(7);
ms.initialize();
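// 20e1 == 200: one million values on a unit interval far from zero (assuming the arguments are low, high, count)
double[] data = TestDataSource.getUniform(20e1, 20e1 + 1, 1_000_000);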
ms.add(data);
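One plausible reason an offset matters for a moment sketch is that it accumulates raw power sums, and recovering central moments means subtracting large, nearly equal quantities. The sketch below is not the MomentSketch code; it shows the effect in the simplest case, variance from the first two power sums, using an offset of 1e9 (an assumption chosen to make the loss visible in double precision, much larger than the test's 200) and compares it with Welford's algorithm.

```python
import random

def naive_variance(xs):
    """Variance from raw power sums, as a sketch storing sum(x) and sum(x^2) would compute it."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return s2 / n - (s1 / n) ** 2

def welford_variance(xs):
    """Welford's update works with deviations from the running mean, so the offset cancels."""
    mean, m2 = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / len(xs)

rng = random.Random(42)
offset = 1e9  # exaggerated offset; the true variance of the data is still about 1/12
xs = [offset + rng.random() for _ in range(100_000)]
print(naive_variance(xs), welford_variance(xs), 1 / 12)
```

Near 1e18 adjacent doubles are 128 apart, so the naive result is a multiple of 128 and cannot come close to 1/12, while the Welford estimate does.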
tdunning / HighDynamicRangeQuantile.java
Last active August 25, 2017 06:50 (forked from oertl/HighDynamicRangeQuantile.java)
Simpler and slightly faster version of Otmar Ertl's idea for improving FastHistogram / HdrHistogram
public class HighDynamicRangeQuantile {
private final long[] counts;
private double minimum = Double.POSITIVE_INFINITY;
private double maximum = Double.NEGATIVE_INFINITY;
private long underFlowCount = 0;
private long overFlowCount = 0;
private final double factor;
private final double offset;
private final double minExpectedQuantileValue;
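The fields above (counts, factor, offset, underflow and overflow counts) suggest a histogram whose bucket index is a scaled, shifted logarithm of the value, giving a bounded relative error per bucket. A minimal Python sketch under that reading; it uses an exact `math.log` index, not the faster index computation the gist is about, and the class name `LogBucketHistogram` and its parameters are hypothetical.

```python
import math

class LogBucketHistogram:
    """Hypothetical histogram whose buckets each span a fixed ratio of (1 + rel_err)."""

    def __init__(self, min_value, max_value, rel_err=0.01):
        self.min_value = min_value                       # must be > 0
        self.factor = 1.0 / math.log1p(rel_err)          # buckets per unit of ln(x)
        self.offset = math.log(min_value) * self.factor  # shifts min_value to bucket 0
        size = int(math.ceil(math.log(max_value) * self.factor - self.offset)) + 1
        self.counts = [0] * size
        self.underflow = 0
        self.overflow = 0
        self.total = 0

    def add(self, x):
        self.total += 1
        if x < self.min_value:
            self.underflow += 1
            return
        # The argument is >= 0 here, so int() truncation acts as floor().
        i = int(math.log(x) * self.factor - self.offset)
        if i < len(self.counts):
            self.counts[i] += 1
        else:
            self.overflow += 1

    def _edge(self, i):
        """Lower edge of bucket i, which is also the upper edge of bucket i - 1."""
        return math.exp((i + self.offset) / self.factor)

    def quantile(self, q):
        """Geometric midpoint of the bucket that holds rank q * total."""
        if self.total == 0:
            return float("nan")
        target = min(q * self.total, self.total - 1)
        seen = self.underflow
        if target < seen:
            return self.min_value   # only known to lie below min_value
        for i, c in enumerate(self.counts):
            seen += c
            if target < seen:
                return math.sqrt(self._edge(i) * self._edge(i + 1))
        return float("inf")         # rank falls among the overflowed values

h = LogBucketHistogram(1.0, 1e4)
for v in range(1, 10001):
    h.add(float(v))
print(h.quantile(0.5), h.quantile(0.99))
```

Because adjacent bucket edges differ by a factor of 1 + rel_err, the geometric midpoint is within roughly rel_err / 2 (relative) of any value in its bucket.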
tdunning / td-in-r.r
Last active November 26, 2018 20:05
A simplified implementation of a merging t-digest in R with some visualization of the results
### x is either a vector of numbers or a data frame with sums and weights. Digest is a data frame.
merge = function(x, digest, compression=100) {
## Force the digest to be a data.frame, possibly empty (numeric(0) keeps the two named columns)
if (!is.data.frame(digest) && is.na(digest)) {
digest = data.frame(sum=numeric(0), weight=numeric(0))
}
## and coerce the incoming data likewise; a vector of points gets a default weight of 1
if (!is.data.frame(x)) {
x = data.frame(sum=x, weight=1)
}
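The R `merge` above accepts raw points or previously built centroids and folds them into a digest. A compact Python analogue, assuming the k1 arcsine scale function from the t-digest paper to cap centroid size (the R code's own size rule is not shown in this preview); the digest here is a list of (mean, weight) pairs rather than (sum, weight) rows, and `k_scale`, `merge` and `quantile` are hypothetical names.

```python
import math
import random

def k_scale(q, compression):
    """k1 scale function: steepest near q = 0 and q = 1, so tail centroids stay small."""
    q = min(max(q, 0.0), 1.0)  # guard asin against rounding slightly outside [0, 1]
    return compression / (2 * math.pi) * math.asin(2 * q - 1)

def merge(points, digest=None, compression=100):
    """Merge raw points (weight 1 each) into a digest given as a list of (mean, weight)."""
    items = [(float(x), 1.0) for x in points] + list(digest or [])
    if not items:
        return []
    items.sort(key=lambda c: c[0])
    total = sum(w for _, w in items)
    result = []
    cur_mean, cur_w = items[0]
    left_w = 0.0                        # weight strictly left of the current centroid
    k_left = k_scale(0.0, compression)
    for mean, w in items[1:]:
        q_right = (left_w + cur_w + w) / total
        if k_scale(q_right, compression) - k_left <= 1.0:
            # Absorb the item, updating the weighted mean incrementally.
            cur_w += w
            cur_mean += (mean - cur_mean) * w / cur_w
        else:
            result.append((cur_mean, cur_w))
            left_w += cur_w
            k_left = k_scale(left_w / total, compression)
            cur_mean, cur_w = mean, w
    result.append((cur_mean, cur_w))
    return result

def quantile(digest, q):
    """Interpolate between centroid means, placing each mean at its centroid's center rank."""
    if not digest:
        return float("nan")
    total = sum(w for _, w in digest)
    target = q * total
    cum = 0.0
    prev_rank = prev_mean = None
    for mean, w in digest:
        rank = cum + w / 2
        if target <= rank:
            if prev_rank is None:
                return mean
            t = (target - prev_rank) / (rank - prev_rank)
            return prev_mean + t * (mean - prev_mean)
        prev_rank, prev_mean = rank, mean
        cum += w
    return digest[-1][0]

rng = random.Random(7)
digest = None
for _ in range(10):                     # ten batches, merged one after another
    digest = merge([rng.random() for _ in range(1000)], digest)
print(len(digest), quantile(digest, 0.5), quantile(digest, 0.99))
```

Since k1 spans compression / 2 units over the whole range and each pair of neighboring centroids spans more than one unit, the digest stays under about `compression` centroids regardless of how many points are merged.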
tdunning / speed.c
Created July 14, 2016 00:07
Test of the effect of flushing on disk I/O speed on OSX
#define _GNU_SOURCE
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/file.h>
#include <unistd.h>
#include <errno.h>
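Only the includes of speed.c are visible here, so the following is a hypothetical Python sketch of the same kind of experiment: time a run of fixed-size writes while forcing a flush every `sync_every` blocks. The function name and parameters are assumptions. On macOS, `fsync(2)` does not ask the drive to empty its own write cache; `fcntl(fd, F_FULLFSYNC)` does, and would be the stronger variant to time there.

```python
import os
import tempfile
import time

def timed_writes(n_blocks=64, block_size=4096, sync_every=0):
    """Write n_blocks blocks to a temp file, calling os.fsync every sync_every blocks (0 = never).

    Returns (elapsed_seconds, file_size_in_bytes).
    """
    block = b"x" * block_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for i in range(1, n_blocks + 1):
            os.write(fd, block)
            if sync_every and i % sync_every == 0:
                os.fsync(fd)
        os.fsync(fd)  # one final flush so every variant ends with the data handed to the disk
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed, size

for every in (0, 1, 16):
    elapsed, size = timed_writes(sync_every=every)
    print(f"sync_every={every}: {elapsed * 1000:.1f} ms for {size} bytes")
```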
from collections import Counter
exchange_codes = {
'A': 'NYSE MKT Stock Exchange',
'B': 'NASDAQ OMX BX Stock Exchange',
'C': 'National Stock Exchange',
'D': 'FINRA',
'I': 'International Securities Exchange',
'J': 'Direct Edge A Stock Exchange',
'K': 'Direct Edge X Stock Exchange',
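The snippet imports `Counter` alongside a table of one-letter exchange codes, which suggests tallying records by exchange; how the original uses them is not shown. A minimal sketch under that assumption, with a hypothetical list of codes and a subset of the mapping repeated so it runs on its own:

```python
from collections import Counter

# A subset of the exchange_codes mapping above, repeated so the sketch is self-contained.
exchange_codes = {
    'A': 'NYSE MKT Stock Exchange',
    'B': 'NASDAQ OMX BX Stock Exchange',
    'D': 'FINRA',
}

def count_by_exchange(codes, names=exchange_codes):
    """Tally one-letter exchange codes and label each tally with the exchange name."""
    counts = Counter(codes)
    return {names.get(code, f'unknown ({code})'): n for code, n in counts.items()}

# Hypothetical sequence of codes, e.g. one per trade record.
trades = ['A', 'D', 'D', 'B', 'D', 'Z']
print(count_by_exchange(trades))
```

Unknown codes are kept and labeled rather than dropped, so gaps in the mapping show up in the output.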
tdunning / RandomNumberGenerator.java
Created July 20, 2015 23:55
Random number generator as a UDF.
public class RandomNumberGenerator {
@FunctionTemplate(name = "random", scope = FunctionTemplate.FunctionScope.SIMPLE, nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public static class Uniform implements DrillSimpleFunc {
@Param
Float8Holder low;
@Param
Float8Holder high;
@Output
Float8Holder output;
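The annotations above declare a Drill simple function named `random` with two `Float8Holder` parameters and `NULL_IF_NULL` null handling; its `eval()` body is not part of this preview. A Python sketch of the per-row behavior, assuming the function scales a uniform [0, 1) draw into [low, high) (an assumption, not taken from the gist), with `None` standing in for SQL NULL:

```python
import random
from typing import Optional

def random_uniform(low: Optional[float], high: Optional[float],
                   rng: Optional[random.Random] = None) -> Optional[float]:
    """Assumed semantics of random(low, high) for one row.

    NULL_IF_NULL: if any argument is NULL, the result is NULL without running the body.
    Otherwise the result is uniform on [low, high); rounding can rarely yield high itself.
    """
    if low is None or high is None:
        return None
    draw = rng.random if rng is not None else random.random
    return low + (high - low) * draw()

print(random_uniform(2.0, 5.0), random_uniform(None, 5.0))
```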
tdunning / fig-1-random-projection.png
Last active August 29, 2015 14:25
Extremely ordered data appears random when you look at it from a random direction
[Image: fig-1-random-projection.png]
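The figure is not reproduced here. One standard way to see the effect the caption describes (not necessarily the data behind the figure): take all 2^d vertices of the unit hypercube, about as ordered as data can be, and project them onto a random unit direction v. The projected values have mean exactly 0.5·Σv_i and variance exactly 1/4, and their histogram looks like a Gaussian sample rather than a lattice. The function name is hypothetical.

```python
import itertools
import math
import random

def project_hypercube(d=12, seed=0):
    """Project every vertex of {0, 1}^d onto one random unit direction."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    points = itertools.product((0, 1), repeat=d)   # 2**d perfectly regular points
    return v, [sum(p_i * v_i for p_i, v_i in zip(p, v)) for p in points]

v, proj = project_hypercube()
mean = sum(proj) / len(proj)
var = sum((x - mean) ** 2 for x in proj) / len(proj)
# Crude text histogram of the projected values.
lo, hi = min(proj), max(proj)
bins = [0] * 12
for x in proj:
    bins[min(int((x - lo) / (hi - lo) * 12), 11)] += 1
for b in bins:
    print("#" * (b // 20))
```

Each coordinate is an independent fair 0/1 choice across the vertices, so the projection is a weighted sum of independent coin flips; that is why the grid structure disappears from this one-dimensional view.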