Ted Dunning tdunning

@tdunning
tdunning / read.svmlight
Last active September 2, 2015 23:40
Some code that reads data in SVM Light format and returns a list of target and data. The data is stored in a sparse matrix to make this a bit more memory efficient.
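require(Matrix)
read.svmlight2 <- function( filename ) {
  ## Read the whole file into memory, one string per line
  f <- file( filename, "r")
  lines = readLines( f )
  close(f)
  ## Split each line on spaces and colons: the label first, then alternating index and value
  temp = strsplit(lines,'[: ]')
  ## The first field of each line is the target
  target = sapply(temp, function(row){as.numeric(row[1])})
  raw = lapply(temp, function(row){
    n = length(row);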
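
The preview above is cut off before the sparse matrix is built. As a rough illustration of the same parsing idea, here is a minimal Python sketch, not the author's R code. It assumes each line has the form `label index:value index:value ...` with no `qid:` fields, and it keeps each row as a plain dictionary instead of a sparse matrix. The name `read_svmlight` is hypothetical.

```python
def read_svmlight(lines):
    """Parse SVM Light lines into (targets, rows).

    Each row is a dict mapping feature index -> value, so zeros are not stored.
    """
    targets, rows = [], []
    for line in lines:
        # Drop anything after '#', which SVM Light treats as a comment
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        targets.append(float(fields[0]))
        row = {}
        for pair in fields[1:]:
            index, value = pair.split(":", 1)
            row[int(index)] = float(value)
        rows.append(row)
    return targets, rows


# Two hand-written lines, used only for illustration
sample = ["1 1:0.5 3:2.0", "-1 2:1.5  # a comment"]
targets, rows = read_svmlight(sample)
```

Storing only the non-zero entries is what makes a sparse representation memory efficient; a real implementation would likely hand the indices and values to a sparse matrix type.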
@tdunning
tdunning / multi-flatten.json
Created May 29, 2015 15:48
Sample data for multi-list flattening
{"t":[100.44380099679421,200.7658959087325,301.6982003576183],"v1":[-3.2124876152877886,-17.9729521628487,-11.10212822944568],"v2":[20.668968387311498,39.70574384652023,33.97732641377096]}
{"t":[402.02369599592936,500.361291107067,601.0570362962695],"v1":[-10.357254868666516,-15.358599092346992,-18.17981637433697],"v2":[28.14680001787952,24.449514473332897,36.90232317832954]}
{"t":[700.3591871348365,801.7683561318721,901.3906331330202],"v1":[-4.184888093902327,2.266724195547855,1.8027188779133356],"v2":[43.63398291814092,37.96260382309654,34.558440299721525]}
{"t":[998.8508331065062,1097.1401685144158,1198.1819481032155],"v1":[-1.3711844607134631,5.5027134050661735,5.111544086242255],"v2":[34.518652181145356,39.89433181166691,44.621036340105604]}
{"t":[1298.1931737425077,1398.9874465151283,1498.3317303744573],"v1":[5.597833929076392,23.21742527898042,20.160346681365283],"v2":[46.6957213571881,36.773578638699526,26.80096644321689]}
{"t":[1598.4807447946152,1698.0002145693118,1797.9744831102964],"v1":[30.550780
@tdunning
tdunning / desired-output.json
Created May 29, 2015 15:54
Desired output of multi-flatten operation
{"t":100.44380099679421,"v1":-3.2124876152877886,"v2":20.668968387311498}
{"t":200.7658959087325,"v1":-17.9729521628487,"v2":39.70574384652023}
{"t":301.6982003576183,"v1":-11.10212822944568,"v2":33.97732641377096}
{"t":402.02369599592936,"v1":-10.357254868666516,"v2":28.14680001787952}
{"t":500.361291107067,"v1":-15.358599092346992,"v2":24.449514473332897}
{"t":601.0570362962695,"v1":-18.17981637433697,"v2":36.90232317832954}
{"t":700.3591871348365,"v1":-4.184888093902327,"v2":43.63398291814092}
{"t":801.7683561318721,"v1":2.266724195547855,"v2":37.96260382309654}
{"t":901.3906331330202,"v1":1.8027188779133356,"v2":34.558440299721525}
{"t":998.8508331065062,"v1":-1.3711844607134631,"v2":34.518652181145356}
@tdunning
tdunning / multi-schema.json
Created May 29, 2015 15:57
Schema for generating data for multi-flatten example
[
{"name":"t", "class":"sequence", "lengthDistribution": 3, "base":{
"class":"random-walk", "sd": 1, "mean": 100}},
{"name":"v1", "class":"sequence", "lengthDistribution": 3, "base":{
"class":"random-walk", "sd": 10, "mean": 0}},
{"name":"v2", "class":"sequence", "lengthDistribution": 3, "base":{
"class":"random-walk", "sd": 10, "mean": 0}}
]
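
To make the schema easier to read, here is a small Python sketch of data it might describe. This is an interpretation, not the generator's actual behavior: it assumes that `lengthDistribution: 3` means exactly three values per record, that each `random-walk` step is drawn from a normal distribution with the given `mean` and `sd`, and that the walk carries over from one record to the next (as the sample data suggests). The function name `generate` is hypothetical.

```python
import random


def generate(schema, n_records, seed=0):
    """Generate records of parallel random-walk sequences from a schema list."""
    rng = random.Random(seed)
    # One running position per field; the walk continues across records
    state = {field["name"]: 0.0 for field in schema}
    records = []
    for _ in range(n_records):
        record = {}
        for field in schema:
            base = field["base"]
            values = []
            for _ in range(field["lengthDistribution"]):
                state[field["name"]] += rng.gauss(base["mean"], base["sd"])
                values.append(state[field["name"]])
            record[field["name"]] = values
        records.append(record)
    return records


schema = [
    {"name": "t", "class": "sequence", "lengthDistribution": 3,
     "base": {"class": "random-walk", "sd": 1, "mean": 100}},
    {"name": "v1", "class": "sequence", "lengthDistribution": 3,
     "base": {"class": "random-walk", "sd": 10, "mean": 0}},
    {"name": "v2", "class": "sequence", "lengthDistribution": 3,
     "base": {"class": "random-walk", "sd": 10, "mean": 0}},
]
records = generate(schema, n_records=2)
```

With a step mean of 100 and a standard deviation of 1, `t` climbs by roughly 100 per step, while `v1` and `v2` wander around their starting point.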
@tdunning
tdunning / fig-1-random-projection.png
Last active August 29, 2015 14:25
Extremely ordered data appears random when you look at it from a random direction
fig-1-random-projection.png
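
The data behind the figure is not shown in this listing. As an illustration of the idea only, the following Python sketch projects a highly regular point set onto a random direction. The choice of the vertices of a 10-dimensional hypercube as the ordered data is an assumption made for this example.

```python
import itertools
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random by normalizing Gaussian components."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(points, direction):
    """Return the dot product of every point with the direction."""
    return [sum(p * d for p, d in zip(point, direction)) for point in points]


rng = random.Random(42)
dim = 10
# Perfectly ordered data: every vertex of the unit hypercube (2**10 points)
grid = list(itertools.product((0, 1), repeat=dim))
direction = random_unit_vector(dim, rng)
projected = project(grid, direction)
```

A histogram of `projected` is a quick way to check how much of the lattice structure remains visible from that direction.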
@tdunning
tdunning / RandomNumberGenerator.java
Created July 20, 2015 23:55
Random number generator as a UDF.
public class RandomNumberGenerator {
@FunctionTemplate(name = "random", scope = FunctionTemplate.FunctionScope.SIMPLE, nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public static class Uniform implements DrillSimpleFunc {
@Param
Float8Holder low;
@Param
Float8Holder high;
@Output
Float8Holder output;
from collections import Counter
exchange_codes = {
'A': 'NYSE MKT Stock Exchange',
'B': 'NASDAQ OMX BX Stock Exchange',
'C': 'National Stock Exchange',
'D': 'FINRA',
'I': 'International Securities Exchange',
'J': 'Direct Edge A Stock Exchange',
'K': 'Direct Edge X Stock Exchange',
@tdunning
tdunning / speed.c
Created July 14, 2016 00:07
Test of the effect of flushing on disk I/O speed on OS X
#define _GNU_SOURCE
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <sys/file.h>
#include <unistd.h>
#include <errno.h>
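
The body of the C test is not visible in this preview. The kind of measurement the description refers to can be sketched in Python as follows. This is a hypothetical comparison and not a port of `speed.c`: it writes the same blocks twice, once leaving flushing to the operating system and once calling `os.fsync` after every block. The block size and count are arbitrary.

```python
import os
import tempfile
import time


def timed_write(path, n_blocks, block, sync_each):
    """Write n_blocks copies of block and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
            if sync_each:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to push it to the device
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "speed.dat")
    block = b"x" * 4096
    buffered = timed_write(path, 200, block, sync_each=False)
    synced = timed_write(path, 200, block, sync_each=True)
    size = os.path.getsize(path)

print(f"buffered: {buffered:.4f}s  synced: {synced:.4f}s")
```

On OS X a stricter flush is available through `fcntl` with `F_FULLFSYNC`, which this sketch does not use, so the numbers it prints are only a rough indication.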
@tdunning
tdunning / td-in-r.r
Last active November 26, 2018 20:05
A simplified implementation of a merging t-digest in R with some visualization of the results
### x is either a vector of numbers or a data frame with sums and weights. Digest is a data frame.
merge = function(x, digest, compression=100) {
## Force the digest to be a data.frame, possibly empty
if (!is.data.frame(digest) && is.na(digest)) {
digest = data.frame(sum=numeric(0), weight=numeric(0))
}
## and coerce the incoming data likewise ... a vector of points has a default weight of 1
if (!is.data.frame(x)) {
x = data.frame(sum=x, weight=1)
}
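
The rest of the R function is cut off in this preview. The following Python sketch shows one simple way a merging pass can work and should not be read as the author's implementation. It assumes centroids are stored as `(sum, weight)` pairs, as the comment above describes, and it uses the size limit `4 * N * q * (1 - q) / compression`, which is a choice made for this sketch.

```python
def merge(points, digest=None, compression=100):
    """Merge new points into a digest; both are reduced to (sum, weight) centroids."""
    # A bare number becomes a centroid of weight 1
    centroids = list(digest or []) + [(float(x), 1.0) for x in points]
    if not centroids:
        return []
    # Sort by centroid mean so that only neighbors are merged
    centroids.sort(key=lambda c: c[0] / c[1])
    total = sum(w for _, w in centroids)

    merged = []
    cur_sum, cur_w = centroids[0]
    emitted = 0.0  # weight of all centroids already written to `merged`
    for s, w in centroids[1:]:
        proposed = cur_w + w
        # Quantile at the middle of the proposed centroid
        q = (emitted + proposed / 2) / total
        limit = 4 * total * q * (1 - q) / compression
        if proposed <= limit:
            cur_sum, cur_w = cur_sum + s, proposed
        else:
            merged.append((cur_sum, cur_w))
            emitted += cur_w
            cur_sum, cur_w = s, w
    merged.append((cur_sum, cur_w))
    return merged


digest = merge(range(10000))
digest = merge([2.5, 7.5], digest)
```

Because the limit shrinks toward zero as `q` approaches 0 or 1, centroids near the tails stay small while those near the median absorb many points.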