@abramsm
abramsm / HyperLogLogPlus.java
Created February 1, 2013 03:01
Offering object to HyperLogLog++
public boolean offer(Object o)
{
    long x = MurmurHash.hash64(o);
    switch (format)
    {
        case NORMAL:
            // find first p bits of x
            final long idx = x >>> (64 - p);
            //Ignore the first p bits (the idx), and then find the number of leading zeros
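The preview above cuts off just before the leading-zero step. As a minimal sketch of the idea its comments describe (not the gist's actual code), the helper below splits a 64-bit hash into a register index and a rank. The class name `HllBits`, the method names, and the sentinel-bit trick that caps the rank at `64 - p + 1` are assumptions made for illustration; it assumes `1 <= p <= 31`.

```java
// Hypothetical helper, not part of the original gist.
final class HllBits {
    // The first p bits of the hash select the register.
    static int registerIndex(long hash, int p) {
        return (int) (hash >>> (64 - p));
    }

    // Rank = number of leading zeros in the remaining 64 - p bits, plus one.
    static int rank(long hash, int p) {
        long rest = hash << p;                  // shift the index bits out
        long bounded = rest | (1L << (p - 1));  // sentinel bit caps the rank at 64 - p + 1
        return Long.numberOfLeadingZeros(bounded) + 1;
    }

    public static void main(String[] args) {
        long hash = 0x0100000000000000L;
        int p = 4;
        System.out.println("index = " + registerIndex(hash, p));
        System.out.println("rank  = " + rank(hash, p));
    }
}
```

A register would then keep the maximum rank seen for its index, which is the quantity the cardinality estimate is built from.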
@abramsm
abramsm / gist:5297210
Last active December 15, 2015 17:29
Testing merging n small sets in HyperLogLogPlus
public static void main(final String[] args) throws Throwable {
    long startTime = System.currentTimeMillis();
    int numSets = 10;
    int setSize = 1 * 1000 * 1000;
    int repeats = 5;
    HyperLogLogPlus[] counters = new HyperLogLogPlus[numSets];
    for (int i = 0; i < numSets; i++) {
        counters[i] = new HyperLogLogPlus(15, 15);
@abramsm
abramsm / gist:5483928
Created April 29, 2013 19:09
example HLL+ for different p values
for (int p = 10; p < 18; p++)
{
    HyperLogLogPlus hyperLogLogPlus = new HyperLogLogPlus(p, 25);
    int count = 4200000;
    for (int i = 0; i < count; i++)
    {
        hyperLogLogPlus.offer("i" + i);
    }
    long estimate = hyperLogLogPlus.cardinality();
    double se = count * (1.04 / Math.sqrt(Math.pow(2, p)));
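The `se` expression in the preview above scales the usual HyperLogLog relative standard error, roughly 1.04 / sqrt(m) with m = 2^p registers, by the true count. The sketch below tabulates that bound for the same range of p without needing the HyperLogLogPlus library. The class name `HllExpectedError` and its method names are hypothetical, and the output only shows the theoretical bound, not measured estimates.

```java
// Hypothetical helper, not part of the original gist.
final class HllExpectedError {
    // Relative standard error for a sketch with 2^p registers.
    static double relativeStandardError(int p) {
        return 1.04 / Math.sqrt(Math.pow(2, p));
    }

    // Same expression as `se` in the gist: the bound expressed in elements.
    static double absoluteStandardError(long count, int p) {
        return count * relativeStandardError(p);
    }

    public static void main(String[] args) {
        long count = 4_200_000L; // same count as the gist preview
        for (int p = 10; p < 18; p++) {
            System.out.printf("p=%d relative=%.5f absolute=%.0f%n",
                    p, relativeStandardError(p), absoluteStandardError(count, p));
        }
    }
}
```

Each increment of p doubles the register count and so shrinks the expected error by a factor of sqrt(2), at the cost of doubling memory.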
@abramsm
abramsm / gist:5484296
Created April 29, 2013 19:59
hll+ example with merge
for (int p = 10; p < 18; p++)
{
    ICardinality[] hlls = new ICardinality[30];
    int ecount = 15000;
    int totalCount = ecount * 30;
    for (int j = 0; j < 30; j++)
    {
        hlls[j] = new HyperLogLogPlus(p, 25);
        for (int i = 0; i < ecount; i++)
        {
@abramsm
abramsm / gist:8941283
Created February 11, 2014 18:42
Sample Hydra Source for consuming log-synth data
"source":{
  "type":"mesh2",
  "hash":true,
  "mesh":{
    "files":["log-synth/sample*"],
  },
  "format":{
    "type":"column",
    "columns":["QUERY_TIME", "UID", "IP", "TERMS"],
    "tokens":{
@abramsm
abramsm / gist:8943880
Last active August 29, 2015 13:56
Sample Hydra Map Section
"map":{
  "filterOut":{"op":"chain", "filter":[
    {"op":"field", "from":"UID", "filter":{"op":"trim"}},
    {"op":"field", "from":"IP", "filter":{"op":"trim"}},
    {"op":"field", "from":"TERMS", "filter":{"op":"trim"}},
    {"op":"num", "columns":["QUERY_TIME", "QUERY_TIME_MOD"], "define":"c0,v1000,dmult,v1,set"},
    {"op":"debug"},
  ]},
},
@abramsm
abramsm / gist:8980561
Last active August 29, 2015 13:56
Sample Hydra Output Section
"output":{
  "type":"tree",
  "root":{"path":"TREE"},
  "paths":{
    "TREE":[
      {"type":"const", "value":"root", "data":{
        "topterms":{"type":"key.top", "key":"TERMS", "size":1000},
        "topuids":{"type":"key.top", "key":"UID", "size":1000},
        "topip":{"type":"key.top", "key":"IP", "size":1000},
@abramsm
abramsm / gist:8980751
Last active August 29, 2015 13:56
Log-Synth Hydra processing Example
// -Dpagedb.kvstore.type=1
// -Deps.gz.type=3
// -Deps.cache.pages=100000
// -Xmx2G
{
"type":"map",
taskthreads:2,
"source":{
"type":"mesh2",
@abramsm
abramsm / gist:9053108
Created February 17, 2014 15:54
sample log-synth data
3.535, 5214d63bab95687d, 166.144.203.186, "the then good"
3.568, 5dbd9451948ad895, 88.120.153.226, "know boys"
4.206, 5dbd9451948ad895, 88.120.153.226, "to"
4.673, b967d99cad0b3e60, 88.120.153.226, "seven"
4.900, bd0d760fbb338955, 166.144.203.186, "did local if to"
6.166, ef909223e4873178, 166.144.203.186, "every to"
7.050, ff1fda5a8c6361fe, 166.144.203.186, "talking from wore"
8.114, 90fbf36695d3a2d, 176.205.174.108, "was i favorite papa"
8.732, 3ef5a81b79e149a6, 166.144.203.186, "us pile we it"
9.697, 8a9d23755e58f66, 88.120.153.226, "make to"
{"op":"chain", "filter":[
  {"op":"field", "from":"UID"},
  {"op":"equals", "left":"FIELD_ONE", "right":"FIELD_TWO", "not":true},
  {"op":"concat", "in":["FOO", "BAR"], "out":"OUTPUT", "join":":"},
  {"op":"num", "columns":["END", "START", "WALL"], "define":"c0,c1,sub,v1000,ddiv,toint,v2,set"},
]}