/**
 * @author Danny Miller k2xl@k2xl.com
 */
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import backtype.storm.task.OutputCollector;

// Copyright 2014 BrightTag, Inc. All rights reserved.
package com.brighttag.storm.utils;
import backtype.storm.tuple.Values;
import storm.trident.operation.BaseFunction;
import storm.trident.operation.TridentCollector;
import storm.trident.tuple.TridentTuple;
/**
 * Converts the first tuple from a byte array into a string.
@miguno
miguno / GenCountMinSketc.scala
Created September 29, 2014 16:25
Draft of generic CMS
package com.miguno.algebird.extensions
import com.twitter.algebird.{Approximate, Monoid, MonoidAggregator}
import scala.collection.immutable.SortedSet
class GenCountMinSketchMonoid[K: Ordering : GenCMSHasher](eps: Double, delta: Double, seed: Int,
    heavyHittersPct: Double = 0.01) extends Monoid[GenCMS[K]] {
  assert(0 < eps && eps < 1, "eps must lie in (0, 1)")
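
For context, `eps` and `delta` in a Count-Min Sketch conventionally determine the size of the counter table. The following is a minimal sketch of the standard sizing rule, width = ceil(e / eps) and depth = ceil(ln(1 / delta)). The object and method names are hypothetical; the draft's own sizing code is not visible in this preview and may differ.

```scala
// Hypothetical helper, not part of the draft above.
object CmsSizingSketch {
  // Counters per row: a point query overestimates by at most eps * totalCount
  // (with probability at least 1 - delta).
  def width(eps: Double): Int = {
    require(0 < eps && eps < 1, "eps must lie in (0, 1)")
    math.ceil(math.E / eps).toInt
  }

  // Number of rows (independent hash functions).
  def depth(delta: Double): Int = {
    require(0 < delta && delta < 1, "delta must lie in (0, 1)")
    math.ceil(math.log(1.0 / delta)).toInt
  }
}
```

Under this rule, `eps = 0.1` and `delta = 0.01` give 28 counters per row and 5 rows.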
@miguno
miguno / CMSBenchmark.scala
Last active August 29, 2015 14:07
Benchmarking Count-Min Sketch and SketchMap implementations in Algebird
package com.twitter.algebird.caliper
import com.google.caliper.api.Macrobenchmark
import com.google.caliper.{Param, Benchmark}
import com.twitter.algebird.CountMinSketchMonoid
/**
* Run via [[com.twitter.algebird.caliper.Runner]], using the following CLI options:
*
* "--time-limit 90s --instrument macro com.twitter.algebird.caliper.CMSBenchmark"
@miguno
miguno / CountMinSketchTest.scala
Last active August 29, 2015 14:07
Negative example of TopNCMS (a top-N CMS variant) failing to compute heavy hitters correctly
// This unit test will fail because merging top-N based heavy hitters
// is not associative; see https://github.com/twitter/algebird/issues/353
"compute heavy hitters correctly (regression test of GH-353)" in {
  val topN = 2
  val monoid = TopNCMS.monoid(EPS, DELTA, SEED, topN)
  val data1 = Seq(1, 1, 1, 2, 2, 3).toK[K]
  val data2 = Seq(3, 4, 4, 4, 5, 5).toK[K]
  val data3 = Seq(3, 6, 6, 6, 7, 7).toK[K]
  val data4 = Seq(3, 8, 8, 8, 9, 9).toK[K]
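
To illustrate why a top-N merge can fail to be associative, here is a deliberately simplified model. It is an assumption-laden sketch, not Algebird's `TopNCMS`: it keeps exact counts instead of a Count-Min Sketch, the names `Summary`, `create`, and `merge` are hypothetical, and the input data is made up for the example. Each summary remembers only its top-N keys as heavy-hitter candidates, and a merge picks the new top N from the union of both sides' candidates.

```scala
// Simplified model (exact counts, hypothetical names); not Algebird's TopNCMS.
object TopNMergeSketch {
  final case class Summary(counts: Map[Int, Long], candidates: Set[Int])

  // Top n keys by descending count; ties are broken by the smaller key.
  private def top(counts: Map[Int, Long], among: Set[Int], n: Int): Set[Int] =
    among.toSeq.sortBy(k => (-counts(k), k)).take(n).toSet

  def create(data: Seq[Int], n: Int): Summary = {
    val counts = data.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
    Summary(counts, top(counts, counts.keySet, n))
  }

  // Counts are summed exactly, but only keys that were already candidates
  // on either side can become candidates of the merged summary.
  def merge(a: Summary, b: Summary, n: Int): Summary = {
    val keys = a.counts.keySet ++ b.counts.keySet
    val counts = keys.map { k =>
      k -> (a.counts.getOrElse(k, 0L) + b.counts.getOrElse(k, 0L))
    }.toMap
    Summary(counts, top(counts, a.candidates ++ b.candidates, n))
  }
}
```

With `n = 1`, `a = create(Seq(3, 3, 3), 1)`, `b = create(Seq(1, 1, 1, 1), 1)`, and `c = create(Seq(3, 3, 2, 2, 2), 1)`, the grouping `merge(merge(a, b, 1), c, 1)` ends with candidate set `Set(1)`, whereas `merge(a, merge(b, c, 1), 1)` ends with `Set(3)`. Key 3 is dropped from the candidates when `a` is merged with `b` first, and it cannot be recovered later even though it has the highest total count.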
@miguno
miguno / benchmark.md
Last active August 29, 2015 14:09
@specialized(Int, Long) K benchmark

See the discussion in Algebird pull request #354.

> cappi::benchmarkOnly com.twitter.algebird.caliper.CMSBenchmark

Summary

The benchmark results indicate that @specialized(Int, Long) K does not improve the CMS[K] performance for Long.

Before @specialized, for Long:

@miguno
miguno / CMSHashingBenchmark.scala
Created January 7, 2015 15:57
Benchmark for hashing implementations for CMS[BigInt] in Algebird
package com.twitter.algebird.caliper
import com.google.caliper.{ Param, SimpleBenchmark }
import com.google.common.hash.{ HashFunction, Hashing }
/**
* Benchmarks the hashing algorithms used by Count-Min sketch for CMS[BigInt].
*
* The input values are generated ahead of time to ensure that each trial uses the same input (and that the RNG is not
* influencing the runtime of the trials).
@miguno
miguno / entropy.scala
Created January 8, 2015 12:21
Entropy
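// Shannon entropy (in bits) of the distribution given by `counts`, where
// `totalCount` is expected to be the sum of all counts.
private def entropy(counts: Vector[Long], totalCount: Double): Double = {
  if (totalCount == 0) {
    return 0
  }
  def log2(x: Double) = scala.math.log(x) / scala.math.log(2)
  // Zero counts are skipped because log2(0) is undefined.
  counts.filter(_ != 0).map { count =>
    val freq = 1.0 * count / totalCount
    freq * log2(freq)
  }.foldLeft(0.0) { case (agg, i) => agg - i }
}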
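
The function above computes the Shannon entropy H = -Σ p_i · log2(p_i) over the non-zero relative frequencies p_i. As a worked example, here is a minimal standalone variant. The object name `EntropySketch` is hypothetical, and it assumes the total is derived from the counts themselves rather than passed in.

```scala
// Standalone variant for illustration; names are hypothetical.
object EntropySketch {
  def log2(x: Double): Double = math.log(x) / math.log(2)

  // Shannon entropy in bits; zero counts contribute nothing to the sum.
  def entropy(counts: Seq[Long]): Double = {
    val total = counts.sum.toDouble
    if (total == 0) 0.0
    else counts.filter(_ != 0).map { c =>
      val p = c / total
      -p * log2(p)
    }.sum
  }
}
```

Four equally frequent items, `EntropySketch.entropy(Seq(1L, 1L, 1L, 1L))`, give about 2 bits (each p_i is 0.25 and log2(0.25) is -2), while a single item carrying all the mass, `Seq(4L, 0L)`, gives 0 bits.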
@miguno
miguno / keybase.md
Created February 28, 2015 17:23
keybase.md

Keybase proof

I hereby claim:

  • I am miguno on github.
  • I am miguno (https://keybase.io/miguno) on keybase.
  • I have a public key whose fingerprint is 0528 C791 1372 04EE C0CE 277A 796E 57B8 6D7C BA34

To claim this, I am signing this object:

hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/gutenberg/gutenberg/gutenberg.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)