Skip to content

Instantly share code, notes, and snippets.

@pchiusano
Created December 2, 2011 13:49
Show Gist options
  • Star 8 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save pchiusano/1423303 to your computer and use it in GitHub Desktop.
Save pchiusano/1423303 to your computer and use it in GitHub Desktop.
Simple microbenchmarks comparing Scala vs Java mutable map performance

I was curious about the results reported here, which reports that Scala's mutable maps are slower than Java's: http://www.infoq.com/news/2011/11/yammer-scala

In my tests, Scala's OpenHashMap equals or beats java's HashMap:

Insertion 100k elements (String keys) time in ms:

  • scala HashMap: 92.75
  • scala OpenHashMap: 14.03125
  • java HashMap: 15.78125

Insertion 100k elements (Integer keys) time in ms:

  • scala HashMap: 98.625
  • scala OpenHashMap: 13.71875
  • java HashMap: 23.5625

Hastily thrown together timing code below. I at least run enough trials to get a half second sample.

Scala's mutable HashMap implementation was known to be slower than HashMap, which motivated David MacIver to write OpenHashMap way back in 2008. Unfortunately, not many people know it.

def time(warmup: Int)(block: => Any) = {
  for (i <- 1 to warmup) block // possibly trigger JIT on freq used methods in block
  var goodSample = false
  var start = 0L; var stop = 0L;
  var N = 2
  while (!goodSample) {
    start = System.currentTimeMillis
    for (i <- 1 to N) block
    stop = System.currentTimeMillis
    if (stop - start < 500) N = N*4 // require at least half a second for a decent sample
    else goodSample = true
  }
  val perOp = (stop.toDouble - start.toDouble) / N
  perOp
}

val vs = (0 to 100000).map(_ toString).toArray
val warmup = 10
{
println (time (warmup) {  
  val m = new scala.collection.mutable.HashMap[String,Int]; 
  var i = 0;
  while(i<100000) { m.put(vs(i),i);i=i+1;};
})

println (time (warmup) {  
  val m = new scala.collection.mutable.OpenHashMap[String,Int]; 
  var i = 0;
  var start = System.currentTimeMillis();
  while(i<100000) { m.put(vs(i),i);i=i+1;};
})

println ( time (warmup) {
 val m = new java.util.HashMap[String,Int]; 
  var i = 0;
  while(i<100000) { m.put(vs(i),i);i=i+1;};
})
}

// With Integer keys
{
println ("scala HashMap: " + time (warmup) {  
  val m = new scala.collection.mutable.HashMap[Int,Int]; 
  var i = 0;
  while(i<100000) { m.put(i,i);i=i+1;};
})

println ("scala OpenHashMap: " + time (warmup) {  
  val m = new scala.collection.mutable.OpenHashMap[Int,Int]; 
  var i = 0;
  var start = System.currentTimeMillis();
  while(i<100000) { m.put(i,i);i=i+1;};
})

println ("java HashMap: " + time (warmup) {
 val m = new java.util.HashMap[Int,Int]; 
  var i = 0;
  while(i<100000) { m.put(i,i);i=i+1;};
})
}
@rxin
Copy link

rxin commented May 28, 2014

This might not be the most scientific benchmark because of potential GCs and hash collisions at low range. I tried increase the number from 100k to 1m, and then Java HashMap was faster. I also tested my own implementation of OpenHashMap that uses a bitmap for nulls and specialized the get/put.

bin/spark-class org.apache.spark.Test with 1 million integers
scala HashMap: 264.1
scala OpenHashMap: 375.7
Spark OpenHashMap: 96.5
java HashMap: 158.2

bin/spark-class org.apache.spark.Test with 100k integers
scala HashMap: 9.11875
scala OpenHashMap: 2.3890625
Spark OpenHashMap: 9.25625
java HashMap: 3.88125

The OpenHashMap implementation for Spark can be found at https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala

My test code is at https://gist.github.com/rxin/44657c3f40d4e4294857

@dsavvins
Copy link

Dunno - for me latest java vs latest Scala test 100K runs:
scala.collection.mutable.OpenHashMap[String, Int] insert 30ms get 7ms
HashMap<String,Integer> insert 20ms get 4ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment