Reynold Xin rxin

@rxin
rxin / ampcamp-ecnu-2013-data.sh
Last active Dec 14, 2015
Scripts to help set up AMP Camp at ECNU, March 2013
View ampcamp-ecnu-2013-data.sh
################################################################################
# Step 1. Download wiki traffic log.
# from
# https://s3.amazonaws.com/ampcamp/ampcamp-ecnu-2013/wikistats/part-00095.gz
# to
# https://s3.amazonaws.com/ampcamp/ampcamp-ecnu-2013/wikistats/part-00168.gz
# Note that 095 and 168 are both 0 bytes. The sole purpose of their existence is
# to verify the downloads.
# NOTE THAT THE FOLLOWING SCRIPT STARTS wget AS BACKGROUND PROCESSES.
rxin / ramdisk.sh
Last active Feb 3, 2022
ramdisk create/delete on Mac OS X.
View ramdisk.sh
#!/bin/bash
# From http://tech.serbinn.net/2010/shell-script-to-create-ramdisk-on-mac-os-x/
#
ARGS=2
E_BADARGS=99
if [ $# -ne $ARGS ] # correct number of arguments to the script;
then
rxin / ByteBufferPerf.scala
Last active May 14, 2018
Comparison of performance over various approaches to read Java ByteBuffer. The best way is to use Unsafe, which also enables reading multiple primitive data types from the same buffer.
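As a rough illustration of the Unsafe approach this description mentions, here is a minimal sketch. It is not the gist's actual benchmark. It assumes a heap-backed buffer in native byte order and a JVM that still exposes `sun.misc.Unsafe` (recent JDKs deprecate these memory-access methods and may print warnings). The object name `UnsafeReadSketch` and all of its helpers are hypothetical.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical sketch: names here are illustrative, not taken from the gist.
object UnsafeReadSketch {
  // sun.misc.Unsafe has no public constructor; the usual workaround is to
  // read its private static "theUnsafe" field reflectively.
  private val unsafe: sun.misc.Unsafe = {
    val field = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
    field.setAccessible(true)
    field.get(null).asInstanceOf[sun.misc.Unsafe]
  }

  // Offset of element 0 inside a byte[] object.
  private val byteArrayBase: Long =
    unsafe.arrayBaseOffset(classOf[Array[Byte]]).toLong

  // Baseline: absolute getInt calls on the ByteBuffer (bounds-checked).
  def sumViaByteBuffer(buf: ByteBuffer): Long = {
    var sum = 0L
    var pos = 0
    val limit = buf.limit()
    while (pos + 4 <= limit) {
      sum += buf.getInt(pos)
      pos += 4
    }
    sum
  }

  // Unsafe variant: reads ints straight from the backing array in native
  // byte order. Unsafe does no bounds checking, so the loop guards it.
  def sumViaUnsafe(bytes: Array[Byte]): Long = {
    var sum = 0L
    var off = 0
    while (off + 4 <= bytes.length) {
      sum += unsafe.getInt(bytes, byteArrayBase + off)
      off += 4
    }
    sum
  }

  // Different primitive types can be read from the same backing array.
  def readLongThenDouble(bytes: Array[Byte]): (Long, Double) = {
    require(bytes.length >= 16, "need at least 16 bytes")
    (unsafe.getLong(bytes, byteArrayBase),
     unsafe.getDouble(bytes, byteArrayBase + 8))
  }

  def main(args: Array[String]): Unit = {
    val bytes = new Array[Byte](16)
    // Native order so ByteBuffer and Unsafe agree on the byte layout.
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.nativeOrder())
    buf.putInt(0, 1).putInt(4, 2).putInt(8, 3).putInt(12, 4)
    println(sumViaByteBuffer(buf))
    println(sumViaUnsafe(bytes))
    buf.putLong(0, 42L).putDouble(8, 1.5)
    println(readLongThenDouble(bytes))
  }
}
```

The two sum functions should agree only when the buffer uses `ByteOrder.nativeOrder()`; a default (big-endian) `ByteBuffer` on a little-endian machine would give different values from the Unsafe reads.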
View ByteBufferPerf.scala
/**
* To compile:
* scalac -optimize ByteBufferPerf.scala
*
* JAVA_OPTS="-Xmx2g" scala IntArrayPerf 10
* 49 62 48 45 48 45 48 50 47 45
*
* JAVA_OPTS="-Xmx2g" scala ByteBufferPerf 10
* 479 491 484 480 484 481 477 477 472 473
View testwrite.scala
def testWrite(path: String): Long = {
val startTime = System.currentTimeMillis()
val out = new java.io.FileWriter(path)
var i = 1
val bytes = " " * (1024 * 1024)
while (i < 1000) {
out.write(bytes)
i += 1
}
out.close()
View BytecodeAnalyzer.scala
package spark.util
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import scala.collection.mutable
import org.objectweb.asm.{ClassReader, MethodVisitor}
import org.objectweb.asm.commons.EmptyVisitor
import org.objectweb.asm.Opcodes._
rxin / InsertPerf.scala
Last active Dec 19, 2015
Scala collection insert performance (2.9.3)
View InsertPerf.scala
// 1001 381 384 384 383 384 407 404 409 407
object ArrayBufferBenchmark extends scala.testing.Benchmark {
def run = {
val len = 10 * 1000 * 1000
val a = new scala.collection.mutable.ArrayBuffer[Int](len)
rxin / update.sh
Last active Dec 21, 2015
Update Spark/Shark on EC2 AMI
View update.sh
set -e
set -o pipefail
/root/spark/bin/stop-all.sh
rm -rf ~/.ivy2/local/org.spark*
rm -rf ~/.ivy2/cache/org.spark*
cd /root/spark
git checkout master
rxin / gist:6896688
Last active Dec 25, 2015
take async
View gist:6896688
def takeAsync(num: Int): FutureAction[Seq[T]] = {
val promise = new CancellablePromise[Seq[T]]
promise.run {
val buf = new ArrayBuffer[T](num)
val totalParts = self.partitions.length
var partsScanned = 0
while (buf.size < num && partsScanned < totalParts && !promise.cancelled) {
// The number of partitions to try in this iteration. It is ok for this number to be
View gist:8910734
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
View microbenchmark.markdown

I was curious about the results reported here, which claim that Scala's mutable maps are slower than Java's: http://www.infoq.com/news/2011/11/yammer-scala

In my tests, Scala's OpenHashMap equals or beats Java's HashMap:

Insertion of 100k elements (String keys), time in ms:

  • scala HashMap: 92.75
  • scala OpenHashMap: 14.03125
  • java HashMap: 15.78125
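
A comparison of this kind could be set up roughly as follows. This is a minimal sketch and not the harness that produced the numbers above: the object name `MapInsertBench`, the key format, and the warm-up and run counts are all assumptions, and timings will vary with the JVM and Scala version. Note that `OpenHashMap` is deprecated in Scala 2.13, so newer compilers will warn.

```scala
import java.util.{HashMap => JHashMap}
import scala.collection.mutable

// Hypothetical sketch: names and parameters are illustrative only.
object MapInsertBench {
  // Insert every key into a scala.collection.mutable.HashMap.
  def fillScalaHashMap(keys: Array[String]): Int = {
    val m = new mutable.HashMap[String, Int]()
    var i = 0
    while (i < keys.length) { m.update(keys(i), i); i += 1 }
    m.size
  }

  // Insert every key into a scala.collection.mutable.OpenHashMap.
  def fillOpenHashMap(keys: Array[String]): Int = {
    val m = new mutable.OpenHashMap[String, Int]()
    var i = 0
    while (i < keys.length) { m.update(keys(i), i); i += 1 }
    m.size
  }

  // Insert every key into a java.util.HashMap.
  def fillJavaHashMap(keys: Array[String]): Int = {
    val m = new JHashMap[String, Int]()
    var i = 0
    while (i < keys.length) { m.put(keys(i), i); i += 1 }
    m.size()
  }

  // Wall-clock time of one evaluation of `body`, in milliseconds.
  def timeMs(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e6
  }

  // Mean time over `runs` measured evaluations, after `warmup` unmeasured
  // ones so the JIT has a chance to compile the hot loop.
  def meanMs(warmup: Int, runs: Int)(body: => Unit): Double = {
    var i = 0
    while (i < warmup) { body; i += 1 }
    val times = (1 to runs).map(_ => timeMs(body))
    times.sum / runs
  }

  def main(args: Array[String]): Unit = {
    val keys = Array.tabulate(100000)(i => "key" + i)
    println("scala HashMap:     " + meanMs(10, 32)(fillScalaHashMap(keys)))
    println("scala OpenHashMap: " + meanMs(10, 32)(fillOpenHashMap(keys)))
    println("java HashMap:      " + meanMs(10, 32)(fillJavaHashMap(keys)))
  }
}
```

Each fill function returns the final map size so the work has an observable result; the keys are built once up front so string allocation is not included in the measured time.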