Skip to content

Instantly share code, notes, and snippets.

View rxin's full-sized avatar

Reynold Xin rxin

View GitHub Profile
@rxin
rxin / ampcamp-ecnu-2013-data.sh
Last active December 14, 2015 10:49
scripts to help setup ampcamp @ ECNU March 2013
################################################################################
# Step 1. Download wiki traffic log.
# from
# https://s3.amazonaws.com/ampcamp/ampcamp-ecnu-2013/wikistats/part-00095.gz
# to
# https://s3.amazonaws.com/ampcamp/ampcamp-ecnu-2013/wikistats/part-00168.gz
# Note that 095 and 168 are both 0 bytes. The sole purpose of their existence is
# to verify the downloads.
# NOTE THAT THE FOLLOWING SCRIPT STARTS wget AS BACKGROUND PROCESSES.
@rxin
rxin / ramdisk.sh
Last active December 13, 2023 02:40
ramdisk create/delete on Mac OS X.
#!/bin/bash
# From http://tech.serbinn.net/2010/shell-script-to-create-ramdisk-on-mac-os-x/
#
ARGS=2
E_BADARGS=99
if [ $# -ne $ARGS ] # correct number of arguments to the script;
then
@rxin
rxin / ByteBufferPerf.scala
Last active May 14, 2018 21:27
Comparison of performance over various approaches to read Java ByteBuffer. The best way is to use Unsafe, which also enables reading multiple primitive data types from the same buffer.
/**
* To compile:
* scalac -optimize ByteBufferPerf.scala
*
* JAVA_OPTS="-Xmx2g" scala IntArrayPerf 10
* 49 62 48 45 48 45 48 50 47 45
*
* JAVA_OPTS="-Xmx2g" scala ByteBufferPerf 10
* 479 491 484 480 484 481 477 477 472 473
def testWrite(path: String): Long = {
val startTime = System.currentTimeMillis()
val out = new java.io.FileWriter(path)
var i = 1
val bytes = " " * (1024 * 1024)
while (i < 1000) {
out.write(bytes)
i += 1
}
out.close
package spark.util
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import scala.collection.mutable
import org.objectweb.asm.{ClassReader, MethodVisitor}
import org.objectweb.asm.commons.EmptyVisitor
import org.objectweb.asm.Opcodes._
@rxin
rxin / InsertPerf.scala
Last active December 19, 2015 03:49
Scala collection insert performance (2.9.3)
// 1001 381 384 384 383 384 407 404 409 407
object ArrayBufferBenchmark extends scala.testing.Benchmark {
def run = {
val len = 10 * 1000 * 1000
val a = new scala.collection.mutable.ArrayBuffer[Int](len)
@rxin
rxin / update.sh
Last active December 21, 2015 01:09
Update Spark/Shark on EC2 AMI
set -e
set -o pipefail
/root/spark/bin/stop-all.sh
rm -rf ~/.ivy2/local/org.spark*
rm -rf ~/.ivy2/cache/org.spark*
cd /root/spark
git checkout master
@rxin
rxin / gist:6896688
Last active December 25, 2015 01:39
take async
def takeAsync(num: Int): FutureAction[Seq[T]] = {
val promise = new CancellablePromise[Seq[T]]
promise.run {
val buf = new ArrayBuffer[T](num)
val totalParts = self.partitions.length
var partsScanned = 0
while (buf.size < num && partsScanned < totalParts && !promise.cancelled) {
// The number of partitions to try in this iteration. It is ok for this number to be
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*

I was curious about the results reported here, which reports that Scala's mutable maps are slower than Java's: http://www.infoq.com/news/2011/11/yammer-scala

In my tests, Scala's OpenHashMap equals or beats java's HashMap:

Insertion 100k elements (String keys) time in ms:

  • scala HashMap: 92.75
  • scala OpenHashMap: 14.03125
  • java HashMap: 15.78125