Reynold Xin rxin

@rxin
rxin / ramdisk.sh
Last active Feb 3, 2022
ramdisk create/delete on Mac OS X.
View ramdisk.sh
#!/bin/bash
# From http://tech.serbinn.net/2010/shell-script-to-create-ramdisk-on-mac-os-x/
#
ARGS=2
E_BADARGS=99
if [ $# -ne $ARGS ] # correct number of arguments to the script;
then
rxin / benchmark.scala
Created Apr 22, 2015
quasiquote vs janino
View benchmark.scala
package org.apache.spark.sql.catalyst.expressions.codegen
import org.codehaus.janino.SimpleCompiler
object CodeGenBenchmark {
def quasiquotes(): Unit = {
import scala.reflect.runtime.{universe => ru}
import scala.reflect.runtime.universe._
rxin / ByteBufferPerf.scala
Last active May 14, 2018
Performance comparison of several approaches to reading a Java ByteBuffer. The best approach is to use Unsafe, which also allows reading multiple primitive data types from the same buffer.
View ByteBufferPerf.scala
/**
* To compile:
* scalac -optimize ByteBufferPerf.scala
*
* JAVA_OPTS="-Xmx2g" scala IntArrayPerf 10
* 49 62 48 45 48 45 48 50 47 45
*
* JAVA_OPTS="-Xmx2g" scala ByteBufferPerf 10
* 479 491 484 480 484 481 477 477 472 473
rxin / BinarySearch.java
Created Jul 19, 2015
binary search vs linear scan
View BinarySearch.java
package com.databricks.unsafe.util.benchmark;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
rxin / generated-assembly.txt
Created Feb 14, 2017
Processing a trillion rows per second on a single machine: how can nested loop joins be this fast?
View generated-assembly.txt
Decoding compiled method 0x00007f4d0510f9d0:
Code:
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x00007f4ce9662458} 'join' '(JI)J' in 'Test'
0x00007f4d0510fb20: call 0x00007f4d1abd5a30 ; {runtime_call}
0x00007f4d0510fb25: data16 data16 nop WORD PTR [rax+rax*1+0x0]
0x00007f4d0510fb30: mov DWORD PTR [rsp-0x14000],eax
0x00007f4d0510fb37: push rbp
rxin / df.py
Last active Jan 26, 2017
DataFrame simple aggregation performance benchmark
View df.py
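# Assumes the pyspark shell, where sqlContext is predefined.
from pyspark.sql.functions import avg, col
data = sqlContext.load("/home/rxin/ints.parquet")
data.groupBy("a").agg(col("a"), avg("num")).collect()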
rxin / gist:6896688
Last active Dec 25, 2015
take async
View gist:6896688
def takeAsync(num: Int): FutureAction[Seq[T]] = {
val promise = new CancellablePromise[Seq[T]]
promise.run {
val buf = new ArrayBuffer[T](num)
val totalParts = self.partitions.length
var partsScanned = 0
while (buf.size < num && partsScanned < totalParts && !promise.cancelled) {
// The number of partitions to try in this iteration. It is ok for this number to be
rxin / update.sh
Last active Dec 21, 2015
Update Spark/Shark on EC2 AMI
View update.sh
set -e
set -o pipefail
/root/spark/bin/stop-all.sh
rm -rf ~/.ivy2/local/org.spark*
rm -rf ~/.ivy2/cache/org.spark*
cd /root/spark
git checkout master
rxin / InsertPerf.scala
Last active Dec 19, 2015
Scala collection insert performance (2.9.3)
View InsertPerf.scala
// 1001 381 384 384 383 384 407 404 409 407
object ArrayBufferBenchmark extends scala.testing.Benchmark {
def run = {
val len = 10 * 1000 * 1000
val a = new scala.collection.mutable.ArrayBuffer[Int](len)
View BytecodeAnalyzer.scala
package spark.util
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import scala.collection.mutable
import org.objectweb.asm.{ClassReader, MethodVisitor}
import org.objectweb.asm.commons.EmptyVisitor
import org.objectweb.asm.Opcodes._