Animesh Trivedi (animeshtrivedi)
@animeshtrivedi
animeshtrivedi / ScalaInterpreterExample.java
Created October 6, 2016 15:04 — forked from metasim/ScalaInterpreterExample.java
Demo code for defining a Scala class dynamically (as a string) and loading it into Java.
package com.ibm.crail.spark.tools;

/**
 * Created by atr on 07.10.16.
 */

import scala.collection.Iterator;
import scala.collection.JavaConversions;
import scala.collection.Seq;
import scala.reflect.internal.util.AbstractFileClassLoader;
import scala.reflect.internal.util.BatchSourceFile;
import scala.reflect.io.AbstractFile;
import scala.runtime.AbstractFunction1;
import scala.runtime.BoxedUnit;
import scala.tools.nsc.GenericRunnerSettings;
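The imports hint at the approach: drive the Scala compiler (scala.tools.nsc) programmatically. A minimal sketch of the idea in Scala, assuming the scala-compiler jar is on the classpath; IMain is the interpreter behind the Scala REPL, and the API shown here is the 2.11-era one:

```scala
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.IMain

object DynamicClassDemo {
  def main(args: Array[String]): Unit = {
    val settings = new Settings
    settings.usejavacp.value = true   // reuse the host JVM's classpath
    val interp = new IMain(settings)

    // compile a class definition handed over as a plain string, then use it
    interp.interpret("""class Greeter { def greet(n: String): String = "hello, " + n }""")
    interp.interpret("""println(new Greeter().greet("world"))""")
  }
}
```

The same IMain instance can be called from Java code directly, which is what the gist above does.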
animeshtrivedi / AtrFileFormat.scala
Last active January 11, 2017 12:59
AtrFileFormat which creates a null sink writer (AtrOutputWriter)
package org.apache.spark.sql.execution.datasources.atr
import org.apache.hadoop.fs.FileStatus
import org.apache.hadoop.mapreduce.{TaskAttemptContext, Job}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory, FileFormat}
import org.apache.spark.sql.sources.DataSourceRegister
import org.apache.spark.sql.types.StructType
import scala.tools.jline_embedded.internal.Log
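Assuming AtrFileFormat registers itself through DataSourceRegister under a short name such as "atr" (the short name here is an assumption), the null sink would be exercised from a Spark job like this:

```scala
// hypothetical usage: write a DataFrame into the null sink to measure the
// input-side pipeline cost without paying for real output I/O;
// assumes `spark` is an active SparkSession
spark.range(1000000).write
  .format("atr")                // short name assumed registered by AtrFileFormat
  .save("/tmp/atr-null-sink")   // a path is required by the API, but nothing is stored
```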
animeshtrivedi / AtrOutputWriter.scala
Last active January 11, 2017 13:24
A null writer class that discards every Row/InternalRow passed to it; the class does, however, keep some accounting.
package org.apache.spark.sql.execution.datasources.atr
import org.apache.hadoop.mapreduce.TaskAttemptContext
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.execution.datasources.OutputWriter
import scala.tools.jline_embedded.internal.Log
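The shape of such a null writer can be sketched against the Spark 2.x OutputWriter API (the exact method signatures changed across Spark versions, so treat this as an illustration, not the gist's actual code):

```scala
// a minimal counting null writer: accept rows, count them, drop them
class NullCountingWriter extends OutputWriter {
  private var rows: Long = 0L

  override def write(row: Row): Unit = {
    rows += 1   // account for the row, then discard it
  }

  override def close(): Unit =
    Log.info(s"discarded $rows rows")
}
```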
animeshtrivedi / BroadcastTest.scala
Created May 4, 2017 11:10
A broadcast stress test for Spark.
/*
* Broadcast stress test for Spark
*
* Author: Animesh Trivedi <atr@zurich.ibm.com>
*
* Copyright (C) 2017, IBM Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
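The core pattern being stressed: broadcast a large read-only value once per executor instead of shipping it with every task. A minimal sketch, assuming `sc` is a live SparkContext (the payload size and partition count are illustrative):

```scala
// broadcast a large read-only array once; each task reads it locally
val payload = new Array[Byte](64 * 1024 * 1024)   // 64 MB test value
val bc = sc.broadcast(payload)

// every partition touches the broadcast; the driver collects only sizes
val sizes = sc.parallelize(0 until 100, 100)
  .map(_ => bc.value.length)
  .collect()

bc.destroy()   // release the broadcast blocks once the test round is done
```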
animeshtrivedi / GroupByTest.scala
Created May 4, 2017 15:26
Spark shuffle benchmark with variations of the groupBy test
import java.util.Random
import org.apache.spark.SparkContext

object GroupByTest {
  // sc is passed in explicitly so the object compiles outside spark-shell
  def test(sc: SparkContext, numMappers: Int = 10, numKVPairs: Int = 100, valSize: Int = 1024, numReducers: Int = 10): Unit = {
    val pairs1 = sc.parallelize(0 until numMappers, numMappers).flatMap { p =>
      val ranGen = new Random
      val arr1 = new Array[(Int, Array[Byte])](numKVPairs)
      for (i <- 0 until numKVPairs) {
        val byteArr = new Array[Byte](valSize)
        ranGen.nextBytes(byteArr)
        arr1(i) = (ranGen.nextInt(Int.MaxValue), byteArr)
      }
      arr1
    }
    println(pairs1.groupByKey(numReducers).count())   // groupByKey forces the shuffle
  }
}
animeshtrivedi / CrailTest.scala
Created May 5, 2017 14:58
Crail benchmark tests as Spark program
import java.nio.ByteBuffer
import java.util.Random
import java.util.concurrent.Future
import com.ibm.crail._
import com.ibm.crail.conf.CrailConfiguration
import com.ibm.crail.memory.OffHeapBuffer
object CrailTest extends Serializable {
animeshtrivedi / GetUnsafeClass.scala
Created May 9, 2017 16:33 — forked from ramn/GetUnsafeClass.scala
sun.misc.Unsafe use in Scala
def getUnsafeInstance: sun.misc.Unsafe = {
val f = classOf[sun.misc.Unsafe].getDeclaredField("theUnsafe")
f.setAccessible(true)
val unsafe = f.get(null).asInstanceOf[sun.misc.Unsafe]
unsafe
}
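Once the instance is in hand, it gives direct off-heap memory access. A short, hypothetical usage sketch (the address returned by allocateMemory must be freed manually, since it lives outside the GC heap):

```scala
// off-heap allocate, write, read back, free
val unsafe = getUnsafeInstance
val addr = unsafe.allocateMemory(8)   // 8 bytes for one Long
unsafe.putLong(addr, 42L)
println(unsafe.getLong(addr))         // 42
unsafe.freeMemory(addr)
```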
animeshtrivedi / UnsafeCopy.java
Created May 10, 2017 07:38
Sample code for an Unsafe copy from byte[] to double[] (and vice versa).
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeCopy {
    public static void copyTest(String[] args) {
        Unsafe us = null;
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            us = (Unsafe) f.get(null);
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
        // copy 1024 raw bytes from a byte[] into a double[] (128 doubles)
        byte[] src = new byte[1024];
        double[] dst = new double[128];
        us.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET, dst, Unsafe.ARRAY_DOUBLE_BASE_OFFSET, 1024);
    }
}
animeshtrivedi / SparkCrailFileWriteBroadcast.scala
Last active May 18, 2017 11:16
This file has two functions that compare writing raw 128-byte byte[] arrays against writing the same data as a serialized array.
import java.nio.ByteBuffer
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger, AtomicLong}
import java.util.concurrent.{ConcurrentHashMap, Future, LinkedBlockingQueue, TimeUnit}
import com.ibm.crail._
import com.ibm.crail.conf.CrailConfiguration
import com.ibm.crail.utils.CrailImmediateOperation
import org.apache.spark._
import org.apache.spark.common._
import org.apache.spark.executor.ShuffleWriteMetrics
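The raw-vs-serialized distinction can be illustrated without Crail: JDK serialization adds stream headers and type metadata on top of a 128-byte array, so the serialized form is strictly larger than the raw bytes. A self-contained sketch using only the JDK (no Spark/Crail dependencies):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object RawVsSerialized {
  def main(args: Array[String]): Unit = {
    val raw = new Array[Byte](128)   // the "raw" path writes exactly these 128 bytes
    val bos = new ByteArrayOutputStream()
    val oos = new ObjectOutputStream(bos)
    oos.writeObject(raw)             // the "serialized" path writes this instead
    oos.close()
    println(s"raw: ${raw.length} B, serialized: ${bos.size} B")
  }
}
```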