

Imran Rashid (squito)

squito /
Last active Jun 24, 2019
Interrupts, joins, OOMs, and uncaught exception handlers
import java.util.ArrayList;

public class Test implements Runnable {
  public static class OOMer implements Runnable {
    public void run() {
      System.out.println("Starting oomer");
      // allocate until the heap is exhausted, triggering an OutOfMemoryError
      ArrayList<byte[]> stuff = new ArrayList<>();
      while (true) {
        stuff.add(new byte[100000000]);
      }
    }
  }
  // ... (rest of gist truncated)
}
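The gist's title also mentions uncaught exception handlers. As a minimal, self-contained sketch (class and thread names here are illustrative, not from the gist), a JVM-wide default handler observes any throwable that escapes another thread's run(), while join() on that thread still returns normally:

```java
public class HandlerDemo {
  public static void main(String[] args) throws Exception {
    // install a JVM-wide handler for exceptions that escape a thread's run()
    Thread.setDefaultUncaughtExceptionHandler((t, e) ->
        System.out.println("thread " + t.getName() + " died with: " + e));

    Thread thread = new Thread(() -> {
      throw new RuntimeException("boom");  // escapes run(), reaches the handler
    }, "worker");
    thread.start();
    thread.join();  // join() returns normally even though "worker" died
    System.out.println("worker state: " + thread.getState());
  }
}
```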
squito / shuffle_corrupt_test.scala
// run with "--conf spark.cleaner.referenceTracking=false"
// spin up our full set of executors
sc.parallelize(1 to 100, 100).map { x => Thread.sleep(1000); x }.collect()

def getLocalDirs(): Array[String] = {
  val clz = Class.forName("org.apache.spark.util.Utils")
  val conf = org.apache.spark.SparkEnv.get.conf
  val method = clz.getMethod("getConfiguredLocalDirs", conf.getClass())
  method.invoke(null, conf).asInstanceOf[Array[String]]
}
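The reflection pattern in getLocalDirs (look up a class by name, grab a method, invoke with a null receiver because the method is static) works the same way outside Spark. A small self-contained Java sketch of the pattern, using a standard-library method instead of Spark's private Utils:

```java
import java.lang.reflect.Method;

public class ReflectDemo {
  public static void main(String[] args) throws Exception {
    // Integer.parseInt is static: look it up by name and parameter types...
    Class<?> clz = Class.forName("java.lang.Integer");
    Method method = clz.getMethod("parseInt", String.class);
    // ...and invoke with a null receiver, exactly as getLocalDirs does above
    int parsed = (Integer) method.invoke(null, "42");
    System.out.println(parsed);  // prints 42
  }
}
```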
squito / SlowIterator.scala
Last active Sep 14, 2018
// This is an example iterator that runs slowly, to demonstrate how SlowLoggingIterator works.
// It just iterates over a range of ints, but puts in occasional delays, to simulate an iterator that is
// actually doing something more complex, e.g. fetching records from a DB which is occasionally slow.
class SlowIterator(start: Int, end: Int, delay: Long, every: Int) extends java.util.Iterator[Integer] {
  val underlying = (start until end).toIterator
  def hasNext(): Boolean = underlying.hasNext
  def next(): Integer = {
    val n = underlying.next()
    // every `every` elements, pause for `delay` ms to simulate a slow fetch
    if (n % every == 0) Thread.sleep(delay)
    n
  }
}
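The SlowLoggingIterator that this snippet is meant to exercise is not shown in the gist; as a guess at its shape (class and field names here are mine), a wrapper can delegate to an underlying iterator, time each next() call, and log the ones that exceed a threshold:

```java
import java.util.Iterator;

// Sketch of a logging wrapper: delegate to an underlying iterator and
// report any next() call slower than a configurable threshold.
public class LoggingIterator<T> implements Iterator<T> {
  private final Iterator<T> underlying;
  private final long thresholdMs;

  public LoggingIterator(Iterator<T> underlying, long thresholdMs) {
    this.underlying = underlying;
    this.thresholdMs = thresholdMs;
  }

  public boolean hasNext() { return underlying.hasNext(); }

  public T next() {
    long start = System.nanoTime();
    T result = underlying.next();
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMs > thresholdMs) {
      System.out.println("slow next(): " + elapsedMs + " ms");
    }
    return result;
  }
}
```

Because the wrapper is pass-through, callers iterate exactly as before; only the timing side effect is added.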
squito / tl.out
Last active May 1, 2018
creating a new thread pool thread 0 : tl = null
creating a new thread pool thread 1 : tl = null
creating a new thread pool thread 2 : tl = null
creating a new thread pool thread 3 : tl = null
creating a new thread pool thread 4 : tl = null
creating a new thread pool thread 5 : tl = null
creating a new thread pool thread 6 : tl = null
creating a new thread pool thread 7 : tl = null
creating a new thread pool thread 8 : tl = null
creating a new thread pool thread 9 : tl = null
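The output above shows each freshly created pool thread seeing `tl = null` on first access: a ThreadLocal's get() returns null in any thread that has not yet called set(). A minimal sketch reproducing the pattern (class and variable names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadLocalDemo {
  // each thread gets its own slot; get() returns null until that thread calls set()
  static final ThreadLocal<String> tl = new ThreadLocal<>();

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < 4; i++) {
      final int task = i;
      pool.submit(() -> {
        // the first task on a fresh pool thread sees null; later tasks
        // reusing the same thread see the value an earlier task stored
        System.out.println("task " + task + " : tl = " + tl.get());
        tl.set("set by task " + task);
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```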
squito / gist:de73fbd0b9c00961377068b91283e04c
# filters out "apachespark" choice now
# notice what happens with
# * bad input ("floop")
# * real user that can't be assigned a jira ("fakeimran")
# * selection from list ("imran")
# * arbitrary user that can be assigned ("vanzin")
In [1]: from merge_spark_pr import *
squito / gist:ccd56fefefe4dfef808dc21196a89385
Created Aug 28, 2017
random example of exploring spark internals w/ reflection while debugging cluster config
// paste in
val xCat = spark.sessionState.catalog.externalCatalog
// `get` and `reflectMethod` are reflection helpers (defined elsewhere in the gist)
// for reading a private field and invoking a method by name
val catClient = get(xCat, "client")
catClient.reflectMethod("getConf", Seq("hive.metastore.uris", ""))

import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(sc.hadoopConfiguration)
squito / LA_output.txt
Created Aug 24, 2017
Java timestamp mechanics
> scala -Duser.timezone=America/Los_Angeles timestamp.scala
Default TZ: America/Los_Angeles
hours in UTC: 8
TZ offset in hours: -8
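The three lines above come from inspecting the JVM's default time zone; a sketch that computes the same kinds of numbers (class name is illustrative, and it uses the raw, non-DST offset so the -8 for America/Los_Angeles is stable year-round):

```java
import java.util.Calendar;
import java.util.TimeZone;

public class TzDemo {
  public static void main(String[] args) {
    // -Duser.timezone=America/Los_Angeles sets what getDefault() returns
    TimeZone tz = TimeZone.getDefault();
    System.out.println("Default TZ: " + tz.getID());

    // the hour-of-day for "now" depends on which zone the Calendar uses
    Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    System.out.println("hours in UTC: " + utc.get(Calendar.HOUR_OF_DAY));

    // raw (standard-time) offset from UTC in hours; -8 for America/Los_Angeles
    System.out.println("TZ offset in hours: " + tz.getRawOffset() / (1000 * 60 * 60));
  }
}
```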
squito /
Last active Sep 11, 2017
Example of how Postgres treats different timestamp types, both in parsing and as the time zone is changed
squito /
Last active Feb 5, 2020
Spark SQL timestamp semantics, and how they were changed from 2.0.0 to 2.0.1 by SPARK-16216 (see query_output_2_0_0.txt vs query_output_2_0_1.txt)

Spark "Timestamp" Behavior

Reading data in different timezones

Note that the ANSI SQL standard defines "timestamp" as equivalent to "timestamp without time zone". However, Spark's behavior depends on both the Spark version and the file format.

format \ spark version | <= 2.0.0 | >= 2.0.1
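One way to see why "timestamp" semantics are ambiguous at all: a java.sql.Timestamp stores only an instant (milliseconds since the epoch), so rendering it as a wall-clock value depends entirely on the session time zone. A self-contained sketch (class name is illustrative, not Spark code):

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimestampRenderDemo {
  public static void main(String[] args) {
    // a Timestamp is just an instant: millis since the epoch
    java.sql.Timestamp ts = new java.sql.Timestamp(0L);  // 1970-01-01T00:00:00 UTC

    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    System.out.println("UTC: " + fmt.format(ts));  // 1970-01-01 00:00:00

    // the same instant, rendered 8 hours earlier on the wall clock
    fmt.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"));
    System.out.println("LA:  " + fmt.format(ts));  // 1969-12-31 16:00:00
  }
}
```

"Timestamp without time zone" semantics instead pin the wall-clock fields, which is exactly where engines and file formats can disagree.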
squito / can_build_from_puzzler.scala
Last active Oct 6, 2016
val dictionary = Map(
  "a" -> Set("apple", "ant"),
  "b" -> Set("banana", "barn")
)
// let's count how many times each letter occurs in all words in our dictionary
val letters = dictionary.values.flatMap { x => x.flatMap { _.toCharArray } }
val letterCounts = letters.groupBy(identity).mapValues(_.size)
// puzzler: x is a Set, so the inner flatMap builds a Set[Char] -- duplicate
// letters within each entry collapse before counting (e.g. the three 'a's in
// "banana" count only once for the "b" entry)
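The same pitfall can be reproduced outside Scala's CanBuildFrom machinery: collecting each entry's characters into a set collapses duplicates before the final count. A Java analogy (class and variable names are mine, not from the gist):

```java
import java.util.*;
import java.util.stream.*;

public class DedupDemo {
  public static void main(String[] args) {
    Map<String, Set<String>> dictionary = new HashMap<>();
    dictionary.put("a", new HashSet<>(Arrays.asList("apple", "ant")));
    dictionary.put("b", new HashSet<>(Arrays.asList("banana", "barn")));

    // collect each entry's characters into a Set first -- duplicates within an
    // entry collapse, mirroring the Set-typed inner flatMap in the Scala code
    long aViaSets = dictionary.values().stream()
        .map(words -> words.stream()
            .flatMap(w -> w.chars().boxed())
            .collect(Collectors.toSet()))
        .flatMap(Set::stream)
        .filter(c -> c == 'a')
        .count();

    // flatten without the intermediate Set -- every occurrence is kept
    long aViaList = dictionary.values().stream()
        .flatMap(Set::stream)
        .flatMap(w -> w.chars().boxed())
        .filter(c -> c == 'a')
        .count();

    System.out.println(aViaSets + " vs " + aViaList);  // 2 vs 6
  }
}
```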