Constraints Liberate. Liberties Constrain.

Aravind Yarram yaravind


Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right-hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee

@jaceklaskowski
jaceklaskowski / spark-jobserver-docker-macos.md
Last active August 1, 2018 11:28
How to run spark-jobserver on Docker and Mac OS (using docker-machine)
@ashwanthkumar
ashwanthkumar / build.sh
Created February 4, 2016 12:35
Build commands for GoLang project on SnapCI
GITHUB_USERNAME="ashwanthkumar"
GITHUB_REPO="marathonctl"
# Install Golang as part of the build - takes about 30 secs
sudo yum install --assumeyes golang
# Setup GOPATH conventions
mkdir -p /var/snap-ci/src/github.com/${GITHUB_USERNAME}/
# Create symlinks according to go's directory structure
ln -s /var/snap-ci/repo /var/snap-ci/src/github.com/${GITHUB_USERNAME}/${GITHUB_REPO}
# Run your make commands to test and build your project
@arturmkrtchyan
arturmkrtchyan / get_job_status.sh
Last active August 7, 2023 18:55
Apache Spark Hidden REST API
curl http://spark-cluster-ip:6066/v1/submissions/status/driver-20151008145126-0000
@chinshr
chinshr / Jenkinsfile
Last active October 16, 2023 09:25
Best of Jenkinsfile, a collection of useful workflow scripts ready to be copied into your Jenkinsfile on a per use basis.
#!groovy
// Best of Jenkinsfile
// `Jenkinsfile` is a Groovy script DSL for defining CI/CD workflows for Jenkins
node {
}
@cb372
cb372 / jargon.md
Last active May 8, 2023 16:03
Category theory jargon cheat sheet

Category theory jargon cheat sheet

A primer/refresher on the category theory concepts that most commonly crop up in conversations about Scala or FP. (Because it's embarrassing when I forget this stuff!)

I'll be assuming Scalaz imports in code samples, and some of the code may be pseudo-Scala.

Functor

A functor is something that supports map.
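
As a rough illustration of "supports map", here is a minimal sketch of a functor type class with instances for `Option` and `List`. It deliberately avoids Scalaz: the `Functor` trait, the `FunctorSketch` object, and the `lengths` helper are hypothetical names made up for this example, and Scalaz's own `Functor` is richer than what is shown.

```scala
object FunctorSketch {
  // Hypothetical hand-rolled type class (not Scalaz's Functor):
  // anything that can apply a function inside a context F.
  trait Functor[F[_]] {
    def map[A, B](fa: F[A])(f: A => B): F[B]
  }

  // Instances live in the companion so they are found implicitly.
  object Functor {
    implicit val optionFunctor: Functor[Option] = new Functor[Option] {
      def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa match {
        case Some(a) => Some(f(a))
        case None    => None
      }
    }

    implicit val listFunctor: Functor[List] = new Functor[List] {
      def map[A, B](fa: List[A])(f: A => B): List[B] = fa.map(f)
    }
  }

  // Generic code written once for every F that has a Functor instance.
  def lengths[F[_]](words: F[String])(implicit ev: Functor[F]): F[Int] =
    ev.map(words)(_.length)

  def main(args: Array[String]): Unit = {
    println(lengths(List("map", "functor"))) // List(3, 7)
    println(lengths(Option("scala")))        // Some(5)
  }
}
```

The point of the abstraction is that `lengths` never mentions `Option` or `List`; it only relies on the existence of `map` for the container type.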

@squito
squito / AccumulatorListener.scala
Last active March 15, 2019 06:34
Accumulator Examples
import scala.collection.mutable.Map
import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
import org.apache.spark.scheduler.{SparkListenerStageCompleted, SparkListener}
import org.apache.spark.SparkContext._
/**
* just print out the values for all accumulators from the stage.
* you will only get updates from *named* accumulators, though
@staltz
staltz / introrx.md
Last active May 3, 2024 13:00
The introduction to Reactive Programming you've been missing
@bryanhunter
bryanhunter / ndc-oslo-2014.md
Last active April 10, 2016 11:40
NDC Oslo 2014 - FP Cheat Sheet

The Functional Programmer's Cheat Sheet for NDC Oslo 2014

This year NDC Oslo has a full three-day functional programming track with an amazing lineup. If you agree that the future of programming is FP, use this as your "auto pilot" guide on what sessions to attend.

Cheer for sessions on Twitter using the #ndcoslo and #fptrack hashtags.

The full agenda (including non-FP sessions) is here.

@Chaser324
Chaser324 / GitHub-Forking.md
Last active May 2, 2024 05:49
GitHub Standard Fork & Pull Request Workflow

Whether you're trying to give back to the open source community or collaborating on your own projects, knowing how to properly fork and generate pull requests is essential. Unfortunately, it's quite easy to make mistakes or not know what you should do when you're initially learning the process. I know that I certainly had considerable initial trouble with it, and I found a lot of the information on GitHub and around the internet to be rather piecemeal and incomplete - part of the process described here, another there, common hangups in a different place, and so on.

In an attempt to collate this information for myself and others, this short tutorial is what I've found to be fairly standard procedure for creating a fork, doing your work, issuing a pull request, and merging that pull request back into the original project.

Creating a Fork

Just head over to the GitHub page and click the "Fork" button. It's just that simple. Once you've done that, you can use your favorite git client to clone your repo or j