
Josh Rosen JoshRosen


Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for seeing where CPU time is being spent. Using Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which shows that a significant amount of time is spent in serialization (if you click in the top right-hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee
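
To make the "search for serialize" figure concrete, here is a minimal Python sketch of how sampled call stacks can be collapsed into the semicolon-separated "folded" form that flame graph tools commonly consume, and how the share of samples containing a matching frame can be computed. The frame names and sample counts are hypothetical, invented for illustration; they are not taken from the recording described above, and this is not the tooling the gist itself uses.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks into folded-stack counts.

    Each sample is a list of frame names ordered from root to leaf.
    The result maps "root;...;leaf" to the number of times it was seen.
    """
    return Counter(";".join(stack) for stack in samples)


def share_matching(folded, needle):
    """Return the fraction of samples with a frame containing `needle`."""
    total = sum(folded.values())
    if total == 0:
        return 0.0
    needle = needle.lower()
    matched = sum(
        count
        for stack, count in folded.items()
        if any(needle in frame.lower() for frame in stack.split(";"))
    )
    return matched / total


# Hypothetical samples (assumed frame names, for illustration only).
samples = [
    ["Scheduler.run", "TaskSetManager.resourceOffer", "Serializer.serialize"],
    ["Scheduler.run", "TaskSetManager.resourceOffer", "Serializer.serialize"],
    ["Scheduler.run", "TaskSetManager.resourceOffer", "Serializer.serialize"],
    ["Scheduler.run", "Scheduler.launchTasks"],
]

folded = fold_stacks(samples)
for stack, count in sorted(folded.items()):
    print(stack, count)
print(f"{share_matching(folded, 'serialize'):.1%}")  # prints 75.0%
```

A percentage computed this way counts a sample once if any frame in its stack matches, which is one plausible reading of what a flame graph search reports.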

JoshRosen / apply-patch.sh
Created June 24, 2016 23:10 — forked from kfish/apply-patch.sh
Apply a patch file that was produced with "git format-patch" using the patch command, and commit it using the message from the original commit.
#!/bin/bash
apply () {
    filename=$1
    shift
    patch_args=$*
    gotSubject=no
    msg=""
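
The script preview is cut off, but the idea in its description (reuse the original commit message from a "git format-patch" file) can be sketched in Python. This is not the gist's bash implementation: it assumes the patch is a single inline (non-attachment) mail-formatted file, and the function name `commit_message` and the sample patch text are hypothetical.

```python
import email
import re


def commit_message(patch_text):
    """Recover the commit message from one `git format-patch` file.

    Assumes an inline patch: headers, blank line, message body,
    then a line containing only "---" before the diffstat.
    """
    msg = email.message_from_string(patch_text)
    subject = msg["Subject"] or ""
    # Drop the "[PATCH]" or "[PATCH n/m]" prefix that format-patch adds.
    subject = re.sub(r"^\[PATCH[^\]]*\]\s*", "", subject)
    # Unfold a subject that was wrapped across several header lines.
    subject = " ".join(subject.split())
    # Everything before the first "---" line is the message body.
    payload = "\n" + msg.get_payload()
    body = payload.split("\n---\n", 1)[0].strip()
    return subject if not body else f"{subject}\n\n{body}"


# Hypothetical patch contents, for illustration only.
sample = """\
From 1234567890abcdef1234567890abcdef12345678 Mon Sep 17 00:00:00 2001
From: A Developer <dev@example.com>
Date: Fri, 24 Jun 2016 23:10:00 +0000
Subject: [PATCH 1/2] Fix off-by-one in parser

Longer explanation of the change.
---
 parser.py | 2 +-
"""

print(commit_message(sample))
```

The returned string could then be passed to a commit command after the diff has been applied with patch.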

Flask-SQLAlchemy Caching

The following gist is an extract of the article Flask-SQLAlchemy Caching. It provides simple automated query caching and event-driven invalidation of cached relations, among other features.

Usage

retrieve one object

# pulling one User object

user = User.query.get(1)

JoshRosen / Jinja module loader.md
Created November 29, 2015 22:37 — forked from voscausa/Jinja module loader.md
Jinja2 compiled templates module loader for App Engine Python 2.7.

Jinja compiled templates module loader

This code is part of a Jinja CMS for Google App Engine Python 2.7 and NDB datastore

A Jinja environment is created for every CMS site: site_key_id = 'example

The modules are created using compiler.py. The resulting code objects are stored in the datastore using Kind Runtimes and a BlobProperty.

The modules can also be saved / downloaded as .pyc in a zip archive: -compiled-templates.zip

package org.apache.spark.sql.catalyst.expressions.codegen

import org.codehaus.janino.SimpleCompiler

object CodeGenBenchmark {
  def quasiquotes(): Unit = {
    import scala.reflect.runtime.{universe => ru}
    import scala.reflect.runtime.universe._
JoshRosen / README.md
Last active December 22, 2015 19:29 — forked from mbostock/.block

Design Description

This visualization plots the dates of all executions performed in Texas since 1982. The cells representing individual days are colored according to the executed prisoner's race.

This visualization is inspired by GitHub's contributions calendar and its code was adapted from Mike Bostock's D3 Calendar example. I performed minor layout post-processing in Adobe Illustrator.

Each year is represented by a grid of 365 squares, one per day. Each grid column represents a week; from top to bottom, the rows correspond to Sunday through Saturday. The dark cell borders divide the year into twelve months. To aid comparisons over time, the years are presented in groups of 10; reading vertically, we can examine year-by-year changes, and we can jump forward a decade at a time by reading horizontally.
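
The grid layout described above (one column per week, rows running Sunday through Saturday) can be sketched as a small Python function. The visualization itself is built with D3; this is only an illustration of the layout arithmetic, and the function name `grid_cell` is hypothetical.

```python
from datetime import date


def grid_cell(day):
    """Return (column, row) for `day` within its year's grid.

    Columns count weeks starting from the week containing January 1.
    Rows run from 0 (Sunday) to 6 (Saturday).
    """
    jan1 = date(day.year, 1, 1)
    # date.weekday() is Monday=0..Sunday=6; shift so Sunday=0..Saturday=6.
    row = (day.weekday() + 1) % 7
    jan1_row = (jan1.weekday() + 1) % 7
    day_of_year = (day - jan1).days  # 0 for January 1
    column = (day_of_year + jan1_row) // 7
    return column, row


print(grid_cell(date(1982, 1, 1)))    # (0, 5): a Friday in the first column
print(grid_cell(date(1982, 1, 3)))    # (1, 0): the first Sunday starts column 1
print(grid_cell(date(1982, 12, 31)))  # (52, 5)
```

Because January 1 rarely falls on a Sunday, the first and last columns of a year are usually partial weeks, so a year typically spans 53 columns.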

This will be a copy/paste doc for installing Redis, Elasticsearch, and Logstash on Ubuntu 12.04.

Pre-Requisites

apt-get update
apt-get upgrade
apt-get install tcl8.5 tcl8.5-dev build-essential rubygems git \
htop python-dev openjdk-7-jre-headless libcurl4-openssl-dev \
bison ctags flex gperf libevent-dev libpcre3-dev libssl-dev libreadline6-dev \
libtokyocabinet-dev libncursesw5-dev libxml2-dev libxslt1-dev libsqlite3-dev \