Constraints Liberate. Liberties Constrain.

Aravind Yarram yaravind

Hypermedia API design session

Proposed/ran by Andreas Schmidt, Nokia

Based off his design around the Nokia Places API

Notes

  • Picked JSON, no support for XML
  • Added ?accept=application/json to the URL in the browser for a raw response
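The query-string override noted above can be sketched roughly as follows. This is a minimal illustration, not the Nokia Places API implementation: the function name `effective_accept`, the default media type, and the example URL are all assumptions made for the sketch.

```python
from typing import Mapping
from urllib.parse import parse_qs, urlsplit

# Assumption: JSON is the only supported representation, as the notes suggest.
DEFAULT_MEDIA_TYPE = "application/json"


def effective_accept(url: str, headers: Mapping[str, str]) -> str:
    """Pick the media type to serve.

    An ?accept=... query parameter wins over the Accept header, so a
    browser (which sends its own Accept header) can still ask for the
    raw response.
    """
    query = parse_qs(urlsplit(url).query)
    override = query.get("accept")
    if override:
        return override[0]
    return headers.get("Accept", DEFAULT_MEDIA_TYPE)


# Hypothetical request: the browser sends text/html, the URL asks for JSON.
chosen = effective_accept(
    "https://example.test/places/1?accept=application/json",
    {"Accept": "text/html"},
)
```

The override is checked first because a browser's address bar offers no way to change request headers; the header remains the normal path for API clients.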
@bryanhunter
bryanhunter / ndc-oslo-2014.md
Last active April 10, 2016 11:40
NDC Oslo 2014 - FP Cheat Sheet

# The Functional Programmers Cheat Sheet for NDC Oslo 2014

This year NDC Oslo has a full three-day functional programming track with an amazing lineup. If you agree that the future of programming is FP, use this as your "auto pilot" guide on what sessions to attend.

Cheer for sessions on Twitter using the #ndcoslo and #fptrack hashtags.

The full agenda (including non-FP sessions) is here.

@ashwanthkumar
ashwanthkumar / build.sh
Created February 4, 2016 12:35
Build commands for GoLang project on SnapCI
GITHUB_USERNAME="ashwanthkumar"
GITHUB_REPO="marathonctl"
# Install Golang as part of the build - takes about 30 secs
sudo yum install --assumeyes golang
# Setup GOPATH conventions
mkdir -p /var/snap-ci/src/github.com/${GITHUB_USERNAME}/
# Create symlinks according to go's directory structure
ln -s /var/snap-ci/repo /var/snap-ci/src/github.com/${GITHUB_USERNAME}/${GITHUB_REPO}
# Run your make commands to test and build your project
@kgadek
kgadek / FunSetSuite.scala
Created October 5, 2012 17:11
Scala: bughunting in assignment from Coursera's course
package funsets
import org.scalatest.FunSuite
import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner
/**
* This class is a test suite for the methods in object FunSets. To run
* the test suite, you can either:
@jaceklaskowski
jaceklaskowski / spark-jobserver-docker-macos.md
Last active August 1, 2018 11:28
How to run spark-jobserver on Docker and Mac OS (using docker-machine)
@squito
squito / AccumulatorListener.scala
Last active March 15, 2019 06:34
Accumulator Examples
import scala.collection.mutable.Map
import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
import org.apache.spark.scheduler.{SparkListenerStageCompleted, SparkListener}
import org.apache.spark.SparkContext._
/**
* just print out the values for all accumulators from the stage.
* you will only get updates from *named* accumulators, though
package com.databricks.spark.jira
import scala.io.Source
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.sources.{TableScan, BaseRelation, RelationProvider}
@jabley
jabley / convert2csv.sh
Created September 13, 2012 11:05
Scripts to help with converting an Oracle .dmp to CSV
#!/bin/sh
# script to automate the load and export to CSV of an oracle dump
# This script assumes:
# * you have the vagrant published key available locally in your .ssh directory
# * You have the Oracle VirtualBox image running locally
# ** ssh port-forwarding is configured for host port 2022 -> guest port 22.
set -e
@ParthaSSatpathy
ParthaSSatpathy / Introduction to NLP with Python.ipynb
Last active April 11, 2022 23:36
Introduction to Natural Language Processing Using Python

Generating Flame Graphs for Apache Spark

Flame graphs are a nifty debugging tool for determining where CPU time is being spent. Using the Java Flight Recorder, you can generate them for Java processes without adding significant runtime overhead.
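To make the idea concrete, here is a minimal sketch of the aggregation behind a flame graph: sampled call stacks are collapsed into "folded" lines (semicolon-joined frames mapped to a sample count), and the share of samples that mention a given frame is computed from those counts. The function names and the sample data are invented for illustration; this is not the tooling the authors used, and real profilers emit far more samples.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled stacks (root-first lists of frame names)
    into a mapping of 'a;b;c' -> number of samples."""
    return Counter(";".join(stack) for stack in samples)


def share_matching(folded, needle):
    """Fraction of samples whose stack has a frame containing `needle`."""
    total = sum(folded.values())
    if total == 0:
        return 0.0
    hits = sum(
        count
        for stack, count in folded.items()
        if any(needle in frame for frame in stack.split(";"))
    )
    return hits / total


# Hypothetical samples, made up for this sketch.
samples = [
    ["run", "schedule", "serializeTask"],
    ["run", "schedule", "serializeTask"],
    ["run", "schedule", "serializeTask"],
    ["run", "launch"],
]
folded = fold_stacks(samples)
serialize_share = share_matching(folded, "serialize")
```

Each folded entry becomes one tower of boxes in the graph, with width proportional to its count, which is why wide frames point to where the sampled CPU time went.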

When are flame graphs useful?

Shivaram Venkataraman and I have found these flame recordings to be useful for diagnosing coarse-grained performance problems. We started using them at the suggestion of Josh Rosen, who quickly made one for the Spark scheduler when we were talking to him about why the scheduler caps out at a throughput of a few thousand tasks per second. Josh generated a graph similar to the one below, which illustrates that a significant amount of time is spent in serialization (if you click in the top right-hand corner and search for "serialize", you can see that 78.6% of the sampled CPU time was spent in serialization). We used this insight to spee