Josh Rosen JoshRosen

@JoshRosen
JoshRosen / jacoco-coverage.patch
Created January 10, 2013 00:01
Code coverage for Spark tests using JaCoCo
$ sbt/sbt "jacoco:cover"
$ open core/target/scala-2.9.2/jacoco/html/index.html
/*** SimpleJob.java ***/
import spark.api.java.*;
import spark.api.java.function.*;
import scala.Tuple2;
public class SimpleJob {
public static void main(String[] args) {
import spark.SparkContext
import SparkContext._
object SimpleJob extends App {
val master = "spark://ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com:7077"
val sc = new SparkContext(master, "Simple Job", "/root/spark", List("target/scala-2.9.2/simple-project_2.9.2-1.0.jar"))
val logFile = "/usr/share/dict/words"
val lines = sc.textFile(logFile).cache
println("COUNT: %d".format(lines.count))

This is a copy/paste doc for installing Redis, Elasticsearch, and Logstash on Ubuntu 12.04.

Pre-Requisites

apt-get update
apt-get upgrade
apt-get install tcl8.5 tcl8.5-dev build-essential rubygems git \
htop python-dev openjdk-7-jre-headless libcurl4-openssl-dev \
bison ctags flex gperf libevent-dev libpcre3-dev libssl-dev libreadline6-dev \
libtokyocabinet-dev libncursesw5-dev libxml2-dev libxslt1-dev libsqlite3-dev \
JoshRosen / scala.rb
Last active April 24, 2016 18:08
Homebrew formula for installing Scala 2.9.3
require 'formula'
class ScalaDocs < Formula
homepage 'http://www.scala-lang.org/'
url 'http://www.scala-lang.org/downloads/distrib/files/scala-docs-2.9.3.zip'
sha1 '633a31ca2eb87ce5b31b4f963bdfd1d4157282ad'
end
class ScalaCompletion < Formula
homepage 'http://www.scala-lang.org/'
JoshRosen / gist:6026985
Created July 18, 2013 05:57
Methods missing from the Java API in Spark 0.7.3. This list may contain a few false positives because the missing methods were found by an automated script.
Missing RDD methods
spark.api.java.JavaRDD<T> filter(spark.api.java.function.Function<T, java.lang.Object>)
spark.api.java.JavaPairRDD<T, U> zip(spark.api.java.JavaRDD<U>)
void foreachPartition(spark.api.java.function.VoidFunction<java.util.Iterator<T>>)
void foreachWith(spark.api.java.function.Function<java.lang.Object, A>, spark.api.java.function.Function2<T, A, scala.runtime.BoxedUnit>)
spark.api.java.JavaRDD<U> mapPartitions(spark.api.java.function.FlatMapFunction<java.util.Iterator<T>, U>, boolean)
java.lang.Object take(int)
spark.partial.PartialResult<spark.partial.BoundedDouble> countApprox(long, java.lang.Double)
java.lang.Object collect()
spark.api.java.JavaRDD<U> flatMapWith(spark.api.java.function.Function<java.lang.Object, A>, boolean, spark.api.java.function.Function2<T, A, java.util.List<U>>)
JoshRosen / README.md
Last active December 22, 2015 19:29 — forked from mbostock/.block

Design Description

This visualization plots the dates of all executions performed in Texas since 1982. The cells representing individual days are colored according to the executed prisoner's race.

This visualization is inspired by GitHub's contributions calendar and its code was adapted from Mike Bostock's D3 Calendar example. I performed minor layout post-processing in Adobe Illustrator.

Each year is represented by a grid of 365 squares, one per day. Each grid column represents a week; from top to bottom, the rows correspond to Sunday through Saturday. The dark cell borders divide the year into twelve months. To aid comparisons over time, the years are presented in groups of 10; reading vertically, we can examine year-by-year changes, and we can jump forward a decade at a time by reading horizontally.
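
The grid layout described above can be sketched as a mapping from a date to a (column, row) cell. This is a minimal illustration, not the visualization's actual D3 code: the function name `grid_cell` is hypothetical, and it assumes that column 0 is the week containing January 1 and that rows run Sunday (0) through Saturday (6).

```python
from datetime import date


def grid_cell(day: date) -> tuple[int, int]:
    """Return (column, row) for a day in a one-year calendar grid.

    Assumptions (hypothetical, for illustration only): rows run
    Sunday (0) through Saturday (6), and column 0 holds the week
    that contains January 1 of the same year.
    """
    # Python's weekday() is Monday=0..Sunday=6; shift so Sunday=0.
    row = (day.weekday() + 1) % 7
    jan1 = date(day.year, 1, 1)
    jan1_row = (jan1.weekday() + 1) % 7
    # Days elapsed since January 1, offset by where January 1
    # falls within its own week, then grouped into weeks.
    column = ((day - jan1).days + jan1_row) // 7
    return column, row


print(grid_cell(date(2013, 1, 1)))    # a Tuesday in the first column
print(grid_cell(date(2013, 12, 31)))  # last day of the same year
```

Under these assumptions a year spans at most 54 columns, since a year can start late in one week and end early in another.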

JoshRosen / SF_PyData_Meetup_October_2013.ipynb
Last active December 24, 2015 20:19
IPython notebook for my PySpark demo at the San Francisco PyData Meetup: http://meetup.com/San-Francisco-PyData/events/142107482.
JoshRosen / index.html
Last active December 25, 2015 17:19
Visualization Assignment 3
<!DOCTYPE html>
<html>
<head>
<title>Time Searcher in D3</title>
<meta charset="utf-8">
<script src="http://d3js.org/d3.v3.min.js" charset="utf-8"></script>
<style>
body {
JoshRosen / gist:7668400
Created November 26, 2013 23:49
decaf's imports, generated with snakefood (http://furius.ca/snakefood/): `sfood-imports ./decaf | cut -d' ' -f2 | sort -u`
bisect
collections.defaultdict
copy
cPickle
cProfile
cStringIO
ctypes
datetime
decaf.base
decaf.base.SplitLayer
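
The `cut -d' ' -f2 | sort -u` stage of the pipeline in the description above can be approximated in Python as follows. This is a sketch only: the function name `unique_second_fields` and the sample input lines are hypothetical, and the case-insensitive sort is an assumption chosen to match the ordering seen in the list (for example `copy` before `cPickle`), which in the real `sort` depends on the locale.

```python
def unique_second_fields(lines):
    """Approximate `cut -d' ' -f2 | sort -u` for an iterable of lines."""
    fields = set()
    for line in lines:
        line = line.rstrip("\n")
        parts = line.split(" ")
        # Like cut, pass through lines that contain no delimiter unchanged.
        fields.add(parts[1] if len(parts) > 1 else line)
    # Assumption: case-insensitive ordering, to mimic a locale-aware sort;
    # the original string is a tie-breaker so the result is deterministic.
    return sorted(fields, key=lambda s: (s.lower(), s))


# Hypothetical input lines; the real input comes from sfood-imports.
sample = ["a.py copy", "b.py cPickle", "c.py copy", "d.py bisect"]
print(unique_second_fields(sample))
```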