Skip to content

Instantly share code, notes, and snippets.

@pbailis
pbailis / list.md
Last active April 15, 2018 08:54
Quick and dirty (incomplete) list of interesting, mostly recent data warehousing/"big data" papers

A friend asked me for a few pointers to interesting, mostly recent papers on data warehousing and "big data" database systems, with an eye towards real-world deployments. I figured I'd share the list. It's biased and rather incomplete but maybe of interest to someone. While many are obvious choices (I've omitted several, like MapReduce), I think there are a few underappreciated gems.

###Dataflow Engines:

Dryad--general-purpose distributed parallel dataflow engine
http://research.microsoft.com/en-us/projects/dryad/eurosys07.pdf

Spark--in memory dataflow
http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf

require "formula"
class GoWeekly < Formula
url "http://go.googlecode.com/hg/", :revision => "weekly"
head "http://go.googlecode.com/hg/"
version "weekly"
homepage "http://golang.org"
skip_clean "bin"