Jesse Bridgewater drbridgewater

@drbridgewater
drbridgewater / filterjoin.scala
Created March 22, 2014 04:59
filter joins in scoobi are amazing
// Deep joins are easy with Scoobi.
// For instance, suppose a Hadoop job computes your most frequent queries, and you want to keep
// only the search-session objects (which may carry lots of event data besides searches)
// that contain at least one of those queries. You can do it like this:
val sessions_w_pop_queries = listOfPopQueries.join(sessions).filter {
  case (pop_queries, session) =>
    session.justSearchPages.flatMap { _.keyword }.toSet.intersect(pop_queries).nonEmpty
}.map { case (_, session) => session }
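To make the pattern testable without a Hadoop cluster, here is a minimal sketch using ordinary Scala collections. It assumes `listOfPopQueries` holds a single `Set[String]` that `join` pairs with every session (which is what Scoobi's `DObject.join` does); `Page`, `Session`, `justSearchPages` and `FilterJoinSketch` are hypothetical stand-ins, since the gist does not show the real session types.

```scala
// Hypothetical stand-ins for the gist's session types; the real classes are not shown.
final case class Page(keyword: Option[String])

final case class Session(id: String, pages: List[Page]) {
  // Assumption: a "search page" is any page that carries a keyword.
  def justSearchPages: List[Page] = pages.filter(_.keyword.isDefined)
}

object FilterJoinSketch {
  // A plain List plays the role of the DList; pairing the one query set with
  // every session mimics what joining a replicated value onto a DList produces.
  def sessionsWithPopQueries(popQueries: Set[String], sessions: List[Session]): List[Session] =
    sessions
      .map(session => (popQueries, session))
      .filter { case (pq, session) =>
        // Keep the session if any of its search keywords is a popular query.
        session.justSearchPages.flatMap(_.keyword).toSet.intersect(pq).nonEmpty
      }
      .map { case (_, session) => session }

  def main(args: Array[String]): Unit = {
    val sessions = List(
      Session("a", List(Page(Some("weather")), Page(None))),
      Session("b", List(Page(Some("obscure query")))),
      Session("c", List(Page(None), Page(Some("news"))))
    )
    println(sessionsWithPopQueries(Set("weather", "news"), sessions).map(_.id)) // List(a, c)
  }
}
```

In the distributed version the query set is small enough to ship to every mapper, which is why a replicated join followed by a local filter is cheap compared with a full key-based join.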
drbridgewater / Reservoir Sampling
Created February 25, 2014 20:47
Simple R implementation of Reservoir Sampling
K <- 500
N <- 10000
# Sort the normal draws so it is easy to see that stream
# sampling does not depend on the order in which observations arrive
data <- sort(rnorm(N))
res <- numeric(0)
for (i in 1:K) {
res <- c(res, data[i])
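The preview above stops after the reservoir has been filled with the first `K` values. The rest of the technique (the textbook Algorithm R, not the author's unseen R code) can be sketched in Scala as follows; `ReservoirSketch` and `reservoirSample` are illustrative names. After the first `k` items, item number `i` (0-based) overwrites a uniformly chosen slot with probability `k / (i + 1)`, so every item ends up in the sample with probability `k / n` regardless of arrival order.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object ReservoirSketch {
  // Algorithm R: one pass over the stream, O(k) memory.
  def reservoirSample[A](stream: Iterator[A], k: Int, rng: Random): Vector[A] = {
    require(k >= 0, "k must be non-negative")
    val reservoir = ArrayBuffer.empty[A]
    var i = 0 // 0-based index of the current item
    stream.foreach { x =>
      if (i < k) reservoir += x // fill phase: keep the first k items
      else {
        val j = rng.nextInt(i + 1) // uniform in [0, i]
        if (j < k) reservoir(j) = x // replace with probability k / (i + 1)
      }
      i += 1
    }
    reservoir.toVector
  }

  def main(args: Array[String]): Unit = {
    val n = 10000
    val k = 500
    // Already-sorted input, echoing the R gist: the sample should still
    // spread across the whole range rather than favouring early items.
    val sample = reservoirSample((1 to n).iterator, k, new Random(42))
    val mean = sample.map(_.toDouble).sum / sample.size
    println(s"sample size = ${sample.size}, sample mean = $mean")
  }
}
```

With uniform sampling the printed mean should land near the population mean of about 5000, even though the largest values arrive last.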
drbridgewater / R packages
Last active December 16, 2015 07:28
R packages that I use.
pkgs <- c(
"data.table",
"ggplot2",
"scales",
"googleVis",
"igraph",
"gplots",
"ROCR",
"knitr",
drbridgewater / for.scala
Created December 16, 2011 22:46
for in Scala
for (x <- 1 to 2; y <- 3 to 4; z <- 5 to 6) println((x, y, z))
(1,3,5)
(1,3,6)
(1,4,5)
(1,4,6)
(2,3,5)
(2,3,6)
(2,4,5)
(2,4,6)
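The nesting order in the output above follows from how the compiler rewrites a multi-generator `for`. The sketch below (the object name `ForDesugarSketch` is only for illustration) prints the same eight lines with hand-written nested `foreach` calls, and shows that adding `yield` turns the loop into `flatMap`/`map` calls that collect the tuples instead of printing them.

```scala
object ForDesugarSketch {
  // Same generators as the loop above, but with `yield`: the tuples are collected.
  val triples: IndexedSeq[(Int, Int, Int)] =
    for (x <- 1 to 2; y <- 3 to 4; z <- 5 to 6) yield (x, y, z)

  // Roughly what the compiler rewrites the `yield` form into:
  // flatMap for every generator except the last, which becomes map.
  val viaFlatMap: IndexedSeq[(Int, Int, Int)] =
    (1 to 2).flatMap(x => (3 to 4).flatMap(y => (5 to 6).map(z => (x, y, z))))

  def main(args: Array[String]): Unit = {
    // Without `yield`, each generator becomes a nested foreach,
    // so the rightmost generator (z) varies fastest.
    (1 to 2).foreach(x => (3 to 4).foreach(y => (5 to 6).foreach(z => println((x, y, z)))))
    println(triples == viaFlatMap) // true
  }
}
```

Passing `(x, y, z)` as an explicit tuple avoids relying on argument auto-tupling, which newer Scala compilers warn about.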