@uho
Last active August 29, 2015 13:56
# SparkR [1] version of the Spark quickstart SimpleApp example [2]
library(SparkR)
sc <- sparkR.init(master="local")
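# Minimal RDD filter built on flatMap: keep x only when pred(x) is TRUE.
# Note: this definition masks stats::filter for the rest of the session.
filter <- function(rdd, pred) {
  flatMap(rdd, function(x) {
    if (pred(x)) list(x) else list()
  })
}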
logFile <- "spark-0.9.0-incubating-bin-hadoop2/README.md"
logData <- textFile(sc, logFile)
numAs <- length(filter(logData, function(s) { grepl("a", s) }))
numBs <- length(filter(logData, function(s) { grepl("b", s) }))
cat(sprintf("Lines with a: %d, lines with b: %d\n", numAs, numBs))
# [1] SparkR: http://amplab-extras.github.io/SparkR-pkg/
# [2] Spark quickstart SimpleApp example: https://spark.incubator.apache.org/docs/latest/quick-start.html