@iahmad-khan
Forked from ceteri/log.scala
Created January 18, 2017 13:43
Intro to Apache Spark: code example for RDD animation
// load error messages from a log into memory
// then interactively search for various patterns
// base RDD
val lines = sc.textFile("log.txt")
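// transformed RDDs (lazy: nothing is read until an action runs)
val errors = lines.filter(_.startsWith("ERROR"))
// each line is tab-separated; field 1 holds the message text
val messages = errors.map(_.split("\t")).map(r => r(1))
// keep the messages in memory once the first action has computed them
messages.cache()
// actions (the first count() fills the cache, the second reuses it)
messages.filter(_.contains("mysql")).count()
messages.filter(_.contains("php")).count()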
ERROR	php: dying for unknown reasons
WARN	dave, are you angry at me?
ERROR	did mysql just barf?
WARN	xylons approaching
ERROR	mysql cluster: replace with spark cluster
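To check what the two count() actions should return for this sample log without starting Spark, the same filter/split/map steps can be sketched on a plain Scala Seq. This is only a stand-in for the RDD pipeline, not part of the original gist: the object name LogSketch and the helpers errorMessages and countContaining are hypothetical, and the sample lines assume a tab between the level and the message, which is what split("\t") and r(1) imply.

```scala
// Plain Scala collections stand-in for the RDD pipeline (no Spark needed).
// Assumption: each log line is "<LEVEL>\t<message>".
object LogSketch {
  val sampleLines: Seq[String] = Seq(
    "ERROR\tphp: dying for unknown reasons",
    "WARN\tdave, are you angry at me?",
    "ERROR\tdid mysql just barf?",
    "WARN\txylons approaching",
    "ERROR\tmysql cluster: replace with spark cluster"
  )

  // same shape as the transformed RDDs: keep ERROR lines, split on tab, take field 1
  def errorMessages(lines: Seq[String]): Seq[String] =
    lines.filter(_.startsWith("ERROR")).map(_.split("\t")).map(r => r(1))

  // same shape as the actions: count messages containing a search term
  def countContaining(messages: Seq[String], term: String): Int =
    messages.count(_.contains(term))

  def main(args: Array[String]): Unit = {
    val messages = errorMessages(sampleLines)
    println(countContaining(messages, "mysql")) // prints 2
    println(countContaining(messages, "php"))   // prints 1
  }
}
```

Unlike an RDD, a Seq is evaluated eagerly and lives in one JVM, so this sketch says nothing about laziness, caching, or partitions; it only confirms the expected counts.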
scala> messages.toDebugString
res5: String =
MappedRDD[4] at map at <console>:16 (1 partitions)
MappedRDD[3] at map at <console>:16 (1 partitions)
FilteredRDD[2] at filter at <console>:14 (1 partitions)
MappedRDD[1] at textFile at <console>:12 (1 partitions)
HadoopRDD[0] at textFile at <console>:12 (1 partitions)