Skip to content

Instantly share code, notes, and snippets.

{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }}
@tmcgrath
tmcgrath / scala spark action examples
Created December 10, 2015 21:13
Spark Console Action functions in Scala
scala> val names1 = sc.parallelize(List("abe", "abby", "apple"))
names1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> names1.reduce((t1,t2) => t1 + t2)
res0: String = abbyappleabe
scala> names1.flatMap(k => List(k.size) ).reduce((t1,t2) => t1 + t2)
res1: Int = 12
scala> val names2 = sc.parallelize(List("apple", "beatty", "beatrice")).map(a => (a, a.size))
@tmcgrath
tmcgrath / Spark Transformation Examples Part 3
Last active January 7, 2016 19:24
Scala based Spark Transformations which require Key, Value pair RDDs
scala> val babyNames = sc.textFile("baby_names.csv")
babyNames: org.apache.spark.rdd.RDD[String] = baby_names.csv MappedRDD[27] at textFile at <console>:12
scala> val rows = babyNames.map(line => line.split(","))
rows: org.apache.spark.rdd.RDD[Array[String]] = MappedRDD[28] at map at <console>:14
scala> val namesToCounties = rows.map(name => (name(1),name(2)))
namesToCounties: org.apache.spark.rdd.RDD[(String, String)] = MappedRDD[29] at map at <console>:16
scala> namesToCounties.groupByKey.collect
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
@tmcgrath
tmcgrath / Spark aggregateByKey
Created December 2, 2014 14:41
Spark aggregateByKey example
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
@Salat
case class CustomerWrapper(_id: Option[Long],
partnerId: Long,
version: Long,
content: List[Customer])
case class Customer(id: Option[Long],
state: Option[CustomerState.Value],
name: String,
website: String,