@Salat
case class CustomerWrapper(_id: Option[Long],
                           partnerId: Long,
                           version: Long,
                           content: List[Customer])

case class Customer(id: Option[Long],
                    state: Option[CustomerState.Value],
                    name: String,
                    website: String,
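For context, Salat maps case classes like these to MongoDB DBObjects through its grater mechanism. A minimal round-trip sketch, using a simplified stand-in class since the Customer preview above is truncated (SimpleCustomer and its fields are hypothetical):

import com.novus.salat._
import com.novus.salat.global._

// Hypothetical, self-contained stand-in for the truncated Customer above.
case class SimpleCustomer(id: Option[Long], name: String, website: String)

val customer = SimpleCustomer(Some(1L), "Acme", "http://acme.example")

// Serialize to a MongoDB DBObject and back with Salat's grater.
val dbo  = grater[SimpleCustomer].asDBObject(customer)
val back = grater[SimpleCustomer].asObject(dbo)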
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
@tmcgrath
tmcgrath / Spark SQL CSV repl session
Created January 6, 2016 16:48
Spark SQL with Scala using CSV input data source in spark console
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
Ivy Default Cache set to: /Users/toddmcgrath/.ivy2/cache
The jars for the packages stored in: /Users/toddmcgrath/.ivy2/jars
:: loading settings :: url = jar:file:/Users/toddmcgrath/Development/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.10;1.3.0 in central
found org.apache.commons#commons-csv;1.1 in central
found com.univocity#univocity-parsers;1.5.1 in central
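With the spark-csv package resolved, the usual next step is to load the CSV source into a DataFrame. A minimal sketch of how such a session typically continues (the file name, header, and schema options are assumptions, not part of the original gist):

// Read a CSV file into a DataFrame via the Databricks spark-csv source
// (Spark 1.4-era API; "people.csv" is a hypothetical input file).
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")       // treat the first row as column names
  .option("inferSchema", "true")  // sample rows to guess column types
  .load("people.csv")

df.printSchema()
df.registerTempTable("people")
sqlContext.sql("SELECT * FROM people LIMIT 5").show()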
@tmcgrath
tmcgrath / Spark SQL with invalid JSON input source
Last active January 6, 2016 17:07
Happy Path Spark SQL with JSON input source
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell
2016-01-06 10:54:58.540 java[25147:1203] Unable to load realm info from SCDynamicStore
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
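With malformed input, Spark 1.4's JSON reader does not fail outright: lines it cannot parse are surfaced in a _corrupt_record string column. A minimal sketch of that behavior (the file name is an assumption):

// Read a file containing malformed JSON lines (hypothetical name).
// Rows that fail to parse land in the _corrupt_record column.
val bad = sqlContext.read.json("invalid.json")
bad.printSchema()
bad.select("_corrupt_record").show()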
@tmcgrath
tmcgrath / Spark SQL with valid JSON input source
Created January 6, 2016 17:08
Spark SQL with Scala using valid JSON input source example in Spark Console
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell
2016-01-06 11:05:57.362 java[30505:1203] Unable to load realm info from SCDynamicStore
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
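For the valid-JSON case, the typical continuation is to read the file, register it as a temp table, and query it with SQL. A minimal sketch (the file and field names are assumptions):

// Load newline-delimited JSON into a DataFrame (hypothetical file).
val people = sqlContext.read.json("people.json")
people.printSchema()

// Register the DataFrame so it can be queried with Spark SQL.
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 21").show()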
@tmcgrath
tmcgrath / Spark Transformation Examples Part 3
Last active January 7, 2016 19:24
Scala based Spark Transformations which require Key, Value pair RDDs
scala> val babyNames = sc.textFile("baby_names.csv")
babyNames: org.apache.spark.rdd.RDD[String] = baby_names.csv MappedRDD[27] at textFile at <console>:12
scala> val rows = babyNames.map(line => line.split(","))
rows: org.apache.spark.rdd.RDD[Array[String]] = MappedRDD[28] at map at <console>:14
scala> val namesToCounties = rows.map(name => (name(1),name(2)))
namesToCounties: org.apache.spark.rdd.RDD[(String, String)] = MappedRDD[29] at map at <console>:16
scala> namesToCounties.groupByKey.collect
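The preview cuts off at the groupByKey call. A hedged sketch of a common follow-up on the same pair RDD: when the goal is a count per key, reduceByKey is usually preferred over groupByKey because it combines values on each partition before shuffling:

// Count how many rows exist per name, reusing namesToCounties from above.
val perName = namesToCounties
  .map { case (name, _) => (name, 1) }
  .reduceByKey(_ + _)   // partial sums per partition, then a smaller shuffle

perName.take(5).foreach(println)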
@tmcgrath
tmcgrath / scala spark action examples
Created December 10, 2015 21:13
Spark Console Action functions in Scala
scala> val names1 = sc.parallelize(List("abe", "abby", "apple"))
names1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> names1.reduce((t1,t2) => t1 + t2)
res0: String = abbyappleabe
scala> names1.flatMap(k => List(k.size) ).reduce((t1,t2) => t1 + t2)
res1: Int = 12
scala> val names2 = sc.parallelize(List("apple", "beatty", "beatrice")).map(a => (a, a.size))
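The session is truncated at names2. A hedged sketch of actions that typically follow on a pair RDD like this, continuing from the definitions above:

// names2 is an RDD[(String, Int)] of (word, length) pairs.
// countByKey returns a driver-side Map of value counts per key.
names2.countByKey()     // Map(apple -> 1, beatty -> 1, beatrice -> 1)

// lookup returns all values stored under a given key.
names2.lookup("apple")  // Seq(5)

// collect materializes the whole RDD on the driver as an Array.
names2.collect()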
@tmcgrath
tmcgrath / 0_reuse_code.js
Created June 1, 2016 21:04
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
// If you want to run a local cluster:
// start-master.sh
// start-slave.sh <your-master-url>

// Connect spark-shell locally and load the Cassandra connector jar, OR
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10

// Connect spark-shell to a Spark cluster and load the Cassandra connector jar
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master <your-master-url> --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
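Once the shell is up with the connector package, Cassandra tables are exposed as RDDs. A minimal sketch, assuming a keyspace and table that are not in the original notes (test_keyspace and kv are hypothetical; the connection host defaults to localhost unless spark.cassandra.connection.host is set):

import com.datastax.spark.connector._

// Read a Cassandra table as an RDD of CassandraRow (hypothetical keyspace/table).
val rdd = sc.cassandraTable("test_keyspace", "kv")
println(rdd.count)

// Write a small collection back to the same table.
sc.parallelize(Seq(("key1", 1), ("key2", 2)))
  .saveToCassandra("test_keyspace", "kv", SomeColumns("key", "value"))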