
{
  "name": "local-file-source",
  "config": {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "test.txt",
    "topic": "connect-test"
  }
}
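This JSON is the body Kafka Connect accepts when a connector is created through its REST API (a POST to `/connectors`). With this config, FileStreamSource reads `test.txt` line by line and publishes each line as a record value on the `connect-test` topic. The sketch below models only that line-to-record mapping in plain Scala, with no Kafka dependency. `FileSourceModel`, `SourceRecordModel`, and `readAsRecords` are hypothetical names, and the file contents are passed in as strings rather than read from disk.

```scala
// Plain-Scala model of the FileStreamSource config above (no Kafka dependency).
// All names here are hypothetical and for illustration only.
object FileSourceModel {
  // Stand-in for a Kafka Connect source record: just the topic and the line's text.
  final case class SourceRecordModel(topic: String, value: String)

  // Mirrors the "config" object from the JSON above.
  val config: Map[String, String] = Map(
    "connector.class" -> "FileStreamSource",
    "tasks.max"       -> "1",
    "file"            -> "test.txt",
    "topic"           -> "connect-test"
  )

  // Each line of the source file becomes one record on the configured topic.
  def readAsRecords(lines: Seq[String], conf: Map[String, String]): Seq[SourceRecordModel] =
    lines.map(line => SourceRecordModel(conf("topic"), line))
}
```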
@tmcgrath
tmcgrath / Spark aggregateByKey
Created December 2, 2014 14:41
Spark aggregateByKey example
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.1.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
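Only the shell banner of the aggregateByKey example survives here. Spark's `PairRDDFunctions.aggregateByKey[U](zeroValue: U)(seqOp: (U, V) => U, combOp: (U, U) => U)` folds the values for each key within a partition using `seqOp`, starting from `zeroValue`, and then merges the per-partition results for the same key with `combOp`. The sketch below models those two phases with plain Scala collections so it runs without Spark. Each inner `Seq` stands in for one partition; `AggregateByKeyModel` is a hypothetical name, not part of Spark.

```scala
// Spark-free model of RDD.aggregateByKey(zeroValue)(seqOp, combOp).
// `partitions` stands in for an RDD's partitions of (key, value) pairs.
object AggregateByKeyModel {
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U
  ): Map[K, U] = {
    // Phase 1: inside each partition, fold every key's values starting from `zero`.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zero), v))
      }
    }
    // Phase 2: merge the per-partition accumulators for the same key with combOp.
    perPartition.foldLeft(Map.empty[K, U]) { (acc, partial) =>
      partial.foldLeft(acc) { case (merged, (k, u)) =>
        merged.updated(k, merged.get(k).map(prev => combOp(prev, u)).getOrElse(u))
      }
    }
  }
}
```

A common use is computing a per-key (sum, count) pair for averaging: with zero value `(0, 0)`, a `seqOp` of `(acc, v) => (acc._1 + v, acc._2 + 1)` and a `combOp` that adds pairwise, the partitions `Seq("a" -> 3, "b" -> 1, "a" -> 5)` and `Seq("b" -> 4, "a" -> 1)` yield `(9, 3)` for `"a"` and `(5, 2)` for `"b"`. The accumulator type `U` differs from the value type `V`, which is what distinguishes aggregateByKey from reduceByKey.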
# Based on
# https://natelandau.com/my-mac-osx-bash_profile/
# ---------------------------------------------------------------------------
#
# Description: This file holds all my BASH configurations and aliases
#
# Sections:
# 1. Environment Configuration
# 2. Make Terminal Better (remapping defaults and adding functionality)
# 3. File and Folder Management
{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }}
tmcgrath / Spark SQL with Scala using mySQL (JDBC) data source
Created January 6, 2016 17:32
Using the Spark console, connect to and query a MySQL database. This approach works with any database that has a JDBC driver.
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell --jars mysql-connector-java-5.1.38-bin.jar
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
tmcgrath / Cassandra Spark SQL
Created December 8, 2016 13:35
SparkSQL with Cassandra, using data processed by KillrWeather
// If you want to run a local cluster:
// start-master.sh
// start-slave.sh <your-master-url>
// Either start spark-shell locally and load the Cassandra connector package...
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
// ...or connect spark-shell to a Spark cluster and load the Cassandra connector package
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master <your-master-url> --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
tmcgrath / SparkSQL with json example
Created December 8, 2016 13:43
SparkSQL with json file example
// download the sample JSON file:
// http://bit.ly/2gY39Ay
// start spark-shell in the same directory where customers.json
// was downloaded
val customers = sqlContext.read.json("customers.json") // sqlContext.jsonFile is deprecated since Spark 1.4
// register a temp table
customers.registerTempTable("customers")
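Once the temp table is registered, it can be queried with `sqlContext.sql(...)`, for example `sqlContext.sql("SELECT first_name, address.city FROM customers WHERE address.state = 'LA'")`. As a Spark-free illustration of what such a query computes, the sketch below assumes customers.json holds records shaped like the three sample JSON lines earlier on this page (first and last name plus a nested address) and expresses the same filter and projection over plain case classes. `CustomersQueryModel` and `firstNameAndCityIn` are hypothetical names.

```scala
// Spark-free model of a query against the "customers" temp table.
// Assumption: customers.json contains records like the sample JSON lines on this page.
object CustomersQueryModel {
  final case class Address(street: String, city: String, state: String, zip: String)
  final case class Customer(firstName: String, lastName: String, address: Address)

  val customers: Seq[Customer] = Seq(
    Customer("James", "Butterburg", Address("6649 N Blue Gum St", "New Orleans", "LA", "70116")),
    Customer("Josephine", "Darakjy", Address("4 B Blue Ridge Blvd", "Brighton", "MI", "48116")),
    Customer("Art", "Chemel", Address("8 W Cerritos Ave #54", "Bridgeport", "NJ", "08014"))
  )

  // Equivalent of: SELECT first_name, address.city FROM customers WHERE address.state = <state>
  def firstNameAndCityIn(state: String): Seq[(String, String)] =
    customers.filter(_.address.state == state).map(c => (c.firstName, c.address.city))
}
```

Nested JSON objects such as `address` become struct columns in the inferred schema, which is why the SQL form can reach into them with dot notation.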
tmcgrath / 0_reuse_code.js
Created June 1, 2016 21:04
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
tmcgrath / scala spark action examples
Created December 10, 2015 21:13
Spark Console Action functions in Scala
scala> val names1 = sc.parallelize(List("abe", "abby", "apple"))
names1: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> names1.reduce((t1,t2) => t1 + t2)
res0: String = abbyappleabe
scala> names1.flatMap(k => List(k.size) ).reduce((t1,t2) => t1 + t2)
res1: Int = 12
scala> val names2 = sc.parallelize(List("apple", "beatty", "beatrice")).map(a => (a, a.size))
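The `reduce` output above is `abbyappleabe` rather than `abeabbyapple` because Spark reduces each partition locally and then merges the partition results in whatever order they arrive. The Spark documentation asks for a commutative and associative function for this reason, and string concatenation is not commutative. The sketch below models that behavior with plain collections; `ReduceModel` and its `mergeOrder` parameter are hypothetical, with `mergeOrder` standing in for the arrival order that Spark does not guarantee. It assumes one name per partition, which is an illustration rather than how `sc.parallelize` necessarily split the list.

```scala
// Spark-free model of RDD.reduce: reduce inside each partition, then merge the
// partial results in an arbitrary order (here chosen explicitly via mergeOrder).
object ReduceModel {
  def reduce[T](partitions: Seq[Seq[T]], mergeOrder: Seq[Int])(f: (T, T) => T): T =
    mergeOrder
      .map(i => partitions(i))   // visit partitions in the given arrival order
      .filter(_.nonEmpty)        // empty partitions contribute nothing
      .map(_.reduce(f))          // per-partition reduce
      .reduce(f)                 // merge partial results; throws if every partition is empty
}
```

With partitions `Seq(Seq("abe"), Seq("abby"), Seq("apple"))`, merge order `Seq(1, 2, 0)` reproduces `abbyappleabe`, while `Seq(0, 1, 2)` gives `abeabbyapple`. Summing the name lengths, as the `flatMap(k => List(k.size))` example does, returns 12 for every merge order because integer addition is commutative.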