Todd McGrath tmcgrath

View gist:9863f6457bdc7d6066dc2bc55eb84e60
# Based on
# https://natelandau.com/my-mac-osx-bash_profile/
# ---------------------------------------------------------------------------
#
# Description: This file holds all my BASH configurations and aliases
#
# Sections:
# 1. Environment Configuration
# 2. Make Terminal Better (remapping defaults and adding functionality)
# 3. File and Folder Management
@tmcgrath
tmcgrath / SparkSQL with json example
Created Dec 8, 2016
SparkSQL with json file example
View SparkSQL with json example
// download the sample JSON from http://bit.ly/2gY39Ay
// start spark-shell in the directory where customers.json was downloaded
// (sqlContext.read.json replaces jsonFile, which is deprecated as of Spark 1.4)
val customers = sqlContext.read.json("customers.json")
// register a temp table
customers.registerTempTable("customers")
tmcgrath / Cassandra Spark SQL
Created Dec 8, 2016
Spark SQL with Cassandra, using data processed by KillrWeather
View Cassandra Spark SQL
// If you want to run a local cluster:
// start-master.sh
// start-slave.sh <your-master-url>
// start spark-shell locally and load the Cassandra connector package, OR
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
// start spark-shell against the Spark cluster and load the Cassandra connector package
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master <your-master-url> --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
View Cassandra Join
// If you want to run a local cluster:
// start-master.sh
// start-slave.sh <your-master-url>
// start spark-shell locally and load the Cassandra connector package, OR
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
// start spark-shell against the Spark cluster and load the Cassandra connector package
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master <your-master-url> --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
tmcgrath / 0_reuse_code.js
Created Jun 1, 2016
Here are some things you can do with Gists in GistBox.
View 0_reuse_code.js
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
tmcgrath / Spark SQL with Scala using mySQL (JDBC) data source
Created Jan 6, 2016
Using the Spark console, connect to and query a MySQL database. The same approach applies to any database that has a JDBC driver.
View Spark SQL with Scala using mySQL (JDBC) data source
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell --jars mysql-connector-java-5.1.38-bin.jar
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_66)
Type in expressions to have them evaluated.
tmcgrath / Spark SQL with valid JSON input source
Created Jan 6, 2016
Spark SQL with Scala using valid JSON input source example in Spark Console
View Spark SQL with valid JSON input source
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell
2016-01-06 11:05:57.362 java[30505:1203] Unable to load realm info from SCDynamicStore
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
tmcgrath / Spark SQL with invalid JSON input source
Last active Jan 6, 2016
Happy Path Spark SQL with JSON input source
View Spark SQL with invalid JSON input source
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell
2016-01-06 10:54:58.540 java[25147:1203] Unable to load realm info from SCDynamicStore
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_65)
tmcgrath / Spark SQL CSV repl session
Created Jan 6, 2016
Spark SQL with Scala using CSV input data source in spark console
View Spark SQL CSV repl session
todd-mcgraths-macbook-pro:spark-1.4.1-bin-hadoop2.4 toddmcgrath$ bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
Ivy Default Cache set to: /Users/toddmcgrath/.ivy2/cache
The jars for the packages stored in: /Users/toddmcgrath/.ivy2/jars
:: loading settings :: url = jar:file:/Users/toddmcgrath/Development/spark-1.4.1-bin-hadoop2.4/lib/spark-assembly-1.4.1-hadoop2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-csv_2.10;1.3.0 in central
found org.apache.commons#commons-csv;1.1 in central
found com.univocity#univocity-parsers;1.5.1 in central
View customers example json
{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }}
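The sample above holds one complete JSON object per line, which is the layout the Spark 1.x JSON reader used in these gists expects by default. As a rough illustration outside Spark, the following Python sketch parses the same three records line by line and reads the nested address fields. The names `SAMPLE` and `load_customers` are hypothetical and exist only for this example; it uses the standard library and does not reproduce Spark's own parser.

```python
import json

# The three records from the customers example, one JSON object per line.
SAMPLE = """\
{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans","state": "LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton","state": "MI", "zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport","state": "NJ", "zip": "08014" }}
"""


def load_customers(text):
    """Parse one JSON object per non-empty line (hypothetical helper)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


customers = load_customers(SAMPLE)

# Nested fields are reached through the "address" object of each record.
for customer in customers:
    address = customer["address"]
    print(customer["first_name"], customer["last_name"], address["city"], address["state"])
```

Because each line is parsed on its own, a pretty-printed record that spans several lines would fail in this sketch, which is one reason to keep each record on a single line.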