Skip to content

Instantly share code, notes, and snippets.

@Sathiyarajan
Forked from tmcgrath/Cassandra Spark SQL
Created February 2, 2018 10:53
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save Sathiyarajan/8aa2222cc501e24fc4c77d7ca9fd87dc to your computer and use it in GitHub Desktop.
Save Sathiyarajan/8aa2222cc501e24fc4c77d7ca9fd87dc to your computer and use it in GitHub Desktop.
SparkSQL with Cassandra from Killrweather processed data
//If you want to run a local cluster
//start-master.sh
//start-slave.sh <your-master-url>
// connect to spark-shell and load cassandra connector jar OR
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
// connect to spark-shell to spark cluster and load cassandra connector jar
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master <your-master-url> --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
// tell Spark catalogue about cassandara
// More info: internally there is a Catalogue in the spark session/spark sql similiar to
// hive metastore with entries for databases and tables
sqlContext.sql(
"""CREATE TEMPORARY TABLE precipitation
|USING org.apache.spark.sql.cassandra
|OPTIONS (
| keyspace "isd_weather_data",
| table "daily_aggregate_precip",
| cluster "<your-cluster>",
| pushdown "true"
|)""".stripMargin)
// DSE we automatically generate this inside of internal CassandraHiveMetastore
// so when you run with dse the table is automatically located
// prepare dataframe from sql
val df = sqlContext.sql("select * from precipitation")
// run commands and note tab completion
df.show
df.orderBy("day")
df. <hit tab>
// etc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment