SparkSQL with Cassandra from Killrweather processed data
// If you want to run a local cluster:
// start-master.sh
// start-slave.sh <your-master-url>

// Connect to spark-shell and load the Cassandra connector jar, OR
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --packages datastax:spark-cassandra-connector:1.6.0-s_2.10

// connect spark-shell to the Spark cluster and load the Cassandra connector jar
~/Development/spark-1.6.3-bin-hadoop2.6/bin/spark-shell --master <your-master-url> --packages datastax:spark-cassandra-connector:1.6.0-s_2.10
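Once the shell is up, a quick sanity check that the connector loaded is to read the table through the connector's RDD API (a sketch; the keyspace and table are the Killrweather ones used below):

```scala
// The connector's implicits add sc.cassandraTable to the SparkContext
import com.datastax.spark.connector._

// Read a few rows straight from Cassandra; assumes the Killrweather
// isd_weather_data keyspace has already been populated
val precipRdd = sc.cassandraTable("isd_weather_data", "daily_aggregate_precip")
precipRdd.take(5).foreach(println)
```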
// Tell the Spark catalog about Cassandra.
// More info: internally there is a catalog in the Spark session / Spark SQL, similar to
// the Hive metastore, with entries for databases and tables.
sqlContext.sql(
  """CREATE TEMPORARY TABLE precipitation
    |USING org.apache.spark.sql.cassandra
    |OPTIONS (
    |  keyspace "isd_weather_data",
    |  table "daily_aggregate_precip",
    |  cluster "<your-cluster>",
    |  pushdown "true"
    |)""".stripMargin)
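The same mapping can also be created without SQL DDL, via the connector's DataFrame reader (a sketch using the same keyspace/table options as the temporary table above):

```scala
// Equivalent DataFrame-reader form of the CREATE TEMPORARY TABLE above;
// pushdown is enabled by default for this source
val precip = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map(
    "keyspace" -> "isd_weather_data",
    "table"    -> "daily_aggregate_precip"
  ))
  .load()
```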
// In DSE this entry is generated automatically inside the internal CassandraHiveMetastore,
// so when you run with DSE the table is located automatically.
// Prepare a DataFrame from SQL
val df = sqlContext.sql("select * from precipitation")

// Run commands and note tab completion
df.show
df.orderBy("day").show
df.<hit tab>
// etc.
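A few illustrative queries on the resulting DataFrame (a sketch: column names such as wsid, year and precipitation are assumed from the Killrweather daily_aggregate_precip schema; adjust to your actual columns):

```scala
import org.apache.spark.sql.functions._

// Wettest days first (assumed precipitation column)
df.orderBy(desc("precipitation")).show(10)

// Total precipitation per weather station and year
df.groupBy("wsid", "year")
  .agg(sum("precipitation").as("total_precip"))
  .orderBy(desc("total_precip"))
  .show()

// With pushdown "true", filters on partition-key columns
// are pushed down to Cassandra rather than applied in Spark
df.filter(df("year") === 2014).show()
```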