Created
October 17, 2014 17:05
-
-
Save nickmbailey/aa15b923cb3b2f03b3d9 to your computer and use it in GitHub Desktop.
Spark Demo at Austin Cassandra Users meetup
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Basics | |
scala> val daily = sc.cassandraTable("weathercql", "daily") | |
scala> daily.first() | |
scala> daily.cache() | |
scala> daily.first() | |
scala> daily.first() | |
Explore the data (what possible metrics do I have?) | |
scala> daily.map(row => row.getString("metric")).distinct().collect() | |
Get some info about temperature | |
scala> val daily_temps = daily.filter(row => row.getString("metric") == "temperature") | |
scala> daily_temps.first() | |
We can be more efficient | |
scala> val daily_temps = sc.cassandraTable("weathercql", "daily").where("metric = ?", "temperature") | |
scala> daily_temps.first() | |
scala> daily_temps.cache() | |
Max temp across any station | |
scala> val max_daily_temps = daily_temps.map(row => row.getInt("max")) | |
scala> import scala.math._ | |
scala> max_daily_temps.reduce((temp1, temp2) => max(temp1, temp2)) | |
What about max temp per station? | |
scala> val max_daily_temps_by_station = daily_temps.map(row => (row.getString("stationid"), row.getInt("max"))) | |
scala> max_daily_temps_by_station.first() | |
scala> val max_temp_by_station = max_daily_temps_by_station.reduceByKey((temp1, temp2) => max(temp1, temp2)) | |
scala> max_temp_by_station.collectAsMap() | |
Output back to Cassandra | |
cqlsh:weathercql> create table max_temps (stationid text PRIMARY KEY , temp int) ; | |
scala> max_temp_by_station.saveToCassandra("weathercql", "max_temps", Seq("stationid", "temp")); | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment