Skip to content

Instantly share code, notes, and snippets.

@nickmbailey
Created October 17, 2014 17:05
Show Gist options
  • Save nickmbailey/aa15b923cb3b2f03b3d9 to your computer and use it in GitHub Desktop.
Save nickmbailey/aa15b923cb3b2f03b3d9 to your computer and use it in GitHub Desktop.
Spark Demo at Austin Cassandra Users meetup
Basics
scala> val daily = sc.cassandraTable("weathercql", "daily")
scala> daily.first()
scala> daily.cache()
scala> daily.first()
scala> daily.first()
Explore the data (what possible metrics do I have?)
scala> daily.map(row => row.getString("metric")).distinct().collect()
Get some info about temperature
scala> val daily_temps = daily.filter(row => row.getString("metric") == "temperature")
scala> daily_temps.first()
We can be more efficient
scala> val daily_temps = sc.cassandraTable("weathercql", "daily").where("metric = ?", "temperature")
scala> daily_temps.first()
scala> daily_temps.cache()
Max temp across any station
scala> val max_daily_temps = daily_temps.map(row => row.getInt("max"))
scala> import scala.math._
scala> max_daily_temps.reduce((temp1, temp2) => max(temp1, temp2))
What about max temp per station?
scala> val max_daily_temps_by_station = daily_temps.map(row => (row.getString("stationid"), row.getInt("max")))
scala> max_daily_temps_by_station.first()
scala> val max_temp_by_station = max_daily_temps_by_station.reduceByKey((temp1, temp2) => max(temp1, temp2))
scala> max_temp_by_station.collectAsMap()
Output back to Cassandra
cqlsh:weathercql> create table max_temps (stationid text PRIMARY KEY , temp int) ;
scala> max_temp_by_station.saveToCassandra("weathercql", "max_temps", Seq("stationid", "temp"));
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment