@breinero-zz
Last active April 21, 2016 16:19
import com.mongodb.spark._
import org.bson.Document
import com.mongodb.spark.config._
import org.apache.spark.sql.SQLContext
import com.mongodb.spark.sql._
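// Load the first DataFrame, "EVAs" (sqlContext is predefined in the Spark 1.x spark-shell).
// No collection is passed here, so the connector reads the default input collection,
// assumed to be configured via spark.mongodb.input.uri to point at the EVAs data.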
val evadf = sqlContext.read.mongo()
evadf.printSchema()
evadf.registerTempTable("evas")
// Load the second DataFrame, "astronautHours".
// The "collection" option overrides the configured input collection for this read only;
// the SparkContext configuration itself is not changed.
val astronautDF = sqlContext.read.option("collection", "astronautHours").mongo()
astronautDF.printSchema()
astronautDF.registerTempTable("astronautTotals")
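// Join each astronaut's total minutes with the EVAs they took part in.
// Note: without % or _ wildcards, `_id LIKE evas.Crew` only matches when _id equals the
// whole Crew string. If Crew can list several astronauts, a substring match such as
// `evas.Crew LIKE CONCAT('%', astronautTotals._id, '%')` may be what is intended.
sqlContext.sql("""
  SELECT astronautTotals._id, astronautTotals.minutes, evas.Vehicle, evas.Duration
  FROM astronautTotals
  JOIN evas ON astronautTotals._id LIKE evas.Crew
""").show()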
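The join condition `astronautTotals._id LIKE evas.Crew` treats Crew as the pattern. In Spark SQL a LIKE pattern with no `%` or `_` must match the entire string, so the condition behaves as an equality test. The plain-Scala sketch below, which needs neither Spark nor MongoDB, contrasts that with a substring match. The case classes, field names, and sample rows are hypothetical stand-ins, not the real collections; whether Crew holds one name or several in the actual data is not stated in the gist.

```scala
// Hypothetical, simplified stand-ins for one row of each DataFrame.
final case class Eva(vehicle: String, duration: String, crew: String)
final case class AstronautTotal(id: String, minutes: Int)

object JoinSemantics {
  type Row = (String, Int, String, String)

  // Mirrors `astronautTotals._id LIKE evas.Crew` when Crew contains no % or _ wildcards:
  // the pattern must match the whole _id, so this reduces to plain equality.
  def exactJoin(ts: Seq[AstronautTotal], evas: Seq[Eva]): Seq[Row] =
    for {
      t <- ts
      e <- evas
      if t.id == e.crew
    } yield (t.id, t.minutes, e.vehicle, e.duration)

  // Mirrors `evas.Crew LIKE CONCAT('%', astronautTotals._id, '%')`:
  // matches whenever the astronaut's name appears anywhere in the Crew string.
  def substringJoin(ts: Seq[AstronautTotal], evas: Seq[Eva]): Seq[Row] =
    for {
      t <- ts
      e <- evas
      if e.crew.contains(t.id)
    } yield (t.id, t.minutes, e.vehicle, e.duration)

  def main(args: Array[String]): Unit = {
    // Made-up sample rows; not taken from the real EVA dataset.
    val evas = Seq(
      Eva("Vehicle 1", "0:36", "Astronaut A"),
      Eva("Vehicle 2", "2:10", "Astronaut A Astronaut B")
    )
    val totals = Seq(AstronautTotal("Astronaut A", 166), AstronautTotal("Astronaut B", 130))

    println(exactJoin(totals, evas).size)     // 1: only the single-crew EVA matches
    println(substringJoin(totals, evas).size) // 3: A matches both EVAs, B matches the second
  }
}
```

Substring matching has its own risk: a name that is a prefix of another (for example "Astronaut A" inside "Astronaut AB") would also match, so splitting Crew into individual names before joining may be the safer design.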