Skip to content

Instantly share code, notes, and snippets.

@samklr
Created November 15, 2018 16:01
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save samklr/af8b56bdf58c05f2d058765583f2e5e2 to your computer and use it in GitHub Desktop.
Save samklr/af8b56bdf58c05f2d058765583f2e5e2 to your computer and use it in GitHub Desktop.
Integrate Spark with Hive
import org.apache.spark.sql.{SaveMode, SparkSession}
case class HelloWorld(message: String)
def main(args: Array[String]): Unit = {
// Creation of SparkSession
val sparkSession = SparkSession.builder()
.appName("example-spark-scala-read-and-write-from-hive")
.config("hive.metastore.warehouse.dir", params.hiveHost + "user/hive/warehouse")
.enableHiveSupport()
.getOrCreate()
// ====== Creating a dataframe with 1 partition
import sparkSession.implicits._
val df = Seq(HelloWorld("helloworld")).toDF().coalesce(1)
// ======= Writing files
// Writing Dataframe as a Hive table
import sparkSession.sql
sql("DROP TABLE IF EXISTS helloworld")
sql("CREATE TABLE helloworld (message STRING)")
df.write.mode(SaveMode.Overwrite).saveAsTable("helloworld")
logger.info("Writing hive table : OK")
// ======= Reading files
// Reading hive table into a Spark Dataframe
val dfHive = sql("SELECT * from helloworld")
logger.info("Reading hive table : OK")
logger.info(dfHive.show())
case None =>
println("Bblabla")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment