@vvgsrk
Last active March 14, 2019 17:29
Read a table from "Snowflake on AWS" using Spark on a Windows PC
package com.vvgsrk.data
import org.apache.spark.sql.SparkSession
import net.snowflake.spark.snowflake.Utils.SNOWFLAKE_SOURCE_NAME
/**
 * Tests a connection to Snowflake on AWS using Spark,
 * run from Eclipse on a Windows PC.
 *
 * Uses Hadoop 2.7 and Spark 2.3.2.
 *
 * @author vvgsrk
 */
object TestSnowflakeConnection extends App {

  // The Frankfurt region (eu-central-1) accepts only Signature Version 4 authentication
  System.setProperty("com.amazonaws.services.s3.enableV4", "true")

  // SparkSession and configuration
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("TestSnowflakeConnection")
    .config("spark.ui.enabled", "false")
    .getOrCreate()

  // Set AWS S3 configuration
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_KEY")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3-eu-central-1.amazonaws.com")
  // Keys set directly on hadoopConfiguration take no "spark.hadoop." prefix;
  // that prefix applies only to properties passed through the Spark configuration.
  spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

  // Snowflake settings
  val sfOptions = Map(
    "sfURL" -> "vvgsrk.eu-central-1.snowflakecomputing.com",
    "sfAccount" -> "vvgsrk",
    "sfUser" -> "vvgsrk",
    "sfPassword" -> "YOUR_PWD",
    "sfDatabase" -> "HR_DEV",
    "sfSchema" -> "HR",
    "sfWarehouse" -> "HR_WH_DEV",
    "sfRole" -> "DEVELOPERS",
    "region_id" -> "eu-central-1",
    "tempdir" -> "s3a://vvgsrk-snowflake-data-temp" // Make sure the tempdir option uses s3a://
  )

  // Read the EMP table as a DataFrame
  val df = spark.read
    .format(SNOWFLAKE_SOURCE_NAME)
    .options(sfOptions)
    .option("dbtable", "EMP")
    .load()

  // Print the schema, then the first 20 rows without truncating column values
  df.printSchema()
  df.show(false)
}
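
The listing above keeps its credentials as placeholder string literals (`YOUR_KEY`, `YOUR_SECRET_KEY`, `YOUR_PWD`). One possible way to avoid hard-coding them is to build the options map from environment variables. The sketch below is not part of the original gist: the object name `SnowflakeOptionsFromEnv`, the `load` method, and every `SNOWFLAKE_*` variable name are assumptions made for illustration. It uses only the Scala standard library, so it can be tried without Spark on the classpath.

```scala
// Hypothetical helper, not part of the original gist: collects the Snowflake
// connector options from environment variables instead of string literals.
object SnowflakeOptionsFromEnv {

  // Assumed environment variable names, mapped to the option keys used in sfOptions.
  private val envToOption: Seq[(String, String)] = Seq(
    "SNOWFLAKE_URL"       -> "sfURL",
    "SNOWFLAKE_ACCOUNT"   -> "sfAccount",
    "SNOWFLAKE_USER"      -> "sfUser",
    "SNOWFLAKE_PASSWORD"  -> "sfPassword",
    "SNOWFLAKE_DATABASE"  -> "sfDatabase",
    "SNOWFLAKE_SCHEMA"    -> "sfSchema",
    "SNOWFLAKE_WAREHOUSE" -> "sfWarehouse",
    "SNOWFLAKE_ROLE"      -> "sfRole",
    "SNOWFLAKE_REGION_ID" -> "region_id",
    "SNOWFLAKE_TEMPDIR"   -> "tempdir"
  )

  /** Returns the options map, or the names of all missing variables at once. */
  def load(env: Map[String, String] = sys.env): Either[Seq[String], Map[String, String]] = {
    val missing = envToOption.collect { case (name, _) if !env.contains(name) => name }
    if (missing.nonEmpty) Left(missing)
    else Right(envToOption.map { case (name, option) => option -> env(name) }.toMap)
  }
}
```

With this in place, `SnowflakeOptionsFromEnv.load()` yields either `Right(options)`, which can be passed to `.options(...)` in place of `sfOptions`, or `Left(missing)` listing every variable that still needs to be set. Taking the environment as a parameter keeps the helper easy to exercise with a plain `Map`.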