Prerequisites before starting spark-shell on a Glue development endpoint
# Properties file: create a properties file named glue_spark_shell.properties with the following configuration.
# Note: in the configuration below, replace the S3 access key and secret key placeholders with your own keys.
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.driver.extraClassPath /usr/share/aws/glue/etl/jars/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/hmclient/lib/*:/usr/share/java/Hive-JSON-Serde/*:/usr/share/aws/sagemaker-spark-sdk/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/glue/etl/python/PyGlue.zip:/usr/share/aws/emr/emrfs/auxlib/*:/usr/lib/hadoop/lib/native/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/glue/etl/conf
spark.executor.extraClassPath /usr/share/aws/glue/etl/jars/*:/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/hmclient/lib/*:/usr/share/java/Hive-JSON-Serde/*:/usr/share/aws/sagemaker-spark-sdk/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/glue/etl/python/PyGlue.zip:/usr/share/aws/emr/emrfs/auxlib/*:/usr/lib/hadoop/lib/native/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/glue/etl/conf
spark.hadoop.fs.s3a.access.key <your_access_key>
spark.hadoop.fs.s3a.secret.key <your_secret_key>
hive.metastore.client.factory.class com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
# Move the properties file to the Glue development endpoint server.
# The file created above can then be used to start glue-spark-shell (Scala) or gluepyspark (Python) with the following command:
# $ glue-spark-shell -v --properties-file /home/glue/glue_spark_shell.properties
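
Since the file ships with `<your_access_key>` and `<your_secret_key>` placeholders, it can help to confirm they were replaced before launching the shell. The following is a minimal sketch of such a check, not part of the Glue tooling: the helper names `parse_spark_properties` and `unreplaced_placeholders` are hypothetical, and the parser assumes the whitespace-separated `key value` layout used in this file.

```python
import re


def parse_spark_properties(text):
    """Parse 'key value' lines, skipping blank lines and '#' comments.

    Assumption: keys and values are separated by whitespace, as in the
    glue_spark_shell.properties file above.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


def unreplaced_placeholders(props):
    """Return the sorted keys whose value is still an '<...>' placeholder."""
    return sorted(k for k, v in props.items() if re.fullmatch(r"<[^<>]+>", v))


# Example with inline text; in practice the text would be read from the file.
sample = """\
# sample properties
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.access.key <your_access_key>
spark.hadoop.fs.s3a.secret.key <your_secret_key>
"""
print(unreplaced_placeholders(parse_spark_properties(sample)))
```

An empty list means no placeholder values remain. Any key it reports still needs a real value before the file is passed to `--properties-file`.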