@rjurney
Created November 9, 2021 21:28
How to load a Google Sheet in PySpark
from pyspark.sql import SparkSession
# Load the package https://github.com/potix2/spark-google-spreadsheets via Maven.
# The _2.11 suffix is the Scala version the artifact was built for; it must match
# the Scala version of your Spark build.
spark = (
    SparkSession.builder
    .appName("Testing Spark Google Sheets")
    .config("spark.jars.packages", "com.github.potix2:spark-google-spreadsheets_2.11:0.6.3")
    .getOrCreate()
)
# Before this you must create an API key and a Service Account P12 key.
# See: https://github.com/juampynr/google-spreadsheet-reader
df = (
    spark.read.format("com.github.potix2.spark.google.spreadsheets")
    .option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com")
    .option("credentialPath", "/path/to/credential.p12")
    .load("<spreadsheetId>/worksheet1")
)
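
The load path above has the form <spreadsheetId>/<worksheet name>. A minimal sketch of building it from a sheet's browser URL, assuming the usual https://docs.google.com/spreadsheets/d/<spreadsheetId>/... layout; the helper name sheet_load_path is hypothetical and not part of the spark-google-spreadsheets package:

```python
import re

# Hypothetical helper (not part of spark-google-spreadsheets): builds the
# "<spreadsheetId>/<worksheet name>" string passed to .load().
# Assumes the URL contains ".../spreadsheets/d/<spreadsheetId>/...".
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def sheet_load_path(url: str, worksheet: str) -> str:
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in URL: {url!r}")
    return f"{match.group(1)}/{worksheet}"
```

The result can then be passed to .load() in place of the hard-coded "<spreadsheetId>/worksheet1" string.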