@revolutionisme
Last active January 28, 2021 08:37
PySpark settings for reading various kinds of data from different sources in a local setup
from pyspark.sql import SparkSession

# 1. Get the Hadoop version used by your Spark installation, along with the Spark version.
spark = SparkSession.builder.master("local").getOrCreate()
print(f"Hadoop version: {spark._jvm.org.apache.hadoop.util.VersionInfo.getVersion()}")
print(f"Spark Version: {spark.version}")
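# 2. Read data from a public S3 bucket without configuring AWS credentials.
#    The package can also be supplied with `--packages` when launching the PySpark job.
#    Note: spark.jars.packages is only honored when the JVM starts, so run this in a fresh Python process.
spark = (
    SparkSession.builder.master("local[*]")
    # Match the hadoop-aws version to the Hadoop version printed in step 1.
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.0")
    # The "spark.hadoop." prefix forwards the setting to the Hadoop configuration.
    .config("spark.hadoop.fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
    .getOrCreate()
)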
# 3. Read Avro data with the required package.
spark = (
    SparkSession.builder.master("local[*]")
    # Match the package version to the Spark version printed in step 1;
    # the "_2.12" suffix is the Scala version of the Spark build.
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.0.1")
    .getOrCreate()
)
# 4. If you get one of the following errors:
# - Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.
# - Can't assign requested address: Service 'sparkDriver' failed after 16 retries!
# try setting both spark.driver.host and spark.driver.bindAddress to localhost, as shown below.
spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.host", "127.0.0.1")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .getOrCreate()
)
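Steps 2 and 3 hard-code package versions that have to match the versions reported in step 1. As a minimal sketch of one way to keep them in sync, the helpers below build the Maven coordinates from version strings. The function names are hypothetical and not part of the original gist, and the default Scala version "2.12" is an assumption that should be adjusted to match your Spark build.

```python
def hadoop_aws_package(hadoop_version: str) -> str:
    """Return the Maven coordinate of hadoop-aws for the given Hadoop version."""
    return f"org.apache.hadoop:hadoop-aws:{hadoop_version}"


def spark_avro_package(spark_version: str, scala_version: str = "2.12") -> str:
    """Return the Maven coordinate of spark-avro for the given Spark and Scala versions.

    The default Scala version is an assumption; use the one your Spark build was compiled with.
    """
    return f"org.apache.spark:spark-avro_{scala_version}:{spark_version}"


def jars_packages(*coordinates: str) -> str:
    """Join coordinates into the comma-separated value that spark.jars.packages expects."""
    return ",".join(coordinates)


if __name__ == "__main__":
    # Versions taken from the examples above; substitute the ones printed in step 1.
    print(jars_packages(hadoop_aws_package("3.2.0"), spark_avro_package("3.0.1")))
```

The resulting string can be passed as the value of `.config("spark.jars.packages", ...)`, which also makes it possible to request both packages in a single session.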