Set up Apache Spark 1.5+ with Hadoop 2.6+ s3a
# For a local environment
# Install hadoop and apache-spark via Homebrew
# Apache Spark conf file
# libexec/conf/spark-defaults.conf
# Make the AWS jars available to Spark
spark.executor.extraClassPath /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar:/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar
spark.driver.extraClassPath /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar:/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar
# Add the following file
# libexec/conf/hdfs-site.xml
<?xml version="1.0"?>
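The two classpath entries in spark-defaults.conf hard-code the Homebrew Cellar path together with Hadoop 2.7.1 and AWS SDK 1.7.4, so they would stop matching once Homebrew installs a different Hadoop version. Below is a minimal sketch, assuming the jars still sit in Hadoop's `libexec/share/hadoop/tools/lib` directory. It finds them by filename pattern and builds the same two config lines. The helper name `s3a_classpath_lines` and the glob patterns are hypothetical and are not part of Spark or Hadoop.

```python
import glob
import os


def s3a_classpath_lines(tools_lib):
    """Build the spark-defaults.conf classpath lines needed for s3a.

    Hypothetical helper: assumes tools_lib is Hadoop's share/hadoop/tools/lib
    directory, as in the Homebrew layout used in this gist.
    """
    jars = []
    # AWS SDK jar first, then hadoop-aws, matching the order in the config above.
    for pattern in ("aws-java-sdk-*.jar", "hadoop-aws-*.jar"):
        matches = sorted(glob.glob(os.path.join(tools_lib, pattern)))
        if not matches:
            raise FileNotFoundError(f"no {pattern} found in {tools_lib}")
        # Take the lexicographically last match (not a true version comparison).
        jars.append(matches[-1])
    classpath = ":".join(jars)
    # Executors and the driver both need the jars on their classpath.
    return [
        f"spark.executor.extraClassPath {classpath}",
        f"spark.driver.extraClassPath {classpath}",
    ]
```

On a Homebrew install you would call it with the `tools/lib` path from the config above, e.g. `s3a_classpath_lines("/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib")`, and paste the two returned lines into spark-defaults.conf. Raising on a missing jar surfaces a bad path at setup time instead of as a class-not-found error inside Spark.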

Thanks for this. I struggled with the same thing for over a week until I found this gist. I've left an even more concise version of what you need in my own gist here:


Thanks for this. Passing the dependencies with `--packages` instead also works:

pyspark --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0
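The `--packages` flag takes comma-separated Maven coordinates (`groupId:artifactId:version`) and resolves them at launch, so no local Hadoop jars are needed on the classpath. Here is a minimal sketch that assembles that command as an argument list, for example to hand to `subprocess.run`. The helper name `pyspark_s3a_command` is hypothetical. Its default versions are simply the ones quoted in the comment above, not a recommendation.

```python
import shlex


def pyspark_s3a_command(aws_sdk_version="1.10.34", hadoop_aws_version="2.6.0"):
    """Return the pyspark command, as an argument list, that loads s3a via --packages.

    Hypothetical helper; the default versions come from the comment above.
    """
    # --packages expects a comma-separated list of groupId:artifactId:version coordinates.
    packages = ",".join([
        f"com.amazonaws:aws-java-sdk-pom:{aws_sdk_version}",
        f"org.apache.hadoop:hadoop-aws:{hadoop_aws_version}",
    ])
    return ["pyspark", "--packages", packages]


# Render the list as a shell-safe string (shlex.join needs Python 3.8+).
print(shlex.join(pyspark_s3a_command()))
# prints: pyspark --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0
```

Returning a list rather than a single string lets `subprocess.run` execute the command directly without going through a shell, while `shlex.join` still gives a copy-pasteable form.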
