Skip to content

Instantly share code, notes, and snippets.

@thekensta
Created October 14, 2015 20:27
Show Gist options
  • Star 10 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save thekensta/21068ef1b6f4af08eb09 to your computer and use it in GitHub Desktop.
Save thekensta/21068ef1b6f4af08eb09 to your computer and use it in GitHub Desktop.
Set up Apache Spark 1.5+ with Hadoop 2.6+ s3a
# For a local environment
# Install hadoop and apache-spark via homebrew
# Apache Spark conf file
# libexec/conf/spark-defaults.conf
# Make the AWS jars available to Spark
spark.executor.extraClassPath /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar:/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar
spark.driver.extraClassPath /usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/aws-java-sdk-1.7.4.jar:/usr/local/Cellar/hadoop/2.7.1/libexec/share/hadoop/tools/lib/hadoop-aws-2.7.1.jar
# Add file
# libexec/conf/hdfs-site.xml
# http://stackoverflow.com/questions/30262567/unable-to-load-aws-credentials-when-using-spark-sql-through-beeline
<?xml version="1.0"?>
<configuration>
<property>
<name>fs.s3a.access.key</name>
<value>xxx</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>xxx</value>
</property>
</configuration>
@jobonilla
Copy link

Thanks for this.
pyspark --packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.6.0
also works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment