@rileypeterson
Last active December 10, 2019 21:58
Ubuntu/Vagrant: download and set up PySpark 2.4.4
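wget http://ftp.wayne.edu/apache/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
# or another mirror from https://www.apache.org/dyn/closer.lua/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz
# (mirrors drop old releases; this one stays available at https://archive.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz)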
tar -xzf spark-2.4.4-bin-hadoop2.7.tgz
# SPARK_HOME assumes the commands above were run from /vagrant
export SPARK_HOME=/vagrant/spark-2.4.4-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
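# Get Java 8 (Spark 2.4.x needs Java 8; on Ubuntu 18.04 and later, default-jre installs Java 11)
sudo apt-get update
sudo apt-get install -y openjdk-8-jre
export JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre/"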
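# Use Python 3 on the workers and IPython for the interactive driver shell
# (IPython must be installed for Python 3, e.g. pip3 install ipython)
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_PYTHON=python3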
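# Redshift JDBC driver; jars placed in $SPARK_HOME/jars are put on Spark's classpath automatically
wget https://s3.amazonaws.com/redshift-downloads/drivers/jdbc/1.2.32.1056/RedshiftJDBC42-no-awssdk-1.2.32.1056.jar
mv RedshiftJDBC42-no-awssdk-1.2.32.1056.jar spark-2.4.4-bin-hadoop2.7/jars/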
cd spark-2.4.4-bin-hadoop2.7
pyspark
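Before starting the shell, it can help to confirm that the variables set above point at something usable. The following is a bash sketch, not part of the original gist: the function name check_spark_env is hypothetical, it assumes the exports were run in the current shell, and it detects Java 8 by the "1.8" version prefix that OpenJDK 8 prints for java -version.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for the Spark/Java/Python setup (a sketch).
# Prints one ok/bad line per check and returns non-zero if any check fails.
check_spark_env() {
  local status=0

  # SPARK_HOME should point at the extracted distribution.
  if [ -x "${SPARK_HOME:-}/bin/pyspark" ]; then
    echo "ok: SPARK_HOME=$SPARK_HOME"
  else
    echo "bad: no executable bin/pyspark under SPARK_HOME=${SPARK_HOME:-<unset>}"
    status=1
  fi

  # Spark 2.4.x needs Java 8, whose version string starts with "1.8".
  if [ -x "${JAVA_HOME:-}/bin/java" ] &&
     "$JAVA_HOME/bin/java" -version 2>&1 | head -n 1 | grep -q '"1\.8'; then
    echo "ok: Java 8 at JAVA_HOME=$JAVA_HOME"
  else
    echo "bad: no Java 8 at JAVA_HOME=${JAVA_HOME:-<unset>}"
    status=1
  fi

  # The worker and driver interpreters must both be on PATH.
  local exe
  for exe in "${PYSPARK_PYTHON:-python3}" "${PYSPARK_DRIVER_PYTHON:-python3}"; do
    if command -v "$exe" >/dev/null 2>&1; then
      echo "ok: $exe found"
    else
      echo "bad: $exe not found on PATH"
      status=1
    fi
  done

  return "$status"
}
```

Run check_spark_env right after the exports; a non-zero exit status means one of the paths or interpreters still needs fixing before pyspark will start cleanly.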
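The exports only last for the current shell, so every new vagrant ssh session has to repeat them. One way to keep them is to append the current values to a startup file once. This is a bash sketch, not part of the original gist: the helper name persist_spark_env and its marker comment are hypothetical, and it assumes ~/.bashrc is the startup file your login shell reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the current session's Spark/Java/Python settings
# into a startup file (default ~/.bashrc) so new shells inherit them.
persist_spark_env() {
  local rc="${1:-$HOME/.bashrc}"   # target file
  local marker="# >>> spark env >>>"

  # Do nothing if an earlier call already appended the block.
  if [ -f "$rc" ] && grep -qF "$marker" "$rc"; then
    return 0
  fi

  # Unescaped variables are expanded now, so the current values are written;
  # \$ keeps the PATH line literal so it is rebuilt each time the file is sourced.
  cat >> "$rc" <<EOF
$marker
export SPARK_HOME="$SPARK_HOME"
export PATH="\$SPARK_HOME/bin:\$PATH"
export JAVA_HOME="$JAVA_HOME"
export PYSPARK_DRIVER_PYTHON="$PYSPARK_DRIVER_PYTHON"
export PYSPARK_PYTHON="$PYSPARK_PYTHON"
# <<< spark env <<<
EOF
}
```

Call persist_spark_env once, after the exports have been set in the current shell; calling it again is harmless because the marker line makes it skip files that already contain the block.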