Set up a local Spark distribution, downloaded from the Apache archive, to run PySpark inside Jupyter notebooks.

Set up the environment variables.

export SPARK_VERSION=2.4.0
export SPARK_PACKAGE=spark-${SPARK_VERSION}-bin-hadoop2.7
export SPARK_HOME=$HOME/spark-${SPARK_VERSION}

export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=${HOME}/Projects/notebooks"

Download the desired Spark version.

curl -sL --retry 3 \
  "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PACKAGE}.tgz" \
  | gunzip \
  | tar xf - -C /tmp/ \
  && mv /tmp/$SPARK_PACKAGE $SPARK_HOME
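
To confirm the unpacked distribution works (and that $SPARK_HOME/bin is being picked up from PATH), ask Spark for its version:

spark-submit --version
# Should print Spark 2.4.0 along with the Scala and Java versions it was built against.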

Install PySpark and Jupyter in our virtualenv (a sketch for creating one, if needed, follows the commands below).

pip install \
  sparkmagic==0.12.6 \
  prompt-toolkit==1.0.15 \
  pyspark==${SPARK_VERSION} \
  jupyter==1.0.0
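
The pip install above assumes an activated virtualenv. If one still needs to be created, a minimal sketch using Python 3's built-in venv module (the ~/venvs/pyspark path is just an example) is:

python3 -m venv ~/venvs/pyspark        # create the environment
source ~/venvs/pyspark/bin/activate    # activate it before running pip install
pip install --upgrade pip              # make sure pip is recent enough for the pinned packages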

By setting --notebook-dir in PYSPARK_DRIVER_PYTHON_OPTS, our notebooks are stored and persisted in the desired folder.
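
With PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS exported as above, launching PySpark starts the Jupyter notebook server in that folder instead of the plain Python shell:

pyspark
# Opens Jupyter rooted at ${HOME}/Projects/notebooks; in a new notebook the
# SparkContext created by PySpark's startup script should be available as sc.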
