Last active May 25, 2022 14:55
Setup Apache Spark/ Jupyter Notebook on MacOS

Spark Setup MacOS Guide



  • Install Java 8+, then add to PATH and JAVA_HOME

1. Install Java.

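  • Download and install Java 8 through Homebrew.
# On newer Homebrew releases the `brew cask` subcommand has been replaced by `brew install --cask`.
brew cask install java8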
  • Validate Java version
brew cask info java8
java8: 1.8.0_192-b12,750e1c8617c5452694857ad95c3ee230
/usr/local/Caskroom/java8/1.8.0_192-b12,750e1c8617c5452694857ad95c3ee230 (246.2MB)
==> Name
Java Standard Edition Development Kit
==> Artifacts
JDK 8 Update 192.pkg (Pkg)
  • Add the Java environment variables to your shell configuration

    • Open the configuration file in Vim
    # Use the file that matches your shell, e.g. ~/.bashrc for bash
    vim ~/.zshrc
    • Add the following to .zshrc
    # For Apache Spark
    if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi
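
After reloading the shell, it can help to confirm that JAVA_HOME actually points at a JDK. The following is a minimal sketch, not part of the original setup; `check_java_home` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: report whether JAVA_HOME is set and contains bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
  else
    echo "JAVA_HOME does not contain bin/java: $JAVA_HOME"
    return 1
  fi
}

# Run the check once; `|| true` keeps a failed check from aborting the shell.
check_java_home || true
```

If the check fails, re-open the terminal or run `source ~/.zshrc` so the export line above takes effect.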

2. Install Apache Spark

Homebrew (brew) is a package manager for macOS, similar to apt.

brew update 
brew install scala
brew install apache-spark

3. Setup Variables

  • Assume the current Spark version is 2.4.0.
# For IPython notebook and pyspark integration
if which pyspark > /dev/null; then
  export SPARK_HOME="/usr/local/Cellar/apache-spark/2.4.0/libexec/"
fi
  • At this point, you should be able to launch pyspark and spark-shell from the terminal.
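
The hard-coded 2.4.0 path breaks as soon as Homebrew upgrades Spark. One possible alternative, sketched here under the assumption that Spark was installed through Homebrew, is to ask brew for the formula prefix instead. `set_spark_home` is a hypothetical helper name, not something Spark or Homebrew provides.

```shell
# Hypothetical helper: derive SPARK_HOME from Homebrew instead of a fixed version.
# Assumes the apache-spark formula keeps its files under <prefix>/libexec.
set_spark_home() {
  if command -v brew > /dev/null 2>&1; then
    SPARK_HOME="$(brew --prefix apache-spark)/libexec"
    export SPARK_HOME
  else
    echo "brew not found; SPARK_HOME left unchanged" >&2
    return 1
  fi
}

# Apply it and show the result; `|| true` keeps a failure from aborting the shell.
set_spark_home || true
echo "SPARK_HOME=${SPARK_HOME:-<unset>}"
```

The sketch does not verify that the directory exists, so it is worth checking the printed path once after installation.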

4. Integrate Spark and Jupyter Notebook

  • Install a Python environment through pyenv, a Python version manager.
pyenv install 3.6.7 

# Set Python 3.6.7 as main python interpreter
pyenv global 3.6.7

# Reload the shell configuration so the new Python is picked up
source ~/.zshrc

# Update pip from 10.0.1 to 18.1
pip install --upgrade pip
  • (Optional) If you receive the error "pyspark 2.4.0 requires py4j==0.10.7, which is not installed.", fix it with:
pip install py4j==0.10.7
  • Install Jupyter, Apache Toree
pip install jupyter toree
  • Create a kernel in Jupyter for managing Spark
jupyter kernelspec list
# Available kernels:
#  python3   /Users/dat/.pyenv/versions/3.6.7/share/jupyter/kernels/python3
jupyter toree install --replace --spark_home=$SPARK_HOME
jupyter kernelspec list

# Available kernels:
#  apache_toree_scala    /Users/dat/Library/Jupyter/kernels/apache_toree_scala
#  python3               /Users/dat/.pyenv/versions/3.6.7/share/jupyter/kernels/python3
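
The before/after comparison of `jupyter kernelspec list` above can also be checked in a script. This is a small sketch; `kernel_installed` is a hypothetical helper name, and the kernel name `apache_toree_scala` is taken from the listing above.

```shell
# Hypothetical helper: succeed if the named kernel appears in `jupyter kernelspec list`.
kernel_installed() {
  jupyter kernelspec list 2> /dev/null | grep -q -- "$1"
}

# Report whether the Toree Scala kernel was registered by `jupyter toree install`.
if kernel_installed apache_toree_scala; then
  echo "Toree kernel is registered"
else
  echo "Toree kernel not found; re-run: jupyter toree install --replace --spark_home=\$SPARK_HOME"
fi
```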

5. Launch Jupyter Notebook and Test Our First Spark Application

jupyter notebook
  • Remember to select Apache Toree as the notebook kernel.
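
If the notebook kernel fails to start, a quick way to rule out Spark itself is a smoke test from the terminal. The sketch below pipes a one-line job into spark-shell; `spark_smoke_test` and the `SPARK_SHELL_BIN` override variable are hypothetical names introduced here, not part of Spark.

```shell
# Hypothetical helper: pipe a one-line Spark job into the Spark Scala shell.
# SPARK_SHELL_BIN is an assumed override variable; it defaults to spark-shell on PATH.
spark_smoke_test() {
  shell_bin="${SPARK_SHELL_BIN:-spark-shell}"
  if ! command -v "$shell_bin" > /dev/null 2>&1; then
    echo "spark-shell not found on PATH"
    return 1
  fi
  # `sc` is the SparkContext that spark-shell creates on startup.
  echo 'println(sc.parallelize(1 to 100).sum())' | "$shell_bin"
}

# Run the smoke test; `|| true` keeps a failure from aborting the shell.
spark_smoke_test || true
```

The same expression, `sc.parallelize(1 to 100).sum()`, can be pasted into a notebook cell once the Toree Scala kernel is selected.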