Reference: https://spark.apache.org/docs/latest/
- Install Java 8 (Spark 2.4 runs on Java 8), then add it to PATH and set JAVA_HOME.
- Download and install Java 8 through brew.
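brew cask install adoptopenjdk/openjdk/adoptopenjdk8
# On newer Homebrew versions, where the brew cask subcommand was removed, use:
# brew install --cask adoptopenjdk/openjdk/adoptopenjdk8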
- Validate the installed Java version:
brew cask info adoptopenjdk8
adoptopenjdk8: 8,262:b10
https://adoptopenjdk.net/
/usr/local/Caskroom/adoptopenjdk8/8,262:b10 (100.2MB)
From: https://github.com/adoptopenjdk/homebrew-openjdk/blob/HEAD/Casks/adoptopenjdk8.rb
==> Name
AdoptOpenJDK 8
==> Artifacts
OpenJDK8U-jdk_x64_mac_hotspot_8u262b10.pkg (Pkg)
...
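brew cask info only reports what the cask installed; it does not show which JDK your shell will actually run. A minimal sketch that turns the version string printed by java -version into a major version, so you can confirm it is 8 (Java 8 reports "1.8.x", Java 9 and later report e.g. "11.0.2"). The function name java_major_from_string is hypothetical, not part of any tool:

```shell
# Hypothetical helper: map a Java version string to its major version.
# Java 8 and earlier use the legacy "1.x" scheme ("1.8.0_262" -> 8);
# Java 9+ start with the major number ("11.0.2" -> 11).
java_major_from_string() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # strip the legacy "1." prefix
  esac
  echo "${v%%[._-]*}"     # keep everything before the first '.', '_' or '-'
}

# Usage: java -version prints to stderr, with the version in double quotes.
# java_major_from_string "$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
```

If this prints anything other than 8, another JDK is first on your PATH.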
- Add the Java environment variables to your shell configuration.
- Open the shell configuration file in Vim:
# The file depends on your shell. For zsh, ~/.zshrc does not exist by default, so this command creates it.
vim ~/.zshrc
- Add the following to ~/.zshrc:
# For Apache Spark
if which java > /dev/null; then export JAVA_HOME=$(/usr/libexec/java_home); fi
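The JAVA_HOME line above uses whichever JDK /usr/libexec/java_home reports as the default, which can be newer than Java 8 if several JDKs are installed. A sketch that pins Java 8 instead, assuming macOS's java_home and its -v flag, and assuming java_home exits non-zero when no matching JDK is found (java8_home is a hypothetical variable name):

```shell
# Pin JAVA_HOME to a Java 8 JDK (macOS only; /usr/libexec/java_home ships with macOS).
# Assumes java_home -v 1.8 exits non-zero when no Java 8 JDK is installed,
# in which case JAVA_HOME and PATH are left unchanged.
if [ -x /usr/libexec/java_home ] && java8_home="$(/usr/libexec/java_home -v 1.8 2>/dev/null)"; then
  export JAVA_HOME="$java8_home"
  export PATH="$JAVA_HOME/bin:$PATH"   # put that JDK's java first on PATH
fi
```

On machines without /usr/libexec/java_home the block does nothing.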
- Install Scala and Apache Spark with Homebrew (http://brew.sh/), the macOS package manager, similar to apt:
brew update
brew install scala
brew install apache-spark
- Assuming the current Spark version is 2.4.0, add the following to ~/.zshrc (adjust the version in SPARK_HOME to match your install):
# For IPython notebook and PySpark integration
if which pyspark > /dev/null; then
export SPARK_HOME="/usr/local/Cellar/apache-spark/2.4.0/libexec/"
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
# Spark 2.4.0 bundles py4j 0.10.7, the version pyspark 2.4.0 requires
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
fi
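The block above hardcodes both the Spark version in SPARK_HOME and the py4j zip file name, so both must be edited whenever Homebrew upgrades Spark. A sketch that instead finds whatever py4j-*-src.zip the installed Spark bundles. The function name spark_pythonpath is hypothetical, and deriving SPARK_HOME from brew --prefix apache-spark (Homebrew's version-independent opt/ symlink) is an assumption about your Homebrew layout:

```shell
# Hypothetical helper: print "<SPARK_HOME>/python:<py4j zip>" for a Spark install,
# or return 1 if no bundled py4j-*-src.zip is found.
spark_pythonpath() {
  local spark_home="${1%/}" zip=""   # drop a trailing slash, if any
  # find is used instead of a glob so an unmatched pattern is not an error in zsh
  zip="$(find "$spark_home/python/lib" -maxdepth 1 -name 'py4j-*-src.zip' 2>/dev/null | head -n 1)"
  [ -n "$zip" ] || return 1
  printf '%s:%s\n' "$spark_home/python" "$zip"
}

# Possible ~/.zshrc usage (assumes the apache-spark formula keeps Spark under libexec):
# if command -v brew > /dev/null; then
#   export SPARK_HOME="$(brew --prefix apache-spark)/libexec"
#   if spark_pp="$(spark_pythonpath "$SPARK_HOME")"; then
#     export PYTHONPATH="$spark_pp${PYTHONPATH:+:$PYTHONPATH}"
#   fi
# fi
```

Unlike the original block, this sketch does not add $SPARK_HOME/python/build to PYTHONPATH.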
- At this point, you should be able to launch pyspark and spark-shell from the terminal.
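If either command is not found, a quick way to see which pieces are missing from PATH is a small check built on the POSIX command -v builtin. The function name require_commands is hypothetical:

```shell
# Hypothetical helper: report each given command that is not on PATH.
# Returns 0 only if every command was found.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# require_commands java scala pyspark spark-shell
```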
- Install a Python environment with pyenv, a Python version manager:
pyenv install 3.6.7
# Set Python 3.6.7 as the global Python interpreter
pyenv global 3.6.7
# Reload the shell configuration so the new Python is picked up
source ~/.zshrc
# Upgrade pip (from 10.0.1 to 18.1 at the time of writing)
pip install --upgrade pip
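pyenv global only takes effect in your shell if pyenv's shims are on PATH, which is normally done by initializing pyenv in ~/.zshrc; reloading ~/.zshrc relies on that. A sketch of the initialization, assuming pyenv itself is already on PATH (for example installed with Homebrew). The exact lines pyenv recommends have changed between pyenv releases, so check pyenv's README for your version:

```shell
# Initialize pyenv (adds its shims to PATH) only if pyenv is installed.
if command -v pyenv > /dev/null; then
  eval "$(pyenv init -)"
fi
```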
- (Optional) If you receive the error "pyspark 2.4.0 requires py4j==0.10.7, which is not installed.", fix it with:
pip install py4j==0.10.7
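To confirm which py4j version pip actually installed, you can read the Version: field from pip show. The wrapper name pip_show_version is hypothetical:

```shell
# Hypothetical helper: print the "Version:" field from `pip show <pkg>` output read on stdin.
pip_show_version() {
  awk -F': ' '/^Version:/ {print $2; exit}'
}

# Usage (pyspark 2.4.0 expects 0.10.7):
# pip show py4j | pip_show_version
```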
- Install Jupyter and Apache Toree:
pip install jupyter toree
- Create a Jupyter kernel for Spark with Toree:
jupyter kernelspec list
# Available kernels:
# python3 /Users/dat/.pyenv/versions/3.6.7/share/jupyter/kernels/python3
jupyter toree install --replace --spark_home=$SPARK_HOME
jupyter kernelspec list
# Available kernels:
# apache_toree_scala /Users/dat/Library/Jupyter/kernels/apache_toree_scala
# python3 /Users/dat/.pyenv/versions/3.6.7/share/jupyter/kernels/python3
jupyter notebook
- Remember to select the Toree kernel (apache_toree_scala) as the kernel for your notebook.
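If Toree does not appear in the notebook's kernel list, you can check the output of jupyter kernelspec list from a script before launching Jupyter. has_kernel is a hypothetical helper; the kernel name apache_toree_scala is taken from the kernelspec listing shown in the kernel-creation step:

```shell
# Hypothetical helper: succeed if `jupyter kernelspec list` output (read on stdin)
# contains a kernel whose name, in the first column, matches the argument.
has_kernel() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage:
# jupyter kernelspec list | has_kernel apache_toree_scala && echo "Toree kernel installed"
```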