By Anthony Vilarim Caliani
- You must have Java installed
- You must have the `$JAVA_HOME` environment variable configured
- Download Apache Spark
- Extract the downloaded Spark `.zip` or `.tar` file wherever you prefer
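Before touching any config, it's worth a quick sanity check that the prerequisites are actually in place (a small sketch; the printed paths will vary by system):

```shell
# Check that a Java binary is on the PATH and that JAVA_HOME is set.
JAVA_BIN="$(command -v java || true)"
echo "java: ${JAVA_BIN:-NOT FOUND}"
echo "JAVA_HOME: ${JAVA_HOME:-NOT SET}"
```

If either line reports a missing value, install a JDK and set `$JAVA_HOME` before continuing.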
Now let's do this!
# First, we need to configure some environment variables.
# Edit ".bashrc", ".bash_profile" or ".zshrc" file.
vim ~/.bashrc
# ATTENTION!
# My Spark home is "/opt/spark", but yours depends on
# where you extracted the downloaded Spark file.
# Now we are going to add some stuff \o/
# -------------------------------------------------------------
# Spark
export SPARK_HOME="/opt/spark"
export PATH="$SPARK_HOME/bin:$PATH"
# The next one is required for Spark >= 3.0.0
export SPARK_LOCAL_IP="127.0.0.1"
# PySpark
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"
export PYSPARK_PYTHON=python3
# ---------------------------- :wq ----------------------------
# The $PYSPARK_PYTHON variable is OPTIONAL; it defines which
# Python interpreter PySpark will use.
# If you don't set it, PySpark falls back to your machine's
# default Python.
# Now restart your terminal to start a new session, or run
source ~/.bashrc
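To confirm the new variables actually landed in the current session (assuming Spark was extracted to "/opt/spark" as above), you can echo them back:

```shell
# These should print the values exported in ~/.bashrc.
echo "SPARK_HOME: ${SPARK_HOME:-NOT SET}"
command -v spark-shell || echo "spark-shell is not on the PATH yet"
```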
# Open Spark Shell or...
spark-shell
# PySpark and be happy :)
pyspark
# THE END
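As a final smoke test, here is a minimal PySpark job submitted non-interactively; `/tmp/smoke_test.py` is just an illustrative path, and the run is guarded so it only fires if `spark-submit` is on your PATH:

```shell
# Write a tiny PySpark job and submit it with spark-submit.
cat > /tmp/smoke_test.py <<'EOF'
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smoke-test").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()  # prints a two-row table
spark.stop()
EOF

if command -v spark-submit >/dev/null 2>&1; then
  spark-submit /tmp/smoke_test.py
else
  echo "spark-submit not found - check your PATH and SPARK_HOME"
fi
```

If the table prints, both Spark and PySpark are wired up correctly.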