#docs #how-to #apache-spark

⭐️ How to install Apache Spark?

By Anthony Vilarim Caliani


Before Installing

  • You must have Java installed
  • You must have the $JAVA_HOME environment variable configured (there is a quick check right after this list)
  • Download Apache Spark
    • Extract the downloaded Spark archive (.tgz or .zip) wherever you prefer
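
Before moving on, you can confirm the Java prerequisites. This is just a sanity check; the example $JAVA_HOME path below is only illustrative.

# Check that Java is available and that $JAVA_HOME is set.
java -version       # should print your Java version
echo "$JAVA_HOME"   # should print a non-empty path, e.g. /usr/lib/jvm/...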

Now let's do this!

Installing and Configuring Spark and PySpark

# First, we need to configure some environment variables.
# Edit your ".bashrc", ".bash_profile" or ".zshrc" file.
vim ~/.bashrc

# ATTENTION!
# My Spark home is "/opt/spark", but yours depends on
# where you extracted the downloaded Spark archive.

# Now we are going to add some stuff \o/
# -------------------------------------------------------------

# Spark
export SPARK_HOME="/opt/spark"
export PATH="$SPARK_HOME/bin:$PATH"
# The next one was required for Spark versions >= 3.0.0
export SPARK_LOCAL_IP="127.0.0.1"

# PySpark
# Make the bundled PySpark package importable from your own Python scripts.
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"
export PYSPARK_PYTHON=python3

# ---------------------------- :wq ----------------------------

# The $PYSPARK_PYTHON variable is OPTIONAL and defines which
# Python interpreter PySpark will use.
# If you don't set this variable, PySpark will use
# your machine's default Python version.

# Now restart your terminal to get a new session, or run:
source ~/.bashrc
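
# (Optional) Sanity check: the expected values below assume the example
# paths used above, so adjust them if your Spark home is different.
echo "$SPARK_HOME"   # should print /opt/spark
which spark-shell    # should resolve to /opt/spark/bin/spark-shell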

# Open Spark Shell or...
spark-shell

# PySpark and be happy :)
pyspark
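
# (Optional) Quick smoke test using one of the examples that ship with Spark.
# "run-example" lives in "$SPARK_HOME/bin", so it is already on your PATH.
run-example SparkPi 10   # should print something like "Pi is roughly 3.14..."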

# THE END

