@daniel-vera-g
Last active December 7, 2023 17:01

Install Spark on Mac

Install Homebrew (package manager):

  • /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • brew update && brew upgrade

Install each of the following if it is not already present:

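  1. Java:
    • Check: java -version
    • Install: brew install java
    • Add to PATH (Apple Silicon prefix; Homebrew on Intel Macs uses /usr/local instead of /opt/homebrew): echo 'export PATH="/opt/homebrew/opt/openjdk/bin:$PATH"' >> ~/.zshrc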
  2. Python:
    • Check: python3 --version
    • Install: brew install python
    • Unversioned python and pip symlinks live in $(brew --prefix python)/libexec/bin
  3. Scala:
    • Check: scala -version
    • Install: brew install scala
  4. Apache Spark:
    • Check: spark-shell --version
    • Install: brew install apache-spark
    • Set the SPARK_HOME environment variable (replace 3.5.0 with the version shown by brew list --versions apache-spark):
      • echo 'export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/3.5.0/libexec/"' >> ~/.zshrc
      • source ~/.zshrc
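
The individual checks above can be run in one pass. This is a minimal sketch using only the Python standard library; the names REQUIRED_TOOLS, check_tools, and report are illustrative and not part of any tool mentioned here.

```python
import os
import shutil

# Commands this guide checks for (illustrative list, mirrors the steps above).
REQUIRED_TOOLS = ["java", "python3", "scala", "spark-shell"]


def check_tools(tools):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def report(tools=REQUIRED_TOOLS):
    """Print one status line per tool, plus the current SPARK_HOME value."""
    for tool, path in check_tools(tools).items():
        print(f"{tool:12} {path or 'NOT FOUND'}")
    print(f"{'SPARK_HOME':12} {os.environ.get('SPARK_HOME', 'NOT SET')}")


if __name__ == "__main__":
    report()
```

Run it in a fresh terminal so that the changes written to ~/.zshrc are picked up. A tool reported as NOT FOUND usually means the install step or the PATH line was skipped.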

Python setup:

  1. Create a virtual environment in the current directory: python3 -m venv "$PWD"
  2. Activate the virtual environment: source ./bin/activate
  3. Install packages:
    • pip3 install notebook
    • pip3 install findspark
  4. Start Jupyter Notebook: jupyter notebook
  5. Test in Jupyter:
import findspark
findspark.init()
import pyspark # only run after findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df = spark.sql('''select 'spark' as hello ''')
df.show()
  6. When done, deactivate the virtual environment: deactivate
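
For context on why findspark.init() must run before importing pyspark: the Homebrew Spark install is not on Python's import path, so something has to add it. Below is a simplified sketch of that idea, not findspark's actual code. The function names are hypothetical, and it assumes the usual Spark layout with python/ and python/lib/py4j-*.zip under SPARK_HOME.

```python
import glob
import os
import sys


def spark_python_paths(spark_home):
    """Return the paths under spark_home that hold PySpark and the Py4J archive."""
    python_dir = os.path.join(spark_home, "python")
    # Spark ships Py4J as a zip archive whose file name includes its version.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*.zip")))
    return [python_dir] + py4j_zips


def add_spark_to_sys_path(spark_home=None):
    """Prepend the Spark Python paths to sys.path; fall back to $SPARK_HOME."""
    spark_home = spark_home or os.environ.get("SPARK_HOME")
    if not spark_home:
        raise ValueError("SPARK_HOME is not set")
    paths = spark_python_paths(spark_home)
    # Insert in reverse so the final order in sys.path matches `paths`.
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)
    return paths
```

If the test in step 5 fails with an import error, printing spark_python_paths(os.environ["SPARK_HOME"]) is a quick way to see whether SPARK_HOME points at a directory that really contains the python/ folder.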

Note: Use pip3 and python3 instead of pip and python
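
To confirm which interpreter a notebook or script is actually using, a short check like the following may help. It is a sketch that relies on how the standard venv module sets sys.prefix; the function name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtualenv: ", in_virtualenv())
```

If the interpreter path does not sit inside the directory where the virtual environment was created, the packages installed with pip3 went somewhere else.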
