@codspire
Last active June 9, 2020 16:17
Running Standalone Spark, PySpark on EC2

Install JDK 8 on EC2

https://gist.github.com/rtfpessoa/17752cbf7156bdf32c59

Install Python 3

https://gist.github.com/codspire/41dd399912fdafbefcd2f2eb76022363

Install Spark

https://sparkour.urizone.net/recipes/installing-ec2/

Configure Environment Variables

~/.bashrc

# .bashrc
export PYSPARK_PYTHON=python35
export SPARK_HOME=/opt/spark
export JAVA_HOME=/usr/java/default

export PATH=/usr/local/bin:$SPARK_HOME/bin:$PATH
alias python=python35

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
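
As a rough sanity check on the variables above, the sketch below reports whether each named variable points to an existing directory. The function name `check_dir_vars` is hypothetical and not part of Spark or the original setup; the paths it inspects are whatever your own `.bashrc` exports.

```shell
# Hypothetical helper: for each variable NAME given as an argument,
# report whether its value is an existing directory.
# Returns non-zero if any variable is unset or not a directory.
check_dir_vars() {
    status=0
    for name in "$@"; do
        # Indirect lookup of the variable's value; empty if unset.
        eval "value=\${$name:-}"
        if [ -d "$value" ]; then
            echo "ok: $name=$value"
        else
            echo "problem: $name='$value' is not a directory"
            status=1
        fi
    done
    return "$status"
}
```

After `source ~/.bashrc`, running `check_dir_vars SPARK_HOME JAVA_HOME` should print an `ok:` line for each variable if the install locations match the exports.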

Validate the setup

source ~/.bashrc

python

spark-shell

pyspark
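
Before launching the interactive shells above, it can help to confirm that each command actually resolves on the `PATH`. The following is a minimal sketch; `check_commands` is a hypothetical helper name, and the command names passed to it are assumed from the steps in this gist.

```shell
# Hypothetical helper: report whether each named command can be found
# by the current shell. Returns non-zero if any command is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_commands java python35 spark-shell pyspark` prints one line per command. A `missing:` line usually means the corresponding `PATH` entry or install step needs another look.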