@mcmoe
Created August 26, 2019 22:50
How to set up PySpark and Jupyter on an AWS EC2 instance
# Originally based on https://raw.githubusercontent.com/pzfreo/ox-clo/master/code/flintrock-jupyter.sh
sudo yum install gcc gcc-c++ -y
# sudo yum install python27-pip -y
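# The default get-pip.py no longer supports Python 2.7; fetch the 2.7-specific script instead
curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o get-pip.py
python get-pip.py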
#sudo pip-2.7 install jupyter
sudo pip2.7 install jupyter
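# Make the pyspark launcher start a Jupyter notebook server as the driver, without opening a browser
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser'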
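# hadoop-aws provides the s3a:// filesystem.
# Note: --num-executors is a YARN option and has no effect with a standalone master; it is kept from the original.
pyspark --master spark://0.0.0.0:7077 \
  --packages org.apache.hadoop:hadoop-aws:2.7.4 \
  --num-executors 3 --driver-memory 4g --executor-memory 4g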

mcmoe commented Aug 26, 2019

To open the notebook locally, create an SSH tunnel to the instance, then browse to localhost:9999:

ssh -i /path/to/key.pem -4 -fN -L 9999:localhost:8888 ec2-user@ec2-xx-xx-xx-xx.region.compute.amazonaws.com
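
With the tunnel up, the notebook URL that Jupyter prints on the instance (port 8888) has to be opened on the forwarded local port (9999) instead. A minimal sketch of that rewrite, assuming the server prints a URL of the form http://<host>:8888/?token=...; the function name `local_notebook_url` and the sample token are hypothetical:

```shell
# Hypothetical helper: rewrite the URL printed by Jupyter on the instance
# so that it points at the local end of the SSH tunnel.
# Assumes remote port 8888 and local port 9999, as in the ssh command above.
local_notebook_url() {
  local remote_url="$1"
  local local_port="${2:-9999}"
  # Replace only the host:port part; the path and token are left untouched.
  printf '%s\n' "$remote_url" | sed -E "s#^http://[^/:]+:8888#http://localhost:${local_port}#"
}

# Usage (the token is a made-up placeholder):
# local_notebook_url 'http://localhost:8888/?token=abc123'
```

If you forward a different local port in the ssh command, pass it as the second argument.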

Note: If you launched the cluster with Flintrock, running flintrock describe on it shows the cluster's domain names, including the master's.
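
As a rough sketch, the master's host name can be pulled out of the describe output and fed to the tunnel command. This assumes `flintrock describe <cluster-name>` prints a line of the form `master: <hostname>`; the cluster name `my-cluster` and the function name `master_host` are hypothetical, so check the output of your Flintrock version first:

```shell
# Hypothetical helper: read `flintrock describe` output on stdin and print
# the master's host name. Assumes a line shaped like "  master: <hostname>".
master_host() {
  awk '$1 == "master:" { print $2; exit }'
}

# Usage (cluster name and key path are placeholders):
# host="$(flintrock describe my-cluster | master_host)"
# ssh -i /path/to/key.pem -4 -fN -L 9999:localhost:8888 "ec2-user@${host}"
```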
