-
You need to have Java 8 installed. I tried to run Spark using Java 11 and it didn't work, so please make sure that your JAVA_HOME environment variable is pointing to your Java 8 installation directory. On macOS you can check it like this:
- To see where your JAVA_HOME is pointing:
echo $JAVA_HOME
- To find out which Java versions you have installed and where they are:
/usr/libexec/java_home -V
- Choose the Java 8 version and set it as your JAVA_HOME:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_152.jdk/Contents/Home
- To confirm that your JAVA_HOME now points to Java 8:
echo $JAVA_HOME
-
Install Python 3+, if you don't have it already. Friendly reminder: Python 2.7 will not be maintained past 2020.
-
Create a directory to work in:
mkdir spot-dev-talk-spark && cd spot-dev-talk-spark
- Create a virtual environment using your brand new Python 3 installation:
python3 -m venv .venv
- Activate your awesome virtual environment:
source .venv/bin/activate
- Now let's get some nice Python packages to help us get going:
pip install jupyter findspark
- Next, download Apache Spark:
wget https://www-eu.apache.org/dist/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
tar -xzf spark-2.4.3-bin-hadoop2.7.tgz
- Let's set environment variables for your Spark installation and the Spark local IP address:
export SPARK_HOME=$(pwd)/spark-2.4.3-bin-hadoop2.7/
export SPARK_LOCAL_IP="127.0.0.1"
- Download the data archive for the talk and extract it:
wget https://wwwdev.ebi.ac.uk/~federico/DR10.0.tar.gz
tar -xzf DR10.0.tar.gz
- Download the notebook for the talk and start the Jupyter server:
wget 'https://wwwdev.ebi.ac.uk/~federico/SPOT Dev Talk - Spark 101.ipynb'
jupyter notebook
Then open the SPOT Dev Talk - Spark 101.ipynb file and run the code that is already there, so you can be sure your Spark setup is OK.