### This gist explains how to connect JupyterHub with Spark2 on a CDH 5.13 cluster.
With the instructions below, Spark can be configured with JupyterHub on any cluster, on a standalone deployment, or on a local installation.
- Install JupyterHub by following the instructions in the official repository: https://github.com/jupyterhub/jupyterhub
- Once it is installed, locate Jupyter's kernels directory before configuring the Spark2 kernel.
On CentOS 7, it is under /usr/share/jupyter/kernels/
- Assuming all JupyterHub kernels are in /usr/share/jupyter/kernels/, create a directory for the new kernel
mkdir /usr/share/jupyter/kernels/pyspark2
- Create the pyspark2 kernel spec file
touch /usr/share/jupyter/kernels/pyspark2/kernel.json
- Add the following content
vi /usr/share/jupyter/kernels/pyspark2/kernel.json
{
  "argv": [
    "python3.6",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ],
  "display_name": "Python3.6 + Pyspark(Spark 2.2.0)",
  "language": "python",
  "env": {
    "PYSPARK_PYTHON": "/usr/bin/python3.6",
    "SPARK_HOME": "/opt/cloudera/parcels/SPARK2/lib/spark2",
    "HADOOP_CONF_DIR": "/etc/spark2/conf/yarn-conf",
    "HADOOP_CLIENT_OPTS": "-Xmx2147483648 -XX:MaxPermSize=512M -Djava.net.preferIPv4Stack=true",
    "PYTHONPATH": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip:/opt/cloudera/parcels/SPARK2/lib/spark2/python/",
    "PYTHONSTARTUP": "/opt/cloudera/parcels/SPARK2/lib/spark2/python/pyspark/shell.py",
    "PYSPARK_SUBMIT_ARGS": " --master yarn --deploy-mode client pyspark-shell"
  }
}
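Most kernel start-up failures with a spec like the one above come from a path in `env` that does not exist on the host. The following is a minimal sketch, not part of the original setup, that loads a `kernel.json` and reports which path-valued entries are missing. The helper names `missing_paths` and `check_kernel_file` are hypothetical, and the list of keys it checks is an assumption based on the spec shown here.

```python
import json
import os

# Keys in "env" whose values are single filesystem paths
# (assumption: taken from the kernel spec shown above).
PATH_KEYS = ("PYSPARK_PYTHON", "SPARK_HOME", "HADOOP_CONF_DIR", "PYTHONSTARTUP")


def missing_paths(spec):
    """Return (key, path) pairs from spec["env"] that do not exist on this host."""
    env = spec.get("env", {})
    missing = []
    for key in PATH_KEYS:
        path = env.get(key)
        if path and not os.path.exists(path):
            missing.append((key, path))
    # PYTHONPATH is a list of entries joined by the OS path separator
    # (":" on Linux); check each entry on its own.
    for entry in env.get("PYTHONPATH", "").split(os.pathsep):
        if entry and not os.path.exists(entry):
            missing.append(("PYTHONPATH", entry))
    return missing


def check_kernel_file(kernel_json_path):
    """Load a kernel.json (raises on invalid JSON) and list its missing paths."""
    with open(kernel_json_path) as fh:
        spec = json.load(fh)
    return missing_paths(spec)
```

Calling `check_kernel_file("/usr/share/jupyter/kernels/pyspark2/kernel.json")` on the JupyterHub host returns an empty list when every path resolves. A missing `py4j-*-src.zip` entry usually means the py4j version in `PYTHONPATH` does not match the one shipped under `$SPARK_HOME/python/lib`.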
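- Update the env settings if your setup differs, for example the Python interpreter path, SPARK_HOME, HADOOP_CONF_DIR, and the py4j version in PYTHONPATH (the file name must match the zip shipped under SPARK_HOME/python/lib).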
- Start JupyterHub and test the setup by building a Spark2 application in a notebook that uses the new kernel.
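For the final test step, a small job is usually enough to confirm that the kernel can reach the cluster. The sketch below assumes that `PYTHONSTARTUP` has run PySpark's `shell.py`, so a SparkContext named `sc` already exists in the notebook. The helper name `smoke_test` is hypothetical.

```python
def smoke_test(sc, n=1000):
    """Run a tiny job through the given SparkContext and check the result."""
    expected = n * (n - 1) // 2  # closed form for 0 + 1 + ... + (n - 1)
    # parallelize() distributes the range; sum() forces the job to run.
    actual = sc.parallelize(range(n)).sum()
    if actual != expected:
        raise RuntimeError(f"expected {expected}, got {actual}")
    return {"version": sc.version, "master": sc.master, "sum": actual}
```

In a notebook cell, run `smoke_test(sc)`. With the `PYSPARK_SUBMIT_ARGS` shown earlier, the reported master should be `yarn`. If `sc` is undefined, the startup script did not run, so check the `PYTHONSTARTUP` and `PYTHONPATH` entries first.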