Dataproc initialization action script for installing python3
#!/bin/bash
# Dataproc initialization action: install Python 3 and point PySpark at it.
# from https://gist.githubusercontent.com/nehalecky/9258c01fb2077f51545a/raw/789f08141dc681cf1ad5da05455c2cd01d1649e8/install-py3-dataproc.sh
set -euxo pipefail

apt-get update
apt-get -y install python3
# Make python3 the PySpark interpreter for login shells and for Spark itself.
echo "export PYSPARK_PYTHON=python3" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
# Pin the hash seed so hash-based operations on strings agree across driver and executors.
echo "Adding PYTHONHASHSEED=0 to profiles and spark-defaults.conf..."
echo "export PYTHONHASHSEED=0" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
echo "spark.executorEnv.PYTHONHASHSEED=0" >> /etc/spark/conf/spark-defaults.conf
cerisier commented Jul 6, 2016

Use this to create the Dataproc cluster:

gcloud dataproc clusters create zenly-data \
    --master-boot-disk-size-gb 100 \
    --image-version 1.0 \
    --bucket BUCKET \
    --master-machine-type n1-standard-4 \
    --num-workers 2 \
    --worker-boot-disk-size-gb 100 \
    --worker-machine-type n1-standard-4 \
    --zone us-central1-f \
    --subnet us-central \
    --properties spark:spark.executorEnv.PYTHONHASHSEED=0 \
    --initialization-actions gs://BUCKET/install-py3-dataproc.sh
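To confirm the init action took effect, one option is to SSH into the master node and inspect the environment. This is a sketch assuming the cluster name and zone from the create command above (Dataproc names the master `<cluster>-m`):

```shell
# Check that PySpark is wired to python3 on the master node.
gcloud compute ssh zenly-data-m --zone us-central1-f \
    --command 'source /etc/profile.d/spark_config.sh; echo "$PYSPARK_PYTHON"; python3 --version'
```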