Dataproc initialization action script for installing python3
#!/bin/bash
# Dataproc initialization action: install Python 3 and point PySpark at it.
# from
apt-get -y install python3

echo "Setting PYSPARK_PYTHON=python3 in profiles and the Spark conf..."
echo "export PYSPARK_PYTHON=python3" | tee -a /etc/profile.d/ /etc/*bashrc /usr/lib/spark/conf/

echo "Adding PYTHONHASHSEED=0 to profiles and spark-defaults.conf..."
echo "export PYTHONHASHSEED=0" | tee -a /etc/profile.d/ /etc/*bashrc /usr/lib/spark/conf/
echo "spark.executorEnv.PYTHONHASHSEED=0" >> /etc/spark/conf/spark-defaults.conf
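The PYTHONHASHSEED=0 lines matter because Python 3 randomizes string hashes per interpreter process by default, so two executors can disagree on which partition a given key hashes to. A minimal sketch (not part of the gist) showing the effect of pinning the seed:

```shell
# With PYTHONHASHSEED pinned, hash("spark") is identical across
# separate interpreter runs, so every executor partitions keys the
# same way. Without the pin, Python 3 reseeds per process.
a=$(PYTHONHASHSEED=0 python3 -c 'print(hash("spark"))')
b=$(PYTHONHASHSEED=0 python3 -c 'print(hash("spark"))')
if [ "$a" = "$b" ]; then
  echo "pinned seed: hashes agree across runs"
fi
```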

cerisier commented Jul 6, 2016

Use this command to create the Dataproc cluster:

gcloud dataproc clusters create zenly-data --master-boot-disk-size-gb 100 --image-version 1.0 --bucket BUCKET --master-machine-type n1-standard-4  --num-workers 2 --worker-boot-disk-size-gb 100 --worker-machine-type n1-standard-4 --zone us-central1-f --subnet us-central --properties spark:spark.executorEnv.PYTHONHASHSEED=0 --initialization-actions gs://BUCKET/
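Once the cluster is up, jobs can be submitted against it. A hypothetical sketch (BUCKET and job.py are placeholders, not resources from the gist), assuming the gcloud-era flags used above:

```shell
# Hypothetical: submit a PySpark job to the cluster created above.
# The init action ensures the executors run Python 3 with a pinned
# hash seed, so hash-partitioned operations behave deterministically.
gcloud dataproc jobs submit pyspark gs://BUCKET/job.py \
  --cluster zenly-data
```

Note that newer gcloud releases may also require a --region flag on Dataproc commands.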

avloss commented Apr 2, 2020

If someone stumbles upon this in 2020 -- you need to specify the latest image version: --image-version 1.5
