@cerisier
Last active July 2, 2021 21:56
Dataproc initialization action script for installing python3
#!/bin/bash
# from https://gist.githubusercontent.com/nehalecky/9258c01fb2077f51545a/raw/789f08141dc681cf1ad5da05455c2cd01d1649e8/install-py3-dataproc.sh
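# Refresh the package index, then install Python 3 from the distribution's repositories.
apt-get update
apt-get -y install python3
# Make PySpark use python3 in login shells and in Spark's own environment.
echo "export PYSPARK_PYTHON=python3" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
# Pin the hash seed so Python 3 string hashing is consistent across the driver and executors.
echo "Adding PYTHONHASHSEED=0 to profiles and spark-defaults.conf..."
echo "export PYTHONHASHSEED=0" | tee -a /etc/profile.d/spark_config.sh /etc/*bashrc /usr/lib/spark/conf/spark-env.sh
echo "spark.executorEnv.PYTHONHASHSEED=0" >> /etc/spark/conf/spark-defaults.conf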
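
The script above appends its settings unconditionally, so running it more than once on the same node would leave duplicate lines in the config files. One possible way to guard against that is sketched below. `append_once` is a hypothetical helper, not part of the original gist, and the demo writes to a scratch file instead of the real Spark config.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): append a line to a file
# only when that exact line is not already present.
# $1: line to add, $2: target file
append_once() {
  # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
  grep -qxF -- "$1" "$2" 2>/dev/null || echo "$1" >> "$2"
}

# Demonstrate on a scratch file rather than /etc/spark/conf/spark-defaults.conf.
demo_conf="$(mktemp)"
append_once "spark.executorEnv.PYTHONHASHSEED=0" "$demo_conf"
append_once "spark.executorEnv.PYTHONHASHSEED=0" "$demo_conf"

# Count how many times the setting ended up in the file.
demo_count="$(grep -c "PYTHONHASHSEED" "$demo_conf")"
echo "lines written: $demo_count"
```

In the init action itself, the same helper could stand in for the `>>` redirect on the last line; the `tee -a` lines would need one call per target file.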
@cerisier
Author

cerisier commented Jul 6, 2016

Use this command to create the Dataproc cluster:

gcloud dataproc clusters create zenly-data \
  --master-boot-disk-size-gb 100 \
  --image-version 1.0 \
  --bucket BUCKET \
  --master-machine-type n1-standard-4 \
  --num-workers 2 \
  --worker-boot-disk-size-gb 100 \
  --worker-machine-type n1-standard-4 \
  --zone us-central1-f \
  --subnet us-central \
  --properties spark:spark.executorEnv.PYTHONHASHSEED=0 \
  --initialization-actions gs://BUCKET/install-py3-dataproc.sh

@avloss

avloss commented Apr 2, 2020

If someone stumbles upon this in 2020: you need to specify the latest image version, --image-version 1.5
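
As a rough sketch of what that change looks like, the snippet below builds the create command with the image version passed in as a parameter and prints it instead of running it. `build_create_cmd` is a hypothetical helper, `BUCKET` is the same placeholder as in the original command, and the machine-type, disk, zone, subnet, and properties flags are left out for brevity and would need to be added back for a real cluster.

```shell
#!/bin/bash
# Sketch only: prints the gcloud command rather than executing it.
# build_create_cmd is a hypothetical helper; BUCKET is a placeholder.
# $1: Dataproc image version to request, e.g. 1.5
build_create_cmd() {
  echo "gcloud dataproc clusters create zenly-data" \
    "--image-version $1" \
    "--bucket BUCKET" \
    "--initialization-actions gs://BUCKET/install-py3-dataproc.sh"
}

# Build the command for the image version suggested in the comment above.
cmd_15="$(build_create_cmd 1.5)"
echo "$cmd_15"
```

Keeping the version in one argument means trying a different image only requires changing the value passed to the helper.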

@CalenDario13

> If someone stumbles upon this in 2020: you need to specify the latest image version, --image-version 1.5

So if I am using image-version 1.3 and try to use this script, it won't work?