@gvyshnya
Created July 20, 2021 21:50
This script automates the creation of a permanent Dataproc cluster with the Jupyter Notebook, JupyterLab, and PySpark notebook components enabled.
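# Placeholder settings: replace these with values from your own GCP project.
REGION=europe-west1
ZONE=europe-west1-b  # must be a zone inside REGION
CLUSTER_NAME=dev-cluster
SERVICE_ACCOUNT=your_service_account_name@your-gcp-project.iam.gserviceaccount.com
BUCKET_NAME=your-dataproc-staging-bucket

# Create the cluster with Jupyter exposed through the Component Gateway and the
# GCS/BigQuery connectors installed by the regional initialization action.
# Note: the ANACONDA component is offered on Dataproc 1.x images only; omit it on 2.0+.
gcloud dataproc clusters create "${CLUSTER_NAME}" \
  --region "${REGION}" \
  --zone "${ZONE}" \
  --bucket "${BUCKET_NAME}" \
  --service-account "${SERVICE_ACCOUNT}" \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway \
  --initialization-actions "gs://goog-dataproc-initialization-actions-${REGION}/connectors/connectors.sh" \
  --metadata gcs-connector-version=2.2.0 \
  --metadata bigquery-connector-version=1.2.0 \
  --metadata spark-bigquery-connector-version=0.19.1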
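
The script ships with placeholder values (`your_service_account_name@...`, `your-dataproc-staging-bucket`), so a small pre-flight check can catch a forgotten edit before `gcloud` is called. The sketch below is not part of the original script: `check_settings` is a hypothetical helper name, and the rule that a zone name is the region name plus a suffix (for example `europe-west1-b` in `europe-west1`) is the only assumption it makes about GCP naming.

```shell
# Hypothetical pre-flight helper (an assumption, not part of the original gist).
# Returns non-zero if a setting is empty, still a "your-..." placeholder,
# or if ZONE does not start with "<REGION>-".
check_settings() {
  status=0
  for name in REGION ZONE CLUSTER_NAME SERVICE_ACCOUNT BUCKET_NAME; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    case "$value" in
      ''|your[-_]*)
        echo "Set $name before creating the cluster" >&2
        status=1
        ;;
    esac
  done
  case "${ZONE:-}" in
    "${REGION:-}"-?*) ;;  # zone name is the region plus a suffix
    *)
      echo "ZONE '${ZONE:-}' is not in REGION '${REGION:-}'" >&2
      status=1
      ;;
  esac
  return "$status"
}
```

One way to use it is to call `check_settings || exit 1` between the variable assignments and the `gcloud dataproc clusters create` command, so the script stops before any API call is made.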