Created July 20, 2021 21:50
This script automates creating a permanent Dataproc cluster with the Jupyter Notebook, JupyterLab, and PySpark notebook components enabled.
REGION=europe-west1
ZONE=europe-west1-b
CLUSTER_NAME=dev-cluster
SERVICE_ACCOUNT=your_service_account_name@your-gcp-project.iam.gserviceaccount.com
BUCKET_NAME=your-dataproc-staging-bucket
gcloud dataproc clusters create ${CLUSTER_NAME} \
  --region ${REGION} \
  --zone ${ZONE} \
  --bucket $BUCKET_NAME \
  --service-account $SERVICE_ACCOUNT \
  --optional-components=ANACONDA,JUPYTER \
  --enable-component-gateway \
  --initialization-actions gs://goog-dataproc-initialization-actions-${REGION}/connectors/connectors.sh \
  --metadata gcs-connector-version=2.2.0 \
  --metadata bigquery-connector-version=1.2.0 \
  --metadata spark-bigquery-connector-version=0.19.1
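The script above still carries template values (`your_service_account_name@...`, `your-dataproc-staging-bucket`), and the zone has to belong to the chosen region. A minimal pre-flight sketch could catch both before `gcloud` is called. The function names `zone_matches_region`, `is_configured`, and `preflight` are hypothetical and not part of the original script. The zone check assumes a zone name is the region name plus a one-character suffix, and the placeholder check simply looks for the `your_` / `your-` text used in the template.

```shell
# Hypothetical pre-flight helpers; not part of the original script.

# Succeeds when the zone looks like "<region>-<one character>",
# e.g. europe-west1-b for region europe-west1 (assumed naming scheme).
zone_matches_region() {
  local region="$1" zone="$2"
  case "$zone" in
    "$region"-?) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds when the value is non-empty and no longer contains the
# "your_" / "your-" placeholder text from the template above.
is_configured() {
  case "$1" in
    ""|*your_*|*your-*) return 1 ;;
    *) return 0 ;;
  esac
}

# Runs all checks against the script's variables and reports every problem.
# Returns 0 only when all checks pass.
preflight() {
  local status=0
  if ! zone_matches_region "${REGION:-}" "${ZONE:-}"; then
    echo "ZONE '${ZONE:-}' does not belong to REGION '${REGION:-}'" >&2
    status=1
  fi
  if ! is_configured "${SERVICE_ACCOUNT:-}"; then
    echo "SERVICE_ACCOUNT is empty or still a placeholder" >&2
    status=1
  fi
  if ! is_configured "${BUCKET_NAME:-}"; then
    echo "BUCKET_NAME is empty or still a placeholder" >&2
    status=1
  fi
  return "$status"
}
```

With these definitions placed after the variable assignments, running `preflight || exit 1` before the `gcloud dataproc clusters create` command would stop the script early instead of sending a request that is bound to fail. A real bucket or account name that happens to contain `your-` would be rejected by this simple check, so adjust the pattern if that applies.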