@WillianFuks
Created November 2, 2017 00:14
Creating Clusters in Dataproc
#!/usr/bin/env bash
set -e
function usage {
    echo "Creates a Dataproc cluster with a Jupyter interface."
    echo "usage: $0 [-h] [-n=name] [-b=bucket]"
    echo "  -h         display help"
    echo "  -n=name    name of cluster to create"
    echo "  -b=bucket  name of bucket in GCS for persistence"
    exit 1
}
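# Parse -key=value arguments; "${i#*=}" strips everything up to the first '='.
for i in "$@"
do
    case $i in
        -n=*)
            CLUSTER_NAME="${i#*=}"
            ;;
        -b=*)
            BUCKET_NAME="${i#*=}"
            ;;
        -h)
            usage
            ;;
        *)
            # Unrecognized arguments are ignored.
            ;;
    esac
done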
# Both the cluster name and the bucket are required.
[[ -z $CLUSTER_NAME ]] && usage
[[ -z $BUCKET_NAME ]] && usage
gcloud dataproc clusters create "$CLUSTER_NAME" \
    --metadata "JUPYTER_PORT=8124,JUPYTER_CONDA_PACKAGES=numpy" \
    --initialization-actions \
        gs://dataproc-initialization-actions/jupyter/jupyter.sh \
    --bucket "$BUCKET_NAME" \
    --num-workers 2
    #--worker-machine-type=n1-highcpu-8 \
    #--master-machine-type=n1-highcpu-8
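
The script reads its options in `-key=value` form and relies on the `${i#*=}` parameter expansion to pull out the value. The snippet below is a minimal, standalone sketch of that one step. `parse_value` is a hypothetical helper introduced only for illustration, and `my-cluster` and `my-bucket` are placeholder names; none of them appear in the script itself.

```shell
# parse_value is a hypothetical helper, not part of the script above.
# It prints the part of a -key=value argument that follows the first '='.
parse_value() {
    local arg="$1"
    # "#*=" removes the shortest leading match of '*=',
    # i.e. everything up to and including the first '='.
    printf '%s\n' "${arg#*=}"
}

parse_value "-n=my-cluster"   # prints: my-cluster
parse_value "-b=my-bucket"    # prints: my-bucket
```

Because the shortest prefix is removed, a value that itself contains `=` is kept whole after the first `=`.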