Skip to content

Instantly share code, notes, and snippets.

@garystafford
Last active December 15, 2018 05:31
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
Star You must be signed in to star a gist
Save garystafford/f78d35ae1d59e965d9abd80a6ba890c4 to your computer and use it in GitHub Desktop.
jobs:
- pysparkJob:
args:
- storage_bucket_parameter
- data_file_parameter
- results_directory_parameter
mainPythonFileUri: main_python_file_parameter
stepId: ibrd-pyspark
placement:
managedCluster:
clusterName: three-node-cluster
config:
gceClusterConfig:
zoneUri: us-east1-b
masterConfig:
diskConfig:
bootDiskSizeGb: 500
machineTypeUri: n1-standard-4
softwareConfig:
imageVersion: 1.3-deb9
workerConfig:
diskConfig:
bootDiskSizeGb: 500
machineTypeUri: n1-standard-4
numInstances: 2
parameters:
- description: Python script to run
fields:
- jobs['ibrd-pyspark'].pysparkJob.mainPythonFileUri
name: MAIN_PYTHON_FILE
- description: Storage bucket location of data file and results
fields:
- jobs['ibrd-pyspark'].pysparkJob.args[0]
name: STORAGE_BUCKET
validation:
regex:
regexes:
- gs://.*
- description: IBRD data file
fields:
- jobs['ibrd-pyspark'].pysparkJob.args[1]
name: IBRD_DATA_FILE
- description: Result directory
fields:
- jobs['ibrd-pyspark'].pysparkJob.args[2]
name: RESULTS_DIRECTORY
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment