Deploying Cloud Functions to schedule a Python Dataflow pipeline.

Use this as a guide, but instead of the Java runtime we'll use Python.

The setup can be accomplished with virtualenv, which creates an isolated environment.

The project folder structure will look like this:

cloudfunction/
- index.js
- pipeline.py
- ENV/

So...

mkdir cloudfunction
cd cloudfunction
pip install virtualenv
virtualenv ENV

Install the Dataflow SDK (2.4.0) inside the virtual environment.

source ENV/bin/activate
pip install google-cloud-dataflow==2.4.0
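
To confirm the SDK landed inside the virtual environment (and not your system Python), a quick check like this should print the version:

ENV/bin/python -c "import apache_beam; print(apache_beam.__version__)"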

Drop your Cloud Function index.js script and Dataflow pipeline.py script into the project folder.
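
If you don't already have a pipeline script, here's a minimal pipeline.py sketch to illustrate the shape of it. The read/write paths and the transform are placeholders, not part of the original guide.

# pipeline.py
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run(argv=None):
    # The job name, project, staging/temp locations, and runner are passed
    # in as command-line flags by the Cloud Function below.
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromText('gs://BUCKET/input/*.txt')   # placeholder input
         | 'SplitWords' >> beam.FlatMap(lambda line: line.split())     # placeholder transform
         | 'Write' >> beam.io.WriteToText('gs://BUCKET/output/words')) # placeholder output

if __name__ == '__main__':
    run()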

Now bundle all the files by compressing the contents of the cloudfunction/ folder into a single zip file.

It should go from this:

cloudfunction/
- index.js
- pipeline.py
- ENV/

To this:

cloudfunction/
- bundled.zip
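
For example, from inside the cloudfunction/ folder (the zip file name is arbitrary):

cd cloudfunction
zip -r bundled.zip index.js pipeline.py ENV/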

Deploy

Use your trigger type of choice when deploying.

gcloud alpha functions deploy FUNCTION --stage-bucket BUCKET --trigger-bucket BUCKET
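
With hypothetical values filled in (the bucket names are assumptions; the function name matches the property exported from index.js below):

gcloud alpha functions deploy myCloudFunctionProcess --stage-bucket my-staging-bucket --trigger-bucket my-upload-bucket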

Or, deploy from the Cloud Console.

// index.js
module.exports = {
  myCloudFunctionProcess: function myCloudFunctionProcess (event, callback) {
    const spawn = require('child_process').spawn;
    const child = spawn(
      'ENV/bin/python', // If you read the guide you'll notice that this is the key difference.
      ['pipeline.py',
       '--jobName=FromCloudFunction',
       '--project=PROJECT_ID',
       '--staging_location=gs://BUCKET/staging',
       '--temp_location=gs://BUCKET/temp',
       '--runner=DataflowRunner']
    );
    // Surface the pipeline launcher's output in the Cloud Function logs and
    // signal completion so the function isn't torn down before the child exits.
    child.stdout.on('data', (data) => console.log(`stdout: ${data}`));
    child.stderr.on('data', (data) => console.error(`stderr: ${data}`));
    child.on('close', (code) => callback(null, `child process exited with code ${code}`));
  }
};