You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Adding an extra package to a Python Dataflow project to run on GCP
The Problem
The documentation for how to deploy a pipeline with extra, non-PyPi, pure Python packages on GCP is missing some detail. This gist shows how to package and deploy an external pure-Python, non-PyPi dependency to a managed dataflow pipeline on GCP.
TL;DR: You external package needs to be a python (source/binary) distro properly packaged and shipped alongside your pipeline. It is not enough to only specify a tar file with a setup.py.
Preparing the External Package
Your external package must have a proper setup.py. What follow is an example setup.py for our ETL package. This is used to package version 1.1.1 of the etl library. The library requires 3 native PyPi packages to run. These are specified in the install_requires field. This package also ships with custom external JSON data, declared in the package_data section. Last, the setuptools.find_packages function searches for all available packages and returns that
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
How to set up TravisCI for projects that push back to github
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Turn a BeautifulSoup form in to a dict of fields and default values - useful for screen scraping forms and then resubmitting them
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A small Symfony 2 class for returning a response as a CSV file. Based on the Symfony JsonResponse class.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Copy MySQL table to big query. If you need to copy all tables, use the loop given at the end.
Exit with error code 3 if blob or text columns are found. The csv files are first copied to google cloud before being imported to big query.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters