YAML bug bash samples
#
# Run with:
# python -m apache_beam.yaml.main --pipeline_spec_file=yaml-pipelines/nasa-dataflow.yaml
#
pipeline:
  type: chain
  source:
    type: ReadFromCsv
    config:
      path: 'gs://apache-beam-samples/nasa_jpl_asteroid/full.csv'
  transforms:
    - type: Filter
      config:
        language: python
        keep: 'hazardous_flag == "Y"'
  sink:
    type: WriteToJson
    config:
      # Update this to a unique path before running.
      path: "gs://apache-beam-yaml-temp/hazardous_asteroids.json"
options:
  runner: DataflowRunner
  project: 'apache-beam-testing'
  region: us-central1
  temp_location: 'gs://apache-beam-yaml-temp/temp'
  # Needed for the dev branch.
  sdk_location: container
  sdk_container_image: 'us-central1-docker.pkg.dev/apache-beam-testing/beam-yaml/beam_python3.9_sdk:2.54.0.dev'

pipeline:
  type: chain
  # To run locally, first download the sample:
  #
  #   wget https://storage.googleapis.com/apache-beam-samples/nasa_jpl_asteroid/sample_100000.csv
  #
  # and change the source path below to './sample_100000.csv'.
  source:
    type: ReadFromCsv
    config:
      path: 'gs://apache-beam-samples/nasa_jpl_asteroid/sample_100000.csv'
  transforms:
    - type: Filter
      config:
        language: python
        keep: 'hazardous_flag == "Y"'
  sink:
    type: WriteToJson
    config:
      path: "./hazardous_asteroids.json"
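The local pipeline above amounts to: read CSV rows, keep those whose `hazardous_flag` column equals `"Y"`, and write the survivors as JSON. The plain-Python sketch below (standard library only, not Beam) mirrors that logic on a tiny in-memory CSV, which can help when checking the pipeline's output by hand. The `name` column and the sample rows are made up for illustration, and emitting one JSON object per line is an assumption about the layout `WriteToJson` produces, not something this gist states.

```python
import csv
import io
import json


def filter_hazardous(csv_text):
    """Return JSON lines for rows whose hazardous_flag is "Y" (mirrors the Filter step)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Same predicate as the YAML `keep` expression: hazardous_flag == "Y".
    kept = [row for row in reader if row["hazardous_flag"] == "Y"]
    # One JSON object per line (assumed output layout, see the note above).
    return [json.dumps(row) for row in kept]


# Hypothetical sample; only the hazardous_flag column name comes from the pipeline.
SAMPLE = """name,hazardous_flag
alpha,Y
beta,N
gamma,Y
"""

for line in filter_hazardous(SAMPLE):
    print(line)
```

Note that the YAML `keep` string is evaluated per row with the row's fields available as names, which is why it can reference `hazardous_flag` directly; the sketch spells that out as `row["hazardous_flag"]`.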

pipeline:
  type: chain
  transforms:
    - type: Create
      config:
        elements: [1, 2, 3]
    - type: LogForTesting

pipeline:
  type: chain
  source:
    type: Create
    config:
      elements:
        - a
        - b
  transforms:
    - type: Sql
      config:
        query: 'SELECT * FROM PCOLLECTION'
  sink:
    type: LogForTesting
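In Beam SQL a transform's single input is referred to as `PCOLLECTION`, so `SELECT * FROM PCOLLECTION` passes every row through unchanged and the log should show the two created elements. As a rough stand-in, the sketch below runs the same query with Python's built-in `sqlite3`; this is not Beam SQL, which uses a different engine and dialect. The single column name `element` is an assumption about how `Create` wraps primitive values such as `a` and `b` into rows, and `run_query` is a hypothetical helper.

```python
import sqlite3


def run_query(elements, query):
    """Load elements into a table named PCOLLECTION and run the query against it."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: one column named "element" holding each primitive value.
    conn.execute("CREATE TABLE PCOLLECTION (element TEXT)")
    conn.executemany(
        "INSERT INTO PCOLLECTION (element) VALUES (?)", [(e,) for e in elements]
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run_query(["a", "b"], "SELECT * FROM PCOLLECTION"))
```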

pipeline:
  source:
    type: ReadFromBigQuery
    config:
      table: 'apache-beam-testing.beam_bigquery_io_test.taxi_small'
      row_restriction: 'meter_reading > 1'
  transforms:
    - type: MapToFields
      input: ReadFromBigQuery
      config:
        language: python
        fields:
          passenger_count: passenger_count
          distance_to_pier_57: "(69**2 * (40.74347-latitude)**2 + 52**2 * (-74.00935-longitude)**2)**0.5"
    - type: LogForTesting
      input: MapToFields
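The `distance_to_pier_57` expression reads as a flat-earth approximation: it scales the latitude difference by 69 (about the number of miles in one degree of latitude) and the longitude difference by 52 (about the miles per degree of longitude near 40.7° N, since a degree of longitude shrinks by cos(latitude)), then takes the Euclidean norm. Reading the result as miles is an interpretation of the constants; the gist does not state a unit. The sketch below restates the same expression as a standalone function and shows where the 52 comes from; the function names are hypothetical.

```python
import math

# Reference point taken from the MapToFields expression (the point the field calls Pier 57).
PIER_57_LAT = 40.74347
PIER_57_LON = -74.00935


def distance_to_pier_57(latitude, longitude):
    """Approximate distance (assumed miles) using the same formula as the YAML field."""
    return (69**2 * (PIER_57_LAT - latitude)**2
            + 52**2 * (PIER_57_LON - longitude)**2)**0.5


def miles_per_degree_longitude(lat_deg, miles_per_degree_lat=69.0):
    """A degree of longitude spans roughly miles_per_degree_lat * cos(latitude)."""
    return miles_per_degree_lat * math.cos(math.radians(lat_deg))


print(distance_to_pier_57(40.74347, -74.00935))
print(round(miles_per_degree_longitude(PIER_57_LAT), 1))
```

The approximation is reasonable over city-scale distances; for longer distances a great-circle formula such as haversine would be more accurate.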