Kubeflow Pipeline components

There are multiple ways to generate a kfp component:

  1. from a Python function (see the sketch after this list)
  2. from file/text
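
Option 1 is the quickest for pure-Python steps. A minimal sketch, assuming the kfp v1 SDK (the add function and base image are illustrative):

import kfp.components as comp

# kfp infers the component's inputs/outputs from the type annotations.
def add(a: float, b: float) -> float:
    return a + b

# Wraps the function into a reusable component factory.
add_op = comp.create_component_from_func(add, base_image='python:3.9')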

From file/text

You'll use kfp.components.load_component_from_<file|text> when you need to interface with a command-line tool.

An example would be a Spark job submission tool such as Dataproc from GCP:

gcloud dataproc jobs submit spark --cluster example-cluster \
    --region=region \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000

The easiest way to code this up is to write a component spec in YAML:

name: Generic dataproc submission component
description: Submits spark job for generating train and target data using dataproc

inputs:
- { name: region, type: String, description: "region" }
- { name: class, type: String, description: "the java class" }
- { name: jars, type: String, description: "the jar files" }
- { name: args, type: String, description: "the arguments" }


outputs:
  - name: logs
    type: String

metadata:
  annotations:
    "iam.amazonaws.com/role": "<your iam role>"
implementation:
  container:
    image: <image>
    command:
      - sh
      - -exc
      - |
        gcloud dataproc jobs submit spark \
            --region=$0 \
            --class $1 \
            --jars $2 -- $3
      - inputValue: region
      - inputValue: class
      - inputValue: jars
      - inputValue: args
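
Once the spec is saved, you load it with the kfp SDK and wire it into a pipeline. A minimal sketch, assuming kfp v1 and that the YAML above is saved as dataproc_component.yaml (the filename, pipeline name, and argument values are illustrative):

import kfp
from kfp import dsl

# Load the component spec above from disk.
dataproc_op = kfp.components.load_component_from_file('dataproc_component.yaml')

@dsl.pipeline(name='spark-pi', description='Runs SparkPi through the generic dataproc component')
def spark_pipeline():
    # Inputs are passed positionally, in the order declared in the spec:
    # region, class, jars, args.
    dataproc_op(
        'us-central1',
        'org.apache.spark.examples.SparkPi',
        'file:///usr/lib/spark/examples/jars/spark-examples.jar',
        '1000',
    )

# Compile to a workflow file you can upload to the Kubeflow Pipelines UI.
kfp.compiler.Compiler().compile(spark_pipeline, 'spark_pipeline.yaml')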

If you have a scenario where you need to submit your job using another tool that requires the PySpark application's job args to be passed as a single argument, e.g.

cli-tool -executors 30 -args '--app-config s3://wow --version 0.01'

you can follow this example; double quotes (") are allowed in the multiline string:

name: Generic chimera submission component
description: Submits spark job for generating train and target data using chimeracli

inputs:
- { name: region, type: String, description: "region" }
- { name: class, type: String, description: "the java class" }
- { name: jars, type: String, description: "the jar files" }
- { name: config, type: String, description: "the application config" }
- { name: version, type: String, description: "the application version" }


outputs:
  - name: logs
    type: String

metadata:
  annotations:
    "iam.amazonaws.com/role": "<your iam role>"
implementation:
  container:
    image: <image>
    command:
      - sh
      - -exc
      - |
        cli-tool jobs submit spark \
            --region=$0 \
            --class $1 \
            --jars $2 \
            --args "--app-config $3 --version $4"
      - inputValue: region
      - inputValue: class
      - inputValue: jars
      - inputValue: config
      - inputValue: version
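
The same spec also loads straight from a string, which is handy when you keep the YAML inline with your pipeline code. A minimal sketch, assuming kfp v1 (chimera_component_text is illustrative and elides the spec above):

import kfp.components as comp

# The full component spec above goes here verbatim; elided for brevity.
chimera_component_text = """
name: Generic chimera submission component
...
"""

chimera_op = comp.load_component_from_text(chimera_component_text)

# Inside a pipeline, inputs are passed in the declared order:
# region, class, jars, config, version, e.g.
# chimera_op('us', 'MyClass', 'some jars', 's3://wow', '0.01')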

You can test the command locally first by running:

sh -exc 'cli-tool jobs submit spark --region $0 --class $1 --jars $2 --args "--app-config $3 --version $4"' 'us' 'MyClass' 'some jars' 'some config' '0.12.3'

Here sh -c binds the trailing arguments to $0 through $4, mirroring how kfp substitutes the inputValue placeholders at runtime.