stream maps demo

meltano run stream map transforms demo

Stream map support for meltano run has landed via a new Singer-compatible Mapper plugin type! Like meltano run itself, it needs testing and feedback.


What can you use Mapper plugins for?

Mappers allow you to transform or manipulate data after extraction and before loading:

  • Streams/properties can be aliased to provide custom naming downstream.
  • Stream records can be filtered based on any user-defined logic.
  • Properties can be transformed inline (e.g. converting types, sanitizing PII data).
  • Properties can be removed from the stream.
  • New properties can be added to the stream.

Note that mappers are currently only available when using meltano run.
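
To make the list above concrete, here is a minimal sketch of what a mapper does with a Singer message stream. It is not how any particular mapper plugin is implemented. The names `map_messages` and `example_fn`, the `draft` property, and the `source` property are all hypothetical, chosen only for illustration.

```python
import json
from typing import Callable, Iterable, Iterator, Optional

Record = dict
# A record function returns the (possibly modified) record, or None to drop it.
RecordFn = Callable[[str, Record], Optional[Record]]


def map_messages(lines: Iterable[str], fn: RecordFn) -> Iterator[str]:
    """Apply fn to every Singer RECORD message; pass other messages through."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            record = fn(message["stream"], message["record"])
            if record is None:
                continue  # record filtered out
            message["record"] = record
        yield json.dumps(message)


def example_fn(stream: str, record: Record) -> Optional[Record]:
    """Hypothetical mapping logic: filter, transform, remove, and add."""
    if stream != "commits":
        return record
    if record.get("draft"):  # hypothetical property used as a filter
        return None
    out = dict(record)
    out["author_email"] = "hidden"  # transform a property inline (mask PII)
    out.pop("id", None)             # remove a property
    out["source"] = "gitlab"        # add a new property
    return out
```

A real mapper would read from stdin and write to stdout (for example `for out in map_messages(sys.stdin, example_fn): print(out)`), and would also need to rewrite SCHEMA messages to match the changed properties, which this sketch skips.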


Mapper plugin example

  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: hide-gitlab-secrets
      config:
        transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
    - name: who-needs-ids
      config:
        transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL

We've gained a new mapper plugin type and associated config.


Mapper plugin example

  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:    # <-- top-level mappings key
    - name: hide-gitlab-secrets
      config:
        transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
    - name: who-needs-ids
      config:
        transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL

A bit different than usual: you don't define a single top-level config for the mapper. Instead, you define mappings!


Mapper plugin example

  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: hide-gitlab-secrets  # <-- mapping config with two actions
      config:
        transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
    - name: who-needs-ids  # <-- mapping config with another action
      config:
        transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL

You can define multiple mappings per mapper that you can then invoke by name.


Mapper plugin example

  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: hide-gitlab-secrets
      config:  # <-- config gets passed to the plugin at invocation
        transformations:
        - field_id: author_email
          tap_stream_name: commits
          type: MASK-HIDDEN
        - field_id: committer_email
          tap_stream_name: commits
          type: MASK-HIDDEN
    - name: who-needs-ids
      config:  # <-- config gets passed to the plugin at invocation
        transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL

The config defined for each mapping is what actually gets passed to the mapper plugin at invocation time. What the config holds will differ between plugins...
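
As a rough sketch of what "passed to the plugin at invocation time" could look like: the mapping's config is serialized to a JSON file and handed to the executable. The `--config` flag follows the usual Singer convention, but treating it as what Meltano does here is an assumption, and `build_invocation` is a hypothetical helper, not a Meltano function.

```python
import json
import tempfile
from typing import List


def build_invocation(executable: str, mapping_config: dict) -> List[str]:
    """Write one mapping's config to a JSON file and build the command line.

    Assumption: the mapper accepts `--config <path>`, as Singer taps and
    targets conventionally do.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump(mapping_config, handle)
    return [executable, "--config", handle.name]


# The `who-needs-ids` mapping config from the example above.
who_needs_ids = {
    "transformations": [
        {"field_id": "id", "tap_stream_name": "commits", "type": "SET-NULL"},
    ]
}
argv = build_invocation("transform-field", who_needs_ids)
```

Only the selected mapping's `config` block ends up in the file, which is why two mappings on the same plugin can behave differently.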


Mapper plugin example

  mappers:
  - name: transform-field
    variant: transferwise
    pip_url: pipelinewise-transform-field
    executable: transform-field
    mappings:
    - name: who-needs-ids
      config:  # <-- config will vary by plugin
        transformations:
        - field_id: id
          tap_stream_name: commits
          type: SET-NULL
  - name: awesome-custom-transform
    pip_url: very-awesome-dataco-transforms
    mappings:
    - name: fix-ids-in-commits
      config:  # <-- config will vary by plugin
        transformations:
        - key: id
          set: 42
  - name: meltano-map-transformer
    variant: meltano
    pip_url: git+https://github.com/MeltanoLabs/meltano-map-transform.git
    executable: meltano-map-transform
    mappings:
    - name: backup-commits
      config:  # <-- config will vary by plugin
        stream_maps:
          commits:
            __alias__: "commits_orig"

Same...but....different.


How to actually use these

Invoke one:

$ meltano run tap-gitlab who-needs-ids target-jsonl

And the mapping name will get resolved to the plugin:

~~~graph-easy --as=boxart
[ who-needs-ids ] - to -> [ transform-field ]
~~~
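
The name resolution can be pictured as a lookup over every mapper's `mappings` list. This is only an illustrative sketch: `MAPPERS` mirrors part of the YAML from the earlier slides, and `resolve_mapping` is a hypothetical function, not Meltano's internal API.

```python
from typing import List, Tuple

# Mirrors (part of) the `mappers:` section shown in the earlier slides.
MAPPERS = [
    {
        "name": "transform-field",
        "mappings": [
            {"name": "hide-gitlab-secrets", "config": {"transformations": [
                {"field_id": "author_email", "tap_stream_name": "commits", "type": "MASK-HIDDEN"},
                {"field_id": "committer_email", "tap_stream_name": "commits", "type": "MASK-HIDDEN"},
            ]}},
            {"name": "who-needs-ids", "config": {"transformations": [
                {"field_id": "id", "tap_stream_name": "commits", "type": "SET-NULL"},
            ]}},
        ],
    },
    {
        "name": "meltano-map-transformer",
        "mappings": [
            {"name": "backup-commits", "config": {"stream_maps": {
                "commits": {"__alias__": "commits_orig"},
            }}},
        ],
    },
]


def resolve_mapping(mapping_name: str, mappers: List[dict]) -> Tuple[str, dict]:
    """Return (plugin name, mapping config) for a mapping name."""
    for plugin in mappers:
        for mapping in plugin.get("mappings", []):
            if mapping["name"] == mapping_name:
                return plugin["name"], mapping["config"]
    raise KeyError(f"no mapping named {mapping_name!r}")


plugin_name, config = resolve_mapping("who-needs-ids", MAPPERS)
```

With the data above, `who-needs-ids` resolves to the `transform-field` plugin together with its single SET-NULL transformation.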

Under the hood

~~~graph-easy --as=boxart
[ tap ] - to -> [ set id to null ] - to -> [ target ]
~~~

How to actually use these

Invoke n+1:

$ meltano run tap-gitlab hide-secrets custom-thing fix-id target-jsonl

Under the hood

~~~graph-easy --as=boxart
[ tap ] - to -> [ transform-field ] - to -> [ custom ] - to -> [ transform-field ] - to -> [ target ]
~~~
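
Conceptually, the chain above is just function composition over a stream of records: each stage consumes the previous stage's output. The sketch below models that with Python generators. The three stage functions are hypothetical stand-ins for the `hide-secrets`, `custom-thing`, and `fix-id` mappings (the `checked` property is invented), and the real plugins run as separate processes rather than in-process functions.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[Iterable[dict]], Iterator[dict]]


def hide_secrets(records: Iterable[dict]) -> Iterator[dict]:
    for record in records:
        yield {**record, "author_email": "hidden"}


def custom_thing(records: Iterable[dict]) -> Iterator[dict]:
    for record in records:
        yield {**record, "checked": True}  # hypothetical extra property


def fix_id(records: Iterable[dict]) -> Iterator[dict]:
    for record in records:
        yield {**record, "id": None}


def run_chain(source: Iterable[dict], stages: List[Stage]) -> List[dict]:
    """Feed the source through each stage in order, like tap -> mappers -> target."""
    return list(reduce(lambda stream, stage: stage(stream), stages, source))


result = run_chain(
    [{"id": 7, "author_email": "a@b.c"}],
    [hide_secrets, custom_thing, fix_id],
)
```

Because each stage only sees the previous stage's output, the order of mapping names on the command line matters.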

In action

...Time to flip tabs and see it in action...

 _________________________
< less slides more demos! >
 -------------------------
             O
              O
               o
                \||/
                |  @___oo
      /\  /\   / (__,,,,|
     ) /^\) ^\/ _)
     )   /^\/   _)
     )   _ /  / _)
 /\  )/\/ ||  | )_)
<  >      |(,,) )__)
 ||      /    \)___)\
 | \____(      )___) )___
  \______(_______;;; __;;;


Recap

  • meltano run only
  • invoked by mapping name instead of plugin name
  • arbitrary number of mappers can run between tap/target
  • config is manual atm, but there's an issue on the backlog already
  • failure in a mapper will exit the job

Thank you for coming to my ted talk demo!

ps: We're hiring - come fix my terrible python - https://meltano.com/jobs
