@slopp
Last active July 3, 2023 21:22
Alternative to Multi-multi Partitions
from dagster import (
    AssetKey,
    DailyPartitionsDefinition,
    Definitions,
    OpExecutionContext,
    asset,
)
# Root Nodes
date = DailyPartitionsDefinition(start_date="2023-06-01")
colors = ["blue", "red"]
shapes = ["circle", "square"]
def root_asset_factory(node, partition_def, group):
    # builds one root asset per value (node) of a dimension (group)
    @asset(
        name=node,
        key_prefix=group,
        partitions_def=partition_def,
        group_name=group,
    )
    def the_asset(context: OpExecutionContext):
        # example of how to construct an appropriate query in the resulting asset
        partition = context.partition_key
        # string values are quoted so the logged query is valid SQL
        query = f"""
        SELECT *
        FROM table
        WHERE
            {group} = '{node}'
            AND date = '{partition}'
        """
        context.log.info(query)
        return

    return the_asset
color_assets = [root_asset_factory(color, date, "color") for color in colors]
shape_assets = [root_asset_factory(shape, date, "shape") for shape in shapes]
# Mixed Nodes
color_shapes = []
colors_with_prefix = [f"color/{c}" for c in colors]
shapes_with_prefix = [f"shape/{s}" for s in shapes]
for color in colors_with_prefix:
    for shape in shapes_with_prefix:
        color_shapes.append((color, shape))
def mixed_asset_factory(ins, partition_def):
    asset_key_ins = {AssetKey.from_user_string(i) for i in ins}
    # ins of ("color/blue", "shape/circle") becomes name blue_circle, group color_shape
    name = "_".join(i.split("/")[1] for i in ins)
    group = "_".join(i.split("/")[0] for i in ins)

    @asset(
        name=name,
        group_name=group,
        partitions_def=partition_def,
        non_argument_deps=asset_key_ins,
    )
    def the_asset(context: OpExecutionContext):
        # example of how to construct an appropriate query in the resulting asset
        partition = context.partition_key
        node_where_clause = ""
        for i in ins:
            kv = i.split("/")
            # one filter per upstream dimension, e.g. AND color = 'blue'
            node_where_clause += f"""
            AND {kv[0]} = '{kv[1]}'
            """
        query = f"""
        SELECT *
        FROM table
        WHERE
            date = '{partition}'
            {node_where_clause}
        """
        context.log.info(query)
        return

    return the_asset
color_shape_assets = [
    mixed_asset_factory(upstreams, date) for upstreams in color_shapes
]
defs = Definitions(assets=[*color_assets, *shape_assets, *color_shape_assets])

To run:

pip install dagster dagit
dagster dev -f 3d_multi_partitions.py

This example shows how to use asset factories to construct an asset graph that contains 3+ dimensional "partitioned" assets. In this approach, each asset group represents a dimension, and each asset inside the group represents a "partition". Here, "shape" and "color" are both asset groups; the "shape" group contains one asset for each value of interest, "circle" and "square". Dates continue to use Dagster's built-in partitions. Downstream assets are then created with the appropriate upstream dependencies: one for each combination of color and shape, e.g. red_circle. Like regular assets, this dependency graph could be used to drive auto-materialization policies. The result is comparable to PartitionMapping in the 2D case, but extends to 3+ dimensions.
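The shape of the resulting graph can be sketched without Dagster. The snippet below is only an illustration, assuming the same `colors` and `shapes` lists as the gist; `dependency_map` is a hypothetical name that does not appear in the original code. It maps each downstream asset name to the two upstream keys it depends on.

```python
from itertools import product

colors = ["blue", "red"]
shapes = ["circle", "square"]

# Each downstream asset depends on exactly one color asset and one shape asset.
# The Cartesian product yields every (color, shape) pairing once.
dependency_map = {
    f"{color}_{shape}": [f"color/{color}", f"shape/{shape}"]
    for color, shape in product(colors, shapes)
}

for name, upstreams in dependency_map.items():
    print(name, "<-", upstreams)
```

With two colors and two shapes this gives four downstream assets, each with two upstream dependencies, alongside the four root assets.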

Screen Shot 2023-07-03 at 1 15 07 PM

In this approach the Dagster UI remains navigable:

  • The asset group can be used for UI sub-selection:

Screen Shot 2023-07-03 at 1 15 15 PM

  • Asset search also works for UI sub-selection:

Screen Shot 2023-07-03 at 1 15 27 PM

Screen Shot 2023-07-03 at 1 15 44 PM

  • Backfills can be launched across the date dimension as normal. To launch a backfill across multiple dimensions, e.g. date and shape, select the relevant assets (e.g. "all circle assets") and then materialize all.

  • The code shows how to use the dimensions within the asset for the appropriate computation. In this case a sample SQL query is constructed. This example uses non_argument_deps instead of an IO manager, so it is up to the asset code either to load and store the data itself or to use the parameters to execute an appropriate client command:

Screen Shot 2023-07-03 at 1 17 09 PM
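If the query is actually executed rather than just logged, one option is to pass the dimension values as bound parameters instead of interpolating them into the SQL text. The sketch below is an assumption, not part of the gist: `build_query` is a hypothetical helper, `table` is the gist's placeholder table name, and the `?` placeholder style assumes a driver that uses qmark parameters (such as `sqlite3`).

```python
def build_query(ins, partition):
    """Return (sql, params) for the given upstream keys and date partition.

    Column names come from the "group" half of each "group/value" key.
    Values are returned separately so a database driver can bind them.
    """
    clauses = ["date = ?"]
    params = [partition]
    for key in ins:
        column, value = key.split("/")
        # Column names cannot be bound as parameters, so validate them instead.
        if not column.isidentifier():
            raise ValueError(f"unexpected column name: {column!r}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = "SELECT * FROM table WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = build_query(("color/blue", "shape/circle"), "2023-06-01")
print(sql)
print(params)
```

Inside an asset, `ins` would be the tuple passed to the factory and `partition` would be `context.partition_key`.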

The example could be extended to additional dimensions by further iterating on the "permutation loop", e.g. crossing the list of color_shape tuples with a list of sizes to create a nested tuple: ((color, shape), size).
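A minimal sketch of that extension follows. The `sizes` list and the `flatten` helper are hypothetical additions for illustration. Because `mixed_asset_factory` iterates over a flat sequence of "group/value" strings, the nested tuples are flattened before use.

```python
from itertools import product

color_shapes = [
    ("color/blue", "shape/circle"),
    ("color/blue", "shape/square"),
    ("color/red", "shape/circle"),
    ("color/red", "shape/square"),
]
sizes = ["small", "large"]  # hypothetical third dimension
sizes_with_prefix = [f"size/{s}" for s in sizes]


def flatten(nested):
    """Flatten arbitrarily nested tuples of strings into one flat tuple."""
    flat = []
    for item in nested:
        if isinstance(item, tuple):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return tuple(flat)


# Crossing the pairs with the sizes yields nested tuples: ((color, shape), size).
nested = list(product(color_shapes, sizes_with_prefix))
color_shape_sizes = [flatten(n) for n in nested]
print(color_shape_sizes[0])
```

Each flat tuple could then be passed to `mixed_asset_factory`, which would name the first asset blue_circle_small in the group color_shape_size. This assumes root assets for the "size" group are also created with `root_asset_factory`, so that the upstream keys exist.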
