"""
Parquet Data Randomization Utility.
Pre-req: `pip install duckdb pyarrow==10.0.1`
Usage: `python parquet_scramble.py input_file [output_file]`
This will keep the existing Parquet schema, but all top-level and nested scalar values will be randomized.
Because of the ubiquity of string columns that are low-cardinality and/or widely varying in size,
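The utility's docstring says the Parquet schema is kept while every top-level and nested scalar is randomized. The sketch below illustrates that idea in plain Python on a single nested record. It is not the script's actual implementation, which uses DuckDB and PyArrow. The helper name `scramble` is hypothetical. Keeping `None` in place and giving each replacement string the same length as the original are assumptions of this sketch. The script's real string handling is cut off above.

```python
import random
import string
from typing import Any


def scramble(value: Any, rng: random.Random) -> Any:
    """Hypothetical helper (not from the original script): return a copy of
    `value` with the same nested shape, where every scalar is replaced by a
    random value of the same Python type."""
    if value is None:
        return None  # assumption: nulls stay null, preserving nullability
    if isinstance(value, dict):
        # struct-like: keep field names, scramble each field
        return {k: scramble(v, rng) for k, v in value.items()}
    if isinstance(value, list):
        # list-like: keep length, scramble each element
        return [scramble(v, rng) for v in value]
    if isinstance(value, bool):  # must precede int, since bool subclasses int
        return rng.random() < 0.5
    if isinstance(value, int):
        return rng.randint(-(2**31), 2**31 - 1)
    if isinstance(value, float):
        return rng.uniform(-1e6, 1e6)
    if isinstance(value, str):
        # assumption: same length as the original, random ASCII letters
        return "".join(rng.choice(string.ascii_letters) for _ in value)
    raise TypeError(f"unsupported scalar type: {type(value).__name__}")


# Usage: scramble one record with a seeded RNG for reproducibility.
rng = random.Random(0)
row = {
    "id": 7,
    "name": "alice",
    "tags": ["a", "bc"],
    "meta": {"score": 1.5, "ok": True, "note": None},
}
out = scramble(row, rng)
print(out)
```

Recursing on container types and replacing only the leaves leaves the record's "schema" (keys, list lengths, null positions, scalar types) unchanged while the data itself is no longer meaningful.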
from typing import Callable, Optional
from dagster import (
    AssetIn,
    AssetSelection,
    Definitions,
    PartitionKeyRange,
    PartitionMapping,
    PartitionsDefinition,
    SourceAsset,