Alexander VR AlexanderVR

@AlexanderVR
AlexanderVR / parquet_scramble.py
Last active January 2, 2023 20:46
Parquet Data Randomization Utility
"""
Parquet Data Randomization Utility.
Pre-req: `pip install duckdb pyarrow==10.0.1`
Usage: `python parquet_scramble.py input_file [output_file]`
This will keep the existing parquet schema, but all top-level and nested scalar values will be randomized.
Because of the ubiquity of string columns that are low-cardinality and/or widely varying in size,
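The idea described in the docstring above (keep the structure, replace every top-level and nested scalar with a random value) can be illustrated without Parquet at all. The following is a minimal pure-Python sketch over nested dicts and lists. `scramble` is a hypothetical helper written for illustration; it is not the gist's implementation, which lists duckdb and pyarrow as prerequisites.

```python
import random
import string


def scramble(value, rng):
    """Return a copy of `value` with the same shape and types but random scalars.

    Hypothetical illustration only; not taken from parquet_scramble.py.
    """
    if isinstance(value, dict):
        # Keep the keys (the "schema"), randomize the values.
        return {key: scramble(item, rng) for key, item in value.items()}
    if isinstance(value, list):
        # Keep the list length, randomize each element.
        return [scramble(item, rng) for item in value]
    if isinstance(value, bool):
        # Check bool before int, because bool is a subclass of int.
        return rng.random() < 0.5
    if isinstance(value, int):
        return rng.randint(-2**31, 2**31 - 1)
    if isinstance(value, float):
        return rng.random()
    if isinstance(value, str):
        # Preserve the string length so size characteristics stay similar.
        return "".join(rng.choices(string.ascii_letters, k=len(value)))
    # None and unrecognized types pass through unchanged.
    return value


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same output
    row = {"id": 7, "name": "alice", "tags": ["a", "bc"], "pos": {"x": 1.5}}
    print(scramble(row, rng))
```

Preserving string lengths here is one simple design choice; it does not reproduce the low-cardinality behavior the docstring alludes to.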
AlexanderVR / mapped_assets.py
Created December 30, 2022 21:49
custom PartitionMapping for assets
from typing import Callable, Optional
from dagster import (
    AssetIn,
    AssetSelection,
    Definitions,
    PartitionKeyRange,
    PartitionMapping,
    PartitionsDefinition,
    SourceAsset,