rorydph rory-data

rory-data / dlt_advanced_lesson_9.py
Created December 31, 2025 02:23
dlt pipeline optimisation exercise for dlt Advanced course from dltHub Education.
"""
dlt pipeline optimisation exercise for dlt Advanced course from dltHub Education.
This pipeline fetches data from the Jaffle Shop API using RESTClient with page-number pagination
and loads it into a DuckDB destination.
The goal is to make the pipeline as fast as possible, using the below techniques, while keeping
the results correct.
- Chunking
- Parallelism
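The two techniques listed above can be sketched without any dlt or Jaffle Shop specifics. This is a minimal illustration, not the gist's actual pipeline: `fetch_page`, `PAGE_SIZE`, `TOTAL_ROWS`, and `fetch_all` are hypothetical names, and the page fetch is an in-memory stand-in for a page-numbered HTTP request. It assumes the total page count is known up front, which a real API may or may not expose.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator

PAGE_SIZE = 100   # hypothetical page size
TOTAL_ROWS = 250  # hypothetical size of the remote dataset


def fetch_page(page: int) -> list[dict]:
    """Hypothetical stand-in for one page-numbered API request (1-based)."""
    start = (page - 1) * PAGE_SIZE
    stop = min(start + PAGE_SIZE, TOTAL_ROWS)
    return [{"id": i} for i in range(start, stop)]


def fetch_all(total_pages: int, workers: int = 4) -> Iterator[list[dict]]:
    """Fetch pages concurrently and yield each page as one chunk, in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Parallelism: pages are requested by several threads at once.
        # map() returns results in input order, so the output stays deterministic.
        for chunk in pool.map(fetch_page, range(1, total_pages + 1)):
            # Chunking: hand over a whole page of rows at a time, not row by row.
            if chunk:
                yield chunk


chunks = list(fetch_all(total_pages=3))
```

Yielding whole pages keeps per-row overhead low, and fetching in order makes it easy to check that the faster pipeline still returns the same rows as a sequential one.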
@rory-data
rory-data / arrow_validation_rules.md
Created December 30, 2025 04:21
Arrow validation rules for DQ

Rule definitions at the C++ level, as of 2025-12-30.

utf8_is_printable

This check excludes the following Unicode general categories:

| Constant | Category Name | Description |
| --- | --- | --- |
| CN | Unassigned | Reserved or non-existent codepoints. |
| CC | Control | Legacy C0/C1 control codes (like `\n` or `\r`). |
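As a rough illustration of what excluding these categories means, the sketch below flags strings containing codepoints in the two categories shown above, using Python's standard `unicodedata` module. It is an approximation for exploration only, not Arrow's implementation: the names `EXCLUDED_CATEGORIES` and `passes_check` are hypothetical, and only the categories listed here are covered, so the real rule may exclude more.

```python
import unicodedata

# Only the two categories listed above (Unicode short names "Cn" and "Cc").
# Assumption: the full rule may exclude additional categories not shown here.
EXCLUDED_CATEGORIES = {"Cn", "Cc"}


def passes_check(text: str) -> bool:
    """Return True if no character falls in an excluded general category."""
    return all(
        unicodedata.category(ch) not in EXCLUDED_CATEGORIES for ch in text
    )
```

For example, `passes_check("hello")` is true, while a string containing a newline fails because `\n` is in the Control category.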