Skip to content

Instantly share code, notes, and snippets.

@shamb0
Created August 24, 2024 03:03
Show Gist options
  • Save shamb0/ff50fba34822781464d8fdc15ce2f797 to your computer and use it in GitHub Desktop.
Save shamb0/ff50fba34822781464d8fdc15ce2f797 to your computer and use it in GitHub Desktop.

Multi-level Partitioned Tables: TL;DR

Concept

  • Divides large tables into smaller, hierarchical partitions
  • Example: Two-level partitioning by year and manufacturer for auto sales data
  • Benefits: Improved query performance, efficient data management, parallel query execution, simplified maintenance, and better data organization

Demo Setup and Implementation Details

  • Components: PostgreSQL, S3 Storage, pg_analytics Foreign Data Wrapper (FDW), DuckDB
  • Key steps:
    1. Data generation and organization (local and S3)
    2. Database setup (partitioned table structure and FDW connections)
    3. Test cases (total sales, average price, monthly sales assertions)

Partitioned Table Structure and S3 Integration

  • Root partitioned table in PostgreSQL
  • Year-level partitions
  • Manufacturer-level foreign tables linked to S3 Parquet files
  • Leverages PostgreSQL's partition pruning and pg_analytics for efficient S3 data querying

Key Features

  • Transparent integration with PostgreSQL
  • Efficient query optimization for S3-stored data
  • Scalable and cost-effective storage solution
  • Improved performance for large, partitioned datasets

Demo Launch

RUST_LOG=info cargo test --test test_mlp_auto_sales -- --nocapture

For more information, contact the ParadeDB team team or join ParadeDB Slack Community.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment