@gingerwizard
Last active April 26, 2023 13:15

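The query below writes the UK house price dataset (28m rows, approx. 200 MB as Parquet with LZ4) to a Parquet file with a single row group. The block size and row group size settings are set far above the size of the dataset so that all rows end up in one row group: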

INSERT INTO FUNCTION file('house_prices-1-row-group.parquet')
SELECT *
FROM uk_price_paid
SETTINGS min_insert_block_size_bytes = 10000000000,
min_insert_block_size_rows = 1000000000,
output_format_parquet_row_group_size = 1000000000

To confirm the number of row groups, read the file's metadata:

./clickhouse local --query "SELECT * FROM file('house_prices-1-row-group.parquet', ParquetMetadata)
FORMAT PrettyJSONEachRow" | jq -r '.num_row_groups'

1
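
If this check needs to be repeated for several files, it could be wrapped in a small shell function. The following is only a sketch: `check_row_groups` and the `CLICKHOUSE` variable are hypothetical names introduced here, and it assumes `num_row_groups` can be selected directly as a column of the ParquetMetadata format (it is the same field that jq extracts above), which also removes the jq dependency.

```shell
#!/bin/sh
# Hypothetical helper (not part of ClickHouse): succeeds only if the Parquet
# file has the expected number of row groups.
# CLICKHOUSE is an assumed override; it defaults to ./clickhouse as used above.
check_row_groups() {
    file="$1"
    expected="$2"
    # Select only the num_row_groups field from the file's Parquet metadata.
    actual=$("${CLICKHOUSE:-./clickhouse}" local --query \
        "SELECT num_row_groups FROM file('$file', ParquetMetadata) FORMAT TSV") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "ok: $file has $actual row group(s)"
    else
        echo "mismatch: $file has $actual row group(s), expected $expected" >&2
        return 1
    fi
}
```

For example, `check_row_groups house_prices-1-row-group.parquet 1` returns a non-zero status if the file was written with more than one row group.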