Skip to content

Instantly share code, notes, and snippets.

@ruslandoga
Last active September 24, 2022 15:02
Show Gist options
  • Save ruslandoga/d42e61d6a639bdaebae04812d535fce7 to your computer and use it in GitHub Desktop.
Save ruslandoga/d42e61d6a639bdaebae04812d535fce7 to your computer and use it in GitHub Desktop.

This gist uses UK Property Price Paid example dataset.

mac3.local :) select count(*) from uk_price_paid

┌──count()─┐
│ 27359802 │
└──────────┘

mac3.local :) select formatReadableSize(total_bytes) from system.tables where name = 'uk_price_paid'

┌─formatReadableSize(total_bytes)─┐
│ 291.09 MiB                      │
└─────────────────────────────────┘

Inserting data from local clickhouse instance to s3:

INSERT INTO FUNCTION s3(
  'https://ruslan-clickhouse-backup-experiment.s3.eu-north-1.amazonaws.com/csv/uk_price_paid_{_partition_id}.csv.lz4',
  -- see https://clickhouse.com/docs/en/integrations/s3/s3-table-engine#managing-credentials
  'access_key_id',
  'secret_access_key',
  'CSV' -- or https://clickhouse.com/docs/en/interfaces/formats/#csvwithnames
) PARTITION BY rand() % 10
-- a where clause can be added here to only upload prev period's (day, week, month) data
SELECT * FROM uk_price_paid;

Ok.

0 rows in set. Elapsed: 25.004 sec. Processed 27.36 million rows, 1.31 GB (1.09 million rows/s., 52.37 MB/s.)

Checking s3 bucket contents:

> aws s3 ls s3://ruslan-clickhouse-backup-experiment/csv/
2022-08-12 17:05:41   62139785 uk_price_paid_0.csv.lz4
2022-08-12 17:05:41   62145476 uk_price_paid_1.csv.lz4
2022-08-12 17:05:41   62136800 uk_price_paid_2.csv.lz4
2022-08-12 17:05:41   62183195 uk_price_paid_3.csv.lz4
2022-08-12 17:05:41   62148444 uk_price_paid_4.csv.lz4
2022-08-12 17:05:41   62157473 uk_price_paid_5.csv.lz4
2022-08-12 17:05:41   62177198 uk_price_paid_6.csv.lz4
2022-08-12 17:05:41   62160887 uk_price_paid_7.csv.lz4
2022-08-12 17:05:41   62160395 uk_price_paid_8.csv.lz4
2022-08-12 17:05:41   62143395 uk_price_paid_9.csv.lz4

Trying to read a single column from s3 bucket:

mac3.local :) select c2 as date from s3('https://ruslan-clickhouse-backup-experiment.s3.eu-north-1.amazonaws.com/csv/uk_price_paid_*.csv.lz4', 'access_key_id', 'secret_access_key', 'CSV') limit 10;

┌─date───────┐
│ 1997-10-31 │
│ 1997-06-27 │
│ 1997-02-10 │
│ 1999-03-09 │
│ 2000-12-22 │
│ 2000-10-25 │
│ 2000-02-21 │
│ 2001-08-10 │
│ 2000-08-21 │
│ 1996-09-06 │
└────────────┘

10 rows in set. Elapsed: 1.834 sec.

Creating a table to restore data into:

mac3.local :) CREATE TABLE uk_price_paid_2
              (
                  price UInt32,
                  date Date,
                  postcode1 LowCardinality(String),
                  postcode2 LowCardinality(String),
                  type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
                  is_new UInt8,
                  duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
                  addr1 String,
                  addr2 String,
                  street LowCardinality(String),
                  locality LowCardinality(String),
                  town LowCardinality(String),
                  district LowCardinality(String),
                  county LowCardinality(String),
                  category UInt8
              ) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);

Restoring the data from s3:

mac3.local :) insert into uk_price_paid_2 select * from s3('https://ruslan-clickhouse-backup-experiment.s3.eu-north-1.amazonaws.com/csv/uk_price_paid_*.csv.lz4', 'access_key_id', 'secret_access_key', 'CSV');

Ok.

0 rows in set. Elapsed: 57.638 sec. Processed 27.36 million rows, 6.43 GB (474.68 thousand rows/s., 111.55 MB/s.)
mac3.local :) select count() from uk_price_paid_2

┌──count()─┐
│ 27359802 │
└──────────┘
@ukutaht
Copy link

ukutaht commented Aug 15, 2022

mac3.local :) select c2 as date from s3('https://ruslan-clickhouse-backup-experiment.s3.eu-north-1.amazonaws.com/csv/uk_price_paid_*.csv.lz4', 'access_key_id', 'secret_access_key', 'CSV') limit 10;

This is so cool :)

@ruslandoga
Copy link
Author

ruslandoga commented Sep 24, 2022

All this works in Clickhousex too https://github.com/ruslandoga/ch-import-export

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment