Skip to content

Instantly share code, notes, and snippets.

Avatar

Clemens Winter cswinter

View GitHub Profile
View clickhouse-log-import
time (for filename in /home/clemens/Dropbox/nyc-taxi-data-denormalized/trips_x*.csv.gz; do
gunzip -c $filename | \
python trans.py | \
clickhouse-client \
--query="INSERT INTO trips FORMAT CSV"
done)
View clickhouse-cold-query-hdd-1-thread
This file has been truncated, but you can view the full file.
➜ ~ uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ ~ cd nyc-benchmarks/nyc-clickhouse
➜ nyc-clickhouse sudo lshw -class disk -class storage
[sudo] password for clemens:
*-storage
description: SATA controller
product: ASM1062 Serial ATA Controller
vendor: ASMedia Technology Inc.
View clickhouse-cold-query-ssd
This file has been truncated, but you can view the full file.
➜ ~ uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ ~ sudo lshw -class disk -class storage
[sudo] password for clemens:
*-storage
description: SATA controller
product: ASM1062 Serial ATA Controller
vendor: ASMedia Technology Inc.
physical id: 0
View clickhouse-cold-query-hdd-12-threads
This file has been truncated, but you can view the full file.
➜ nyc-clickhouse uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ nyc-clickhouse sudo lshw -class disk -class storage
*-storage
description: SATA controller
product: ASM1062 Serial ATA Controller
vendor: ASMedia Technology Inc.
physical id: 0
View locustdb-bulk-load-ssd-hdd
➜ ~ uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ ~ sudo lshw -class disk -class storage
[sudo] password for clemens:
*-storage
description: SATA controller
product: ASM1062 Serial ATA Controller
vendor: ASMedia Technology Inc.
physical id: 0
bus info: pci@0000:06:00.0
View locustdb-ingestion-ssd
➜ ~ uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ ~ sudo lshw -class disk -class storage
[sudo] password for clemens:
*-storage
description: SATA controller
product: ASM1062 Serial ATA Controller
vendor: ASMedia Technology Inc.
physical id: 0
bus info: pci@0000:06:00.0
View locustdb-ingestion-hdd
➜ ~ uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ ~ sudo lshw -class disk -class storage
[sudo] password for clemens:
*-storage
description: SATA controller
product: ASM1062 Serial ATA Controller
vendor: ASMedia Technology Inc.
physical id: 0
bus info: pci@0000:06:00.0
View locustdb-lz4-vs-nolz4
➜ ~ uname -a
Linux Archon 4.4.0-66-generic #87-Ubuntu SMP Fri Mar 3 15:29:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
➜ ~ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
View import_mergetree
CREATE TABLE trips_mergetree
ENGINE = MergeTree(pickup_date, pickup_datetime, 8192)
AS SELECT
trip_id,
CAST(vendor_id AS Enum8('1' = 1, '2' = 2, '3' = 3, 'CMT' = 4, 'VTS' = 5, 'DDS' = 6, 'B02512' = 10, 'B02598' = 11, 'B02617' = 12, 'B02682' = 13, 'B02764' = 14)) AS vendor_id,
toDate(pickup_datetime) AS pickup_date,
ifNull(pickup_datetime, toDateTime(0)) AS pickup_datetime,
toDate(dropoff_datetime) AS dropoff_date,
ifNull(dropoff_datetime, toDateTime(0)) AS dropoff_datetime,
assumeNotNull(store_and_fwd_flag) IN ('Y', '1', '2') AS store_and_fwd_flag,
View import.sh
time (for filename in /home/clemens/Dropbox/nyc-taxi-data-denormalized/trips_x*.csv.gz; do
gunzip -c $filename | \
python trans.py | \
clickhouse-client \
--query="INSERT INTO trips FORMAT CSV"
done)