Skip to content

Instantly share code, notes, and snippets.

Avatar

Clemens Winter cswinter

View GitHub Profile
View create_table
CREATE TABLE trips (
trip_id UInt32,
vendor_id String,
pickup_datetime DateTime,
dropoff_datetime Nullable(DateTime),
store_and_fwd_flag Nullable(FixedString(1)),
rate_code_id Nullable(UInt8),
pickup_longitude Nullable(Float64),
View trans.py
import sys
for line in sys.stdin:
print ','.join([item if len(item.strip()) else '\N'
for item in line.strip().split(',')])
View load.sh
#!/bin/sh
ls /home/clemens/Dropbox/nyc-taxi-data-denormalized/* | \
xargs \
-P 6 `# running with -P 12 angers the oom killer` \
-L 1 \
~/q/l64/q load.q \
-q \
-s 2
View load.q
m:(`trip_id ;"I";
`vendor_id ;"*";
`pickup_datetime ;"P";
`dropoff_datetime ;"P";
`store_and_fwd_flag ;"B";
`rate_code_id ;"X";
`pickup_longitude ;"E";
`pickup_latitude ;"E";
`dropoff_longitude ;"E";
`dropoff_latitude ;"E";
View locustdb-nerfed-results.txt
➜ LocustDB git:(master) RUSTFLAGS="-Ccodegen-units=1" CARGO_INCREMENTAL=0 cargo run --features nerf --release --bin repl -- nyc
Finished release [optimized] target(s) in 0.06s
Running `target/release/repl nyc`
NERFED!
Loading nyc into table trips.
Loaded data in 1121 seconds.
# Table `trips` (1464785771 rows, 42794.30 MiB) #
payment_type: 1397.69MiB
pickup_puma: 2611.38MiB
View locustdb-results.txt
➜ LocustDB git:(master) git pull
Enter passphrase for key '/home/clemens/.ssh/id_rsa':
remote: Counting objects: 17, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 17 (delta 11), reused 17 (delta 11), pack-reused 0
Unpacking objects: 100% (17/17), done.
From github.com:cswinter/LocustDB
106acb1..a10d938 master -> origin/master
Updating 106acb1..a10d938
Fast-forward
View clickhouse-results.txt
This file has been truncated, but you can view the full file.
➜ ~ sudo service clickhouse-server status
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
Loaded: loaded (/etc/systemd/system/clickhouse-server.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-07-04 15:28:31 PDT; 4h 35min ago
Main PID: 1108 (clickhouse-serv)
CGroup: /system.slice/clickhouse-server.service
└─1108 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml
Jul 04 15:28:31 Archon systemd[1]: Starting ClickHouse Server (analytic DBMS for big data)...
View taxi_data_export.sql
COPY (
SELECT trips.id,
trips.vendor_id,
trips.pickup_datetime,
trips.dropoff_datetime,
trips.store_and_fwd_flag,
trips.rate_code_id,
trips.pickup_longitude,
trips.pickup_latitude,
trips.dropoff_longitude,