@nezed
Created December 25, 2018 20:35
Runs clickhouse-obfuscator from a Docker image to randomize a dataset so that its data is no longer sensitive
docker run --rm \
--volume "$(pwd)"/seeds:/tmp/seeds \
--entrypoint bash \
yandex/clickhouse-server:latest \
-c '/usr/bin/clickhouse obfuscator --seed "$(head -c16 /dev/urandom | base64)" --input-format TSV --output-format TSV --structure "date Date, datetime DateTime, ab_tag String, client_name String" < /tmp/seeds/rows.tsv' \
> obfuscated_rows.tsv
# 1. stdin is redirected from a file inside the container (hence the bash entrypoint) because the obfuscator
#    reads its input twice. Piping the data in from the host fails with "Input must be seekable file (it will be read twice)."
# 2. The UUID, LowCardinality(…) and Enum8/16 data types are not supported in `--structure`.
#    Use FixedString(36) or String instead (see https://github.com/yandex/ClickHouse/blob/881893d/dbms/programs/obfuscator/Obfuscator.cpp#L873-L895)
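
One way to apply note 2 without editing the structure by hand is a small rewriting step. This is only a sketch: `sanitize_structure` is a hypothetical helper, not part of ClickHouse. It assumes that the LowCardinality and Enum columns hold string-like values and that no enum value contains a closing parenthesis.

```shell
# Hypothetical helper (not a ClickHouse tool): rewrites a --structure string so that
# it only uses types the obfuscator accepts, following note 2 above.
sanitize_structure() {
  printf '%s\n' "$1" | sed -E \
    -e 's/(^|[ ,])UUID([ ,]|$)/\1FixedString(36)\2/g' \
    -e 's/LowCardinality\([^)]*\)/String/g' \
    -e 's/Enum(8|16)\([^)]*\)/String/g'
}

# Example call; the column names here are made up for illustration.
sanitize_structure "id UUID, ab_tag LowCardinality(String), client_name String"
```

The result can then be passed to the obfuscator as `--structure "$(sanitize_structure "...")"`.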