Skip to content

Instantly share code, notes, and snippets.

View kzzzr's full-sized avatar

Artemiy Kzr kzzzr

View GitHub Profile
@kzzzr
kzzzr / docker.md
Last active June 4, 2024 10:01
Docker: move persistent volumes (mounts) into another filesysytem WITHOUT data losses

The problem you face

You have created a VM, deployed your services in Docker on it, services (for example, Kafka + Kafka Connect) started writing data.

Suddenly you find that the disk volume is running out.

df -h
@kzzzr
kzzzr / _file_formats.md
Last active April 9, 2024 11:59
File formats comparison: CSV, JSON, Parquet, ORC

File formats comparison: CSV, JSON, Parquet, ORC

Key results

Whenever you need to store your data on S3 / Data Lake / External table choose file format wisely:

  • Parquet / ORC are the best options due to efficient data layout, compression, indexing capabilities
  • Columnar formats allow for column projection and partition pruning (reading only relevant data!)
  • Binary formats enable schema evolution which is very applicable for constantly changing business environment
@kzzzr
kzzzr / fetch_currencies.sh
Created January 31, 2022 13:36
Fetching exchange rates
## Enable debugging messages and exit on error
# set -ex
## Assign variables directly or use ENV variables
TS=`date +"%Y-%m-%d-%H-%M-%S-%Z"`
# OXR_TOKEN=''
# BASE_CURRENCY='RUB'
# SYMBOLS='AED,EUR,GBP,RUB,USD'
## Chain commands
@kzzzr
kzzzr / int_requests_prebook_to_asap.sql
Created January 27, 2022 20:28
CTE usage showcase
{{
config(
materialized='ephemeral'
)
}}
WITH accepted AS (
SELECT DISTINCT
@kzzzr
kzzzr / CTE_query_plan
Created January 27, 2022 20:00
Excessive CTE usage in Data Mart calculation
XN Merge Left Join DS_DIST_NONE (cost=6768461698952.58..64203387414945.72 rows=11491303 width=4361)
-> XN Hash Left Join DS_BCAST_INNER (cost=5768461175265.66..63203382323457.28 rows=11491303 width=3845)
-> XN Merge Left Join DS_DIST_NONE (cost=4765778722374.23..62200689246924.30 rows=11491303 width=3321)
-> XN Hash Left Join DS_DIST_INNER (cost=4765778722374.23..62200689217006.26 rows=11491303 width=3285)
-> XN Merge Left Join DS_DIST_NONE (cost=3765767462785.01..36310404425364.76 rows=11491303 width=3284)
-> XN Hash Left Join DS_DIST_NONE (cost=3765767462785.01..36310403604540.56 rows=11491303 width=3276)
-> XN Hash Left Join DS_DIST_INNER (cost=3765766240852.13..36310400055617.20 rows=11491303 width=3251)
-> XN Hash Left Join DS_DIST_NONE (cost=1765764690131.17..34310372401253.97 rows=11491303 width=3250)
-> XN Hash Left Join DS_DIST_NONE (cost=1765764584059.78..34308739121313.73 rows=11491303 width=2734)
-> XN Hash Left Join DS_DIST_INNER (co
@kzzzr
kzzzr / _dbt_modules.md
Created November 24, 2021 09:03
DWH – DBT: Extending with modules

DBT: Extending with modules

Зачем нужен модуль?

  • Переиспользование кода в проектах: макросы, модели, тесты
  • Адаптация макросов и моделей под разные СУБД
  • Примеры модулей: utils, redshift, external_tables, facebook
@kzzzr
kzzzr / example.sql
Created November 19, 2021 07:33
Prepare WHERE condition dynamically – dbt macro
select
_id as audit_id
, resource_id
, driver_id
, "user_id"
, "event"
, created_at
, resource_type
@kzzzr
kzzzr / _assignment.md
Last active October 14, 2023 17:23
Metabase + Clickhouse tutorial on Yandex.Cloud
@kzzzr
kzzzr / assignment.md
Last active December 18, 2023 16:58
Data Vault 2.0 + Greenplum + dbtVault assignment
@kzzzr
kzzzr / de_bi_deployment.md
Last active September 21, 2021 16:34
Data Engineer – BI Deployment Assignment

1. Установить Metabase

Использовать виртуальную машину и Docker в Yandex.Cloud

2. Подключиться к источнику данных

Использовать любой доступный источник данных:

  • Jaffle shop (postgres) из занятия по dbt
  • Любой датасет из блока Data Lake