Bogdan nervuzz

@nervuzz
nervuzz / check_parquet_schema.md
Created February 16, 2022 20:38
Check parquet file schema

How to check parquet file schema

  • install pyarrow
import pyarrow.parquet as pq

fhv_2019_01 = r"/home/nervuzz/tmp/fhv_tripdata_2019-01.parquet"
fhv_2019_12 = r"/home/nervuzz/tmp/fhv_tripdata_2019-12.parquet"
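With the two paths defined, one possible way to inspect and compare the schemas is sketched below. It assumes `pyarrow.parquet.read_schema`, which reads only the file metadata rather than the data; the helper names `read_schema_as_dict` and `diff_schemas` are hypothetical and not part of the original gist.

```python
def read_schema_as_dict(path):
    """Return {column name: type string} for a Parquet file (hypothetical helper)."""
    # Imported lazily so diff_schemas below stays usable without pyarrow installed.
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)  # reads the file footer, not the row data
    return dict(zip(schema.names, (str(t) for t in schema.types)))


def diff_schemas(left, right):
    """Return {column: (left type, right type)} for columns whose types differ.

    None marks a column that is missing on that side (hypothetical helper).
    """
    columns = sorted(set(left) | set(right))
    return {
        name: (left.get(name), right.get(name))
        for name in columns
        if left.get(name) != right.get(name)
    }


# Usage with the paths defined above:
#   print(read_schema_as_dict(fhv_2019_01))
#   print(diff_schemas(read_schema_as_dict(fhv_2019_01),
#                      read_schema_as_dict(fhv_2019_12)))
```

An empty dictionary from `diff_schemas` would mean both files declare the same columns with the same types.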
@nervuzz
nervuzz / pgcli_issue.md
Created February 6, 2022 16:22
[DE Zoomcamp] Troubleshooting pgcli issues

-- Read about DataTalks.Club Data Engineering Zoomcamp --

How to fix errors appearing after running pgcli

The first week of the Data Engineering Zoomcamp by DataTalks.Club was a gentle introduction to writing and executing SQL queries against a PostgreSQL database.

Intro

Although pgcli is not strictly necessary to complete Week 1, it makes writing SQL queries more pleasant, so I decided to try reproducing the errors that may appear while installing or using it.

I set up my environment using WSL2 (Ubuntu-20.04) on Windows 10 (21H1) with Python 3.9.10.

@nervuzz
nervuzz / happy_airflow_wsl2.md
Last active February 8, 2022 06:06
[DE Zoomcamp] Airflow & WSL2: no-frills or no-thrills

-- Read about DataTalks.Club Data Engineering Zoomcamp --

Airflow & WSL2: no-frills or no-thrills

The second week of the Data Engineering Zoomcamp by DataTalks.Club introduced Apache Airflow, one of the most popular data pipeline platforms. So we are going to create some workflows!

Intro

First, we have to run the Docker Compose installation of Airflow in the environment of our choice, which can be, among others, macOS, Linux, a GCP VM, or the very popular WSL. What's more, we also need the Google Cloud SDK installed in our Airflow environment in order to connect to the Cloud Storage bucket and create tables in BigQuery. That means we cannot just use the official docker-compose.yaml referenced in the Airflow docs; we have to write a custom Dockerfile that extends the apache/airflow image with our additional dependencies. Then we can incorporate it into docker-compose.yaml 🙌
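Once the extended image is built, a quick sanity check along the following lines could confirm that the extra tooling is actually visible inside the Airflow container. This is only a minimal sketch: the helper name `missing_tools` is hypothetical, and it assumes the Google Cloud SDK exposes its CLI as `gcloud` on `PATH`.

```python
import shutil


def missing_tools(tools):
    """Return the names from `tools` that cannot be found on PATH (hypothetical helper)."""
    return [name for name in tools if shutil.which(name) is None]


# Example, to be run inside the Airflow container:
#   missing_tools(["gcloud"])  # an empty list means the CLI was found
```

Running the check in a scheduler or worker container, rather than on the host, matters here because the SDK only needs to exist in the environment where the tasks execute.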