V. Petkov vepetkov

  • Munich, Germany
@vepetkov
vepetkov / hdfs_pq_access.py
Created September 4, 2018 11:10
Python HDFS + Parquet (hdfs3, PyArrow + libhdfs, HdfsCLI + Knox)
##################################################################
## Native hdfs access (only on the cluster)
# conda install -c conda-forge libhdfs3=2.3.0=1 hdfs3 --yes
import hdfs3
import pandas as pd

# NameNode host and IPC (RPC) port of the cluster
nameNodeHost = 'hadoopnn1.localdomain'
nameNodeIPCPort = 8020
# Connect to HDFS through libhdfs3
hdfs = hdfs3.HDFileSystem(nameNodeHost, port=nameNodeIPCPort)
vepetkov / direnvrc
Created July 14, 2023 09:14
Load .venv automatically using DirEnv
# Store in ~/.config/direnv/direnvrc to run for all folders automatically
# check if VENV is loaded
if [[ -z "${VIRTUAL_ENV_PROMPT}" ]] ; then
    if [ ! -d ".venv" ] ; then
        echo "Installing virtualenv for $(python -V)"
        python -m venv .venv
    fi
    echo "Activating $(python -V) virtualenv from .venv"
    source .venv/bin/activate
fi
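The "create .venv only if it is missing" step above can also be sketched in Python with the standard-library `venv` module. This is a minimal illustration, not part of the gist: the function name `ensure_venv` and its parameters are hypothetical. Activation is left out because a Python process cannot change the environment of the calling shell, so that part still has to happen in the direnvrc.

```python
import venv
from pathlib import Path


def ensure_venv(project_dir, name=".venv", with_pip=True):
    """Create <project_dir>/<name> if it does not exist yet.

    Returns (path, created), where created is False when the directory
    was already present and nothing was done.
    `ensure_venv` is a hypothetical helper, not part of the gist.
    """
    env_dir = Path(project_dir) / name
    if env_dir.is_dir():
        # Mirrors the `[ ! -d ".venv" ]` check: never recreate an existing env.
        return env_dir, False
    # with_pip=True matches the default behaviour of `python -m venv`.
    venv.create(env_dir, with_pip=with_pip)
    return env_dir, True
```

Calling `ensure_venv(".")` twice creates the environment the first time and is a no-op the second time, which is what makes the direnvrc safe to run on every directory change.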
vepetkov / snowflake_upload_local.py
Created December 20, 2019 12:56
Snowflake Upload Local Files from Python
import os
import snowflake.connector

# Credentials are read from the same environment variables that SnowSQL uses
ctx = snowflake.connector.connect(
    authenticator="snowflake",
    user=os.getenv("SNOWSQL_USER"),
    password=os.getenv("SNOWSQL_PWD"),
    account=os.getenv("SNOWSQL_ACCOUNT"),
    warehouse=os.getenv("SNOWSQL_WAREHOUSE")
)
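The preview stops after the connection is opened, so the upload itself is not shown. Local files are commonly uploaded to a Snowflake stage with a `PUT file://... @stage` statement. The sketch below only builds such a statement as a string. The helper name `build_put_statement` and the stage name in the usage note are assumptions for illustration, not taken from the gist.

```python
from pathlib import Path


def build_put_statement(local_path, stage, auto_compress=True, overwrite=False):
    """Build a PUT statement that uploads one local file to a stage.

    `build_put_statement` is a hypothetical helper. The file URI is always
    wrapped in single quotes so that paths containing spaces still work;
    paths that themselves contain a single quote are not handled.
    """
    file_uri = "file://" + Path(local_path).absolute().as_posix()
    return (
        f"PUT '{file_uri}' {stage} "
        f"AUTO_COMPRESS={str(auto_compress).upper()} "
        f"OVERWRITE={str(overwrite).upper()}"
    )
```

With the connection from the snippet above, the statement could then be run as `ctx.cursor().execute(build_put_statement("/tmp/data/rows.csv", "@my_stage"))`, where `@my_stage` stands in for a stage that already exists.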
# Parse the whole git history and show files of at least 1 MiB (2^20 bytes)
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  awk '$2 >= 2^20' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
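The filter, sort and truncate stages of the pipeline above can be sketched in Python for readers who want to post-process the output of `git cat-file --batch-check` themselves. The names `large_blobs` and `format_iec` are hypothetical, and `format_iec` is only a rough stand-in for `numfmt --to=iec-i`; it does not reproduce its padding or rounding rules exactly.

```python
MIN_SIZE = 2 ** 20  # 1 MiB, the same threshold as the awk filter


def format_iec(size):
    """Render a byte count with binary (IEC) units, e.g. 1.5KiB."""
    value = float(size)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            # Plain bytes are shown without a fractional part.
            return f"{value:.0f}{unit}" if unit == "B" else f"{value:.1f}{unit}"
        value /= 1024


def large_blobs(batch_check_lines, min_size=MIN_SIZE):
    """Return (short_hash, size, path) tuples for big blobs, smallest first.

    Each input line is expected in the format used by the pipeline:
    "<objecttype> <objectname> <objectsize> <rest>".
    """
    rows = []
    for line in batch_check_lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # same effect as: sed -n 's/^blob //p'
        sha, size = parts[1], int(parts[2])
        path = parts[3] if len(parts) > 3 else ""
        if size >= min_size:  # same effect as: awk '$2 >= 2^20'
            rows.append((sha[:12], size, path))  # 12-char hash, like cut -c 1-12
    return sorted(rows, key=lambda row: row[1])  # like sort --numeric-sort --key=2
```

Feeding it the lines produced by the first two commands of the pipeline and printing `format_iec(size)` for each row gives a listing comparable to the shell version.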