Stefan Schenk Menziess
@Menziess
Menziess / flatten_structs.py
Created March 19, 2024 12:13
Spark flatten nested structures
from pyspark.sql.functions import *
def flatten_structs(df):
    """Omits lists, and flattens structs into regular columns.
    >>> flatten_structs(test_df).show() # doctest: +NORMALIZE_WHITESPACE
    Omitted column rootstructype.nestedstructtype
    Omitted column arraytype
    +---+--------+---------+------------------+------------------+------------------+
Menziess / row_exists_in_group.py
Last active March 19, 2024 12:18
Spark check row exists in group using windowing
from pyspark.sql.functions import *
from pyspark.sql.window import Window
from pyspark.sql import Column
from functools import reduce
w = Window.partitionBy('group_id')
filter_expression = reduce(Column.__and__, (
Menziess / camelcase.py
Created October 18, 2023 12:16
Convert input string to snake-, camel-, kebab- or pascalcase.
from re import findall
from toolz.curried import curry, map, pipe
def to_something_case(text: str, case='snake') -> str:
    """Convert input string to snake-, camel-, kebab- or pascalcase."""
    if not text:
        return text
    first_char, sep, other_first_chars = {
        'snake': (str.lower, '_', str.lower),
        'camel': (str.lower, '', str.title),
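The preview above is cut off, so the following is only a minimal standalone sketch of the same idea, not the gist's actual implementation. The `snake` and `camel` entries are taken from the preview; the `kebab` and `pascal` entries, the word-splitting regex, and the name `to_case` are assumptions made for illustration. It uses only the standard library instead of `toolz`.

```python
import re


def to_case(text: str, case: str = "snake") -> str:
    """Convert text to snake, camel, kebab or pascal case (illustrative sketch)."""
    if not text:
        return text
    # (transform for the first word, separator, transform for the other words)
    first, sep, rest = {
        "snake": (str.lower, "_", str.lower),
        "camel": (str.lower, "", str.title),
        "kebab": (str.lower, "-", str.lower),    # assumed, not in the preview
        "pascal": (str.title, "", str.title),    # assumed, not in the preview
    }[case]
    # Assumed splitting rule: acronyms, then (capitalised) words, then digit runs.
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    if not words:
        return text
    head, *tail = words
    return sep.join([first(head), *map(rest, tail)])


print(to_case("helloWorld"))                # hello_world
print(to_case("hello world foo", "camel"))  # helloWorldFoo
print(to_case("HTTPServer", "kebab"))       # http-server
```

Keeping the three per-case choices in one lookup table means a new case style only needs one extra dictionary entry.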
Menziess / rocksdb.py
Last active November 24, 2023 11:22
Rocksdb using the rocksdict python package, cleaning/pruning old data via TTL or FIFO compaction
import os
import random
import string
from os import cpu_count
from pprint import pprint
from time import sleep
from rocksdict import (AccessType, DBCompactionStyle, DBCompressionType,
                       FifoCompactOptions, Options, Rdict, ReadOptions)
function install_tar {
  s=$1 # Used for string substitution
  if [[ "$1" == *.tar.gz ]]
  then
    tar -xvzf "$1" | sudo xargs mv -t /usr/local/bin/
  elif [[ "$1" == *.tar ]]
  then
    tar -xvf "$1" | sudo xargs mv -t /usr/local/bin/
  else
    # Quoted so the shell does not expand the globs against local files
    echo "Must have extension: *.tar.gz / *.tar."
Menziess / gist:3c1f99fdfe816af0382bbe4371ca5575
Created October 26, 2021 13:53
Substitute environment variable names with environment variable values in bash/shell (linux)
➜ export LOL=lol
➜ echo 'Hello there, ${LOL}' > example.txt
➜ envsubst < example.txt
Hello there, lol
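For comparison, roughly the same substitution can be sketched in Python with the standard library's `os.path.expandvars`. This is an illustrative equivalent rather than part of the gist, and the wrapper name `envsubst` is hypothetical. One difference to keep in mind: `expandvars` leaves references to unset variables unchanged, whereas the `envsubst` command replaces them with an empty string.

```python
import os


def envsubst(text: str) -> str:
    """Replace $NAME and ${NAME} with values from the current environment."""
    return os.path.expandvars(text)


os.environ["LOL"] = "lol"
print(envsubst("Hello there, ${LOL}"))  # Hello there, lol
```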
Menziess / Quotes.md
Last active July 17, 2020 12:54
Interesting sayings.

"Most of the evil in this world is done by people with good intentions." Ayn Rand

"The road to hell is paved with good intentions."

"Power corrupts." The founding fathers

"Those who would give up essential liberty, to purchase a little temporary safety, deserve neither liberty nor safety." Benjamin Franklin (1706-1790)

“If the freedom of speech is taken away then dumb and silent we may be led, like sheep to the slaughter.” George Washington

"I disapprove of what you say, but I will defend to the death your right to say it." Voltaire

Menziess / Makefile
Created June 5, 2020 19:07
Install specific Hugo version script.
HUGO_VERSION=0.68.3
run:
	export HUGO_VERSION=$(HUGO_VERSION)
	hugo server -FD
install-hugo:
	# https://api.github.com/repos/gohugoio/hugo/releases
	rm -f hugo*_Linux-64bit.deb
	curl -s https://api.github.com/repos/gohugoio/hugo/releases \
Menziess / cdc.py
Created March 29, 2020 20:12
Change Data Capture
from pyspark.sql import DataFrame, Window
def change_data_capture(
    df: DataFrame,
    partition_col,
    valid_from_col,
    valid_to_col,
    capture_columns=[]
):
def deep_ls(path: str, max_depth=1, reverse=False, key=None, keep_hidden=False):
    """List all files in base path recursively.
    List all files and folders in specified path and subfolders within maximum recursion depth.
    Parameters
    ----------
    path : str
        The path of the folder from which files are listed
    max_depth : int
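The preview ends before the function body, so the gist's own implementation is not visible here. As a rough sketch of the depth-limited listing the docstring describes, the following walks a local filesystem with `os.scandir`. The name `deep_ls_local`, the alphabetical ordering, and the reading of `max_depth=1` as "top level only" are assumptions, and the `reverse` and `key` parameters are left out.

```python
import os
from typing import Iterator


def deep_ls_local(path: str, max_depth: int = 1,
                  keep_hidden: bool = False) -> Iterator[str]:
    """Yield files and folders under `path`, at most `max_depth` levels deep."""
    # Sort by name so the output order is deterministic.
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if not keep_hidden and entry.name.startswith("."):
            continue  # skip dotfiles and dot-directories
        yield entry.path
        # Descend only while there is depth budget left.
        if max_depth > 1 and entry.is_dir(follow_symlinks=False):
            yield from deep_ls_local(entry.path, max_depth - 1, keep_hidden)
```

Because it is a generator, callers can stop early on large trees, for example `next(deep_ls_local(".", max_depth=3), None)`.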