Tim Van Wassenhove timvw

@timvw
timvw / Dockerfile
Last active May 12, 2023 10:32
dbt on EMR on EKS with Spark
# Stage 1: download the extra table-format jars (built for Spark 3.3 / Scala 2.12) from Maven Central
FROM maven:3.6.3-jdk-8 AS builder
RUN mvn dependency:copy -Dartifact=org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0 -DoutputDirectory=/opt/jars
RUN mvn dependency:copy -Dartifact=io.delta:delta-core_2.12:2.2.0 -DoutputDirectory=/opt/jars
RUN mvn dependency:copy -Dartifact=io.delta:delta-storage:2.2.0 -DoutputDirectory=/opt/jars
# Stage 2: extend the EMR on EKS 6.9.0 Spark base image with Hudi and Delta Lake
FROM 107292555468.dkr.ecr.eu-central-1.amazonaws.com/spark/emr-6.9.0:latest
COPY --from=builder /opt/jars/hudi-spark3.3-bundle_2.12-0.13.0.jar /usr/lib/spark/jars/
COPY --from=builder /opt/jars/delta-core_2.12-2.2.0.jar /usr/lib/spark/jars/
COPY --from=builder /opt/jars/delta-storage-2.2.0.jar /usr/lib/spark/jars/
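The COPY sources in the second stage rely on dependency:copy naming each downloaded file artifactId-version.jar. A minimal bash sketch, assuming the plugin's defaults (no classifier, version kept in the file name) and a hypothetical jar_name helper, derives those names from the Maven coordinates so the artifact list only has to be maintained in one place:

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a groupId:artifactId:version coordinate to the file
# name that `mvn dependency:copy` writes by default (no classifier, type jar).
jar_name() {
  local group artifact version
  IFS=: read -r group artifact version <<< "$1"
  printf '%s-%s.jar\n' "$artifact" "$version"
}

# The three artifacts downloaded in the builder stage.
coords=(
  org.apache.hudi:hudi-spark3.3-bundle_2.12:0.13.0
  io.delta:delta-core_2.12:2.2.0
  io.delta:delta-storage:2.2.0
)

# Print the matching COPY instructions for the second stage.
for c in "${coords[@]}"; do
  printf 'COPY --from=builder /opt/jars/%s /usr/lib/spark/jars/\n' "$(jar_name "$c")"
done
```

Its output can be pasted into the Dockerfile in place of the hand-written COPY lines when a version is bumped.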
timvw / migrate.sh
Created January 17, 2023 15:21
Copy and update all libs from /usr/local into the current directory
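# For every file in the current directory, copy each library it links against
# from /usr/local (as reported by otool -L) into the current directory.
for file in ./*; do
  for ll in $(otool -L "$file" | grep -oE "/usr/local/[^ ]*"); do
    echo "copying $ll to ."
    sudo cp "$ll" .
  done
done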
for file in $(ls .); do
for ll in $(otool -L $file | grep -oE "/usr/local/[^ ]*"); do
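The dependency scan in the first loop can be factored into a function and checked without a Mach-O binary. A minimal sketch, assuming a hypothetical usr_local_deps helper that reads otool -L output on stdin; it skips otool's first line (the inspected file's own name, which could itself live under /usr/local) and de-duplicates the paths so a library referenced twice is copied once:

```bash
# Hypothetical helper: read `otool -L` output on stdin and print each
# distinct dependency located under /usr/local, one per line.
usr_local_deps() {
  # Line 1 of `otool -L` output is the inspected file's name; skip it.
  tail -n +2 | grep -oE '/usr/local/[^ ]*' | sort -u
}

# Usage on macOS (not executed here), equivalent to the first loop:
#   for file in ./*; do
#     otool -L "$file" | usr_local_deps | while read -r lib; do
#       echo "copying $lib to ."
#       sudo cp "$lib" .
#     done
#   done
```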
timvw / cli.sh
Last active January 15, 2023 09:26
Ballista session
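# Launch the Ballista CLI and connect to the scheduler listening on 127.0.0.1:50050.
# The home directory is mounted at the same path inside the container.
docker run \
  --network host \
  --rm -it \
  -v /Users/timvw:/Users/timvw \
  apache/arrow-ballista-cli:0.10.0 --host 127.0.0.1 --port 50050
# At the CLI prompt: register a Parquet file as an external table and query it.
create external table test stored as parquet location '/Users/timvw/Desktop/test.parquet';
select * from test limit 10;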
timvw / script.sh
Created November 23, 2022 10:04
From JSON to TSV with jq
```bash
# Emit value.day and value.count of every element of the top-level array as one tab-separated row
jq -r '.[].value | [.day, .count] | @tsv' LSQL-messages-2022-11-23-06-44-47.json
```
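The filter assumes the export is a top-level JSON array whose elements each carry a value object with day and count fields. A minimal sketch, with a made-up two-record sample and a hypothetical to_tsv helper (neither is from the original export), shows the input shape the filter expects and the rows it produces; it assumes jq is installed:

```bash
# Hypothetical helper wrapping the same jq filter as script.sh;
# reads the files given as arguments, or stdin when there are none.
to_tsv() {
  jq -r '.[].value | [.day, .count] | @tsv' "$@"
}

# Made-up sample in the shape the filter expects.
sample='[
  {"value": {"day": "2022-11-22", "count": 17}},
  {"value": {"day": "2022-11-23", "count": 42}}
]'

printf '%s\n' "$sample" | to_tsv
```

Each array element becomes one line, with the day and the count separated by a tab; @tsv also renders the numeric count as text.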
timvw / port-user.sh
Created May 5, 2022 11:51
Show which process is using a given TCP port (macOS)
# -n: no host name lookups, -P: numeric ports, -i4TCP:<port>: IPv4 TCP sockets on that port
lsof -nP -i4TCP:"$1"
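Because the script passes $1 straight to lsof, a missing or non-numeric argument produces a confusing lsof error. A minimal sketch of a hypothetical port_user wrapper that validates the argument first and otherwise keeps the original flags:

```bash
# Hypothetical wrapper around the lsof call in port-user.sh.
# Returns 2 with a usage message when the argument is not a valid TCP port.
port_user() {
  local port="${1:-}"
  case "$port" in
    ''|*[!0-9]*)
      echo "usage: port_user <tcp-port>" >&2
      return 2
      ;;
  esac
  # 10# forces base 10 so values such as 08 are not parsed as octal.
  if (( 10#$port < 1 || 10#$port > 65535 )); then
    echo "port out of range: $port" >&2
    return 2
  fi
  # Same flags as the original: no DNS or service-name lookups, IPv4 TCP only.
  lsof -nP -i4TCP:"$port"
}
```

Adding -sTCP:LISTEN to the lsof call narrows the output to the listening socket, and using -iTCP instead of -i4TCP also includes IPv6 sockets.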
timvw / Cargo.toml
Created May 3, 2022 20:58
datafusion-aws-glue-catalog (example)
[package]
name = "rust-playground"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
aws-config = "0.11.0"
aws-sdk-s3 = "0.11.0"
timvw / deployment-controller-token
Last active January 21, 2022 10:20
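Get a login token for the Kubernetes dashboard, served through kubectl proxy at http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/#/login
# Decode the token of the first deployment-controller-token secret and copy it to the clipboard (pbcopy is macOS-only)
kubectl -n kube-system get secret -o json | \
  jq -r 'first(.items[] | select(.metadata.name | startswith("deployment-controller-token"))) | .data.token' | \
  base64 --decode | \
  pbcopy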
timvw / ratelimited-cats.scala
Created November 16, 2021 17:02
Rate limiting with cats-effect
package be.icteam.sample
import cats.effect._
import cats.implicits._
import cats.effect.concurrent.Semaphore
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
timvw / master2main.sh
Created January 28, 2021 09:28
Update a local clone after the remote's master branch has been renamed to main
git branch -m master main && \
git fetch origin && \
git branch -u origin/main main && \
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
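The four commands can be exercised end to end against throwaway local repositories. A minimal sketch, assuming git is installed; the temporary upstream and clone repositories are made up for the demo and stand in for a hosted remote whose master branch was renamed to main:

```bash
set -e
demo=$(mktemp -d)
cd "$demo"

# 1. An "upstream" repository whose default branch starts out as master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m "initial commit"

# 2. A local clone made while the default branch was still master.
git clone -q upstream clone

# 3. Upstream renames master to main (what the hosting side does).
git -C upstream branch -m master main

# 4. The commands from master2main.sh, run inside the stale clone.
cd clone
git branch -m master main && \
git fetch origin && \
git branch -u origin/main main && \
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
```

For the last step, git remote set-head origin --auto reaches the same result by asking the remote which branch its HEAD points to.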
timvw / terraform.yml
Created January 20, 2021 05:54
GitHub Actions workflow to plan/apply multiple Terraform projects
name: "Terraform"
on:
  push:
    branches:
      - master
  pull_request:
jobs:
  terraform: