samklr /
Created October 10, 2021 12:15 — forked from jpillora/
Headless Transmission on Mac OS X
  1. Go to and search for "Command line tools", and choose the one for your version of Mac OS X.

  2. Go to and enter the one-liner into the Terminal; you now have Homebrew installed (a better MacPorts).

  3. Install transmission-daemon with

    brew install transmission
  4. Symlink the startup config for launchctl with

    ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents
minio-dremio-compose.yml
version: '2'
services:
  minio:
    restart: always
    ports:
      - '9000:9000'
    environment:
      - MINIO_ROOT_USER=miniokey
# Suppose the data file name has the format "datatfile_YYYY-MM-DD.csv"; this file arrives in S3 every day.
file_suffix = "{{ execution_date.strftime('%Y-%m-%d') }}"
bucket_key_template = 's3://[bucket_name]/datatfile_{}.csv'.format(file_suffix)
file_sensor = S3KeySensor(
    task_id='s3_file_sensor',
    bucket_key=bucket_key_template,
    poke_interval=60 * 30,  # seconds; check for the file every half hour
    timeout=60 * 60 * 12,   # give up after 12 hours
)
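Once Airflow substitutes execution_date, the templated key above renders to a concrete S3 path. A plain-Python sketch of that rendering (the date is illustrative, and Airflow itself does this via Jinja, not this code):

```python
from datetime import datetime

# Stand-in for Airflow's execution_date; the scheduler injects the real value.
execution_date = datetime(2021, 2, 12)

file_suffix = execution_date.strftime('%Y-%m-%d')
bucket_key = 's3://[bucket_name]/datatfile_{}.csv'.format(file_suffix)
print(bucket_key)  # s3://[bucket_name]/datatfile_2021-02-12.csv
```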
samklr /
Created February 12, 2021 20:03 — forked from msumit/
Airflow file sensor example
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.sensors import S3KeySensor
from airflow.operators import BashOperator
from airflow.operators.dummy_operator import DummyOperator

# Yesterday at midnight, used as the DAG start date.
yday = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())

default_args = {
    'owner': 'msumit',
    'start_date': yday,
}

with DAG(**dag_config) as dag:
    # Declare pipeline start and end tasks
    start_task = DummyOperator(task_id='pipeline_start')
    end_task = DummyOperator(task_id='pipeline_end')

    for account_details in pipeline_config['task_details']['accounts']:
        # Declare account start and end tasks
        if account_details['runable']:
            acct_start_task = DummyOperator(task_id=account_details['account_id'] + '_start')
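The loop above fans the DAG out into one branch per account. As a plain-Python sketch of the task-id wiring (the account list is hypothetical; real code attaches Airflow operators, not strings):

```python
# Hypothetical pipeline_config mirroring the structure the DAG loop reads.
pipeline_config = {'task_details': {'accounts': [
    {'account_id': 'acct_a', 'runable': True},
    {'account_id': 'acct_b', 'runable': False},
]}}

# One start-task id per runnable account, as in the DummyOperator loop.
task_ids = [
    acct['account_id'] + '_start'
    for acct in pipeline_config['task_details']['accounts']
    if acct['runable']
]
print(task_ids)  # ['acct_a_start']
```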
samklr /
Last active January 22, 2021 19:50
Parquet Tools

Setup:

    brew install parquet-tools

Show help:

    parquet-tools -h

Count rows:

    parquet-tools rowcount part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show the first record:

    parquet-tools head -n 1 part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show file metadata:

    parquet-tools meta part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

samklr / gist:743d927dd0a5f5671c64b1d346e7b318
Created November 26, 2020 21:00
Data Engineering assignment

Context: The Integration team has deployed a cron job that dumps a CSV file containing all the new Shopify configurations daily at 2 AM UTC. The task is to build a daily pipeline that will:

- download the CSV file from [YYYY-MM-DD].csv,
- filter out each row with an empty application_id,
- add a has_specific_prefix column set to true if the value of index_prefix differs from shopify_, else false,
- load the valid rows into a PostgreSQL instance.

The pipeline should process files from 2019-04-01 to 2019-04-07.
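The filter and derived-column steps can be sketched in a few lines of plain Python (toy data; the download and the PostgreSQL load are omitted):

```python
import csv
import io

# Toy rows standing in for one day's configuration dump (made-up values).
raw = io.StringIO(
    "application_id,index_prefix\n"
    "app1,shopify_\n"
    ",shopify_\n"
    "app2,custom_\n"
)

rows = []
for row in csv.DictReader(raw):
    if not row['application_id']:  # drop rows with an empty application_id
        continue
    # true when index_prefix differs from 'shopify_'
    row['has_specific_prefix'] = row['index_prefix'] != 'shopify_'
    rows.append(row)

print([(r['application_id'], r['has_specific_prefix']) for r in rows])
# [('app1', False), ('app2', True)]
```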

kamon.conf
play.modules.enabled += "com.samklr.KamonModule"

kamon {
  environment {
    service = "my-svc"
  }
  jaeger {
samklr /
Created November 9, 2019 10:02 — forked from harlow/
Job queues in Golang
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at