samklr / INSTALL.md
Created Oct 10, 2021 — forked from jpillora/INSTALL.md
Headless Transmission on Mac OS X
  1. Go to https://developer.apple.com/downloads/index.action, search for "Command Line Tools", and choose the package matching your version of macOS

  2. Go to http://brew.sh/ and enter the one-liner into the Terminal; you now have Homebrew installed (a better MacPorts)
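     At the time of writing, the one-liner published on brew.sh is the following (check the site for the current command):

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"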

  3. Install transmission-daemon with

    brew install transmission
    
  4. Copy the startup config for launchctl with

    ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents
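
  5. Load the agent so the daemon starts now and at login (the plist name below is what the Homebrew formula typically installs; adjust if yours differs):

    launchctl load ~/Library/LaunchAgents/homebrew.mxcl.transmission.plist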
    
minio-dremio-compose.yml
version: '2'
services:
  minio:
    restart: always
    image: docker.io/bitnami/minio:2021
    ports:
      - '9000:9000'
    environment:
      - MINIO_ROOT_USER=miniokey
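The preview cuts off here; note that the Bitnami MinIO image also expects a MINIO_ROOT_PASSWORD variable alongside MINIO_ROOT_USER, and the file name suggests a dremio service is defined further down.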
jinja-example-airflow-part-2.py
# Suppose the data file name has the format "datatfile_YYYY-MM-DD.csv"; a new file arrives in S3 every day.
file_suffix = "{{ execution_date.strftime('%Y-%m-%d') }}"
bucket_key_template = 's3://[bucket_name]/datatfile_{}.csv'.format(file_suffix)
file_sensor = S3KeySensor(
    task_id='s3_key_sensor_task',
    poke_interval=60 * 30,  # seconds; poke for the file every half hour
    timeout=60 * 60 * 12,   # give up after 12 hours
    bucket_key=bucket_key_template,
    bucket_name=None,
    wildcard_match=False,
)
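Because bucket_key is a templated field of S3KeySensor, Airflow renders the {{ execution_date }} expression at run time, so each daily run polls S3 for that day's file.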
samklr / s3_sensor.py
Created Feb 12, 2021 — forked from msumit/s3_sensor.py
Airflow file sensor example
from airflow import DAG
from airflow.operators.sensors import S3KeySensor
from airflow.operators import BashOperator
from datetime import datetime, timedelta

# Midnight of the previous day, a common start_date for a daily DAG.
yday = datetime.combine(datetime.today() - timedelta(1),
                        datetime.min.time())

default_args = {
    'owner': 'msumit',
airflow_arena:arflow_on_container:layout_generator.py
with DAG(**dag_config) as dag:
    # Declare pipeline start and end tasks
    start_task = DummyOperator(task_id='pipeline_start')
    end_task = DummyOperator(task_id='pipeline_end')

    for account_details in pipeline_config['task_details']['accounts']:
        # Declare account start and end tasks
        if account_details['runable']:
            acct_start_task = DummyOperator(task_id=account_details['account_id'] + '_start')
            acct_start_task.set_upstream(start_task)
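set_upstream wires each account's start task to run after pipeline_start; the same dependency can be written with Airflow's bitshift operator as start_task >> acct_start_task.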
samklr / parquet_tools.md
Last active Jan 22, 2021
Parquet Tools

Setup: brew install parquet-tools

Help: parquet-tools -h

Row count: parquet-tools rowcount part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

First record: parquet-tools head -n 1 part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Metadata: parquet-tools meta part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet
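
Schema (the schema subcommand of the same tool, run against the same file):

parquet-tools schema part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet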

gist:743d927dd0a5f5671c64b1d346e7b318

Context

The Integration team has deployed a cron job that dumps a CSV file containing all the new Shopify configurations daily at 2 AM UTC. The task is to build a daily pipeline that will:

1. download the CSV file from https://alg-data-public.s3.amazonaws.com/[YYYY-MM-DD].csv,
2. filter out each row with an empty application_id,
3. add a has_specific_prefix column set to true if the value of index_prefix differs from shopify_, else to false,
4. load the valid rows into a PostgreSQL instance.

The pipeline should process the files from 2019-04-01 to 2019-04-07 (see the sketch below).
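A minimal sketch of one day's processing, assuming pandas and SQLAlchemy; the process_day function, the engine URL, and the shopify_configurations table name are illustrative, not part of the exercise:

import pandas as pd
from sqlalchemy import create_engine

def process_day(ds: str, engine_url: str) -> None:
    # Download the day's CSV straight from S3 over HTTPS.
    url = f"https://alg-data-public.s3.amazonaws.com/{ds}.csv"
    df = pd.read_csv(url)
    # Filter out rows with an empty application_id.
    df = df[df["application_id"].notna() & (df["application_id"].astype(str) != "")]
    # has_specific_prefix is true when index_prefix differs from "shopify_".
    df["has_specific_prefix"] = df["index_prefix"] != "shopify_"
    # Load the valid rows into PostgreSQL.
    engine = create_engine(engine_url)
    df.to_sql("shopify_configurations", engine, if_exists="append", index=False)

# One call per day in the required window, e.g.:
# for day in pd.date_range("2019-04-01", "2019-04-07"):
#     process_day(day.strftime("%Y-%m-%d"), "postgresql://user:pass@host/db")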

kamon.conf
play.modules.enabled += "com.samklr.KamonModule"

kamon {
  environment {
    service = "my-svc"
  }
  jaeger {
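The first line registers the custom KamonModule with Play; kamon.environment.service sets the service name attached to reported metrics and traces, and the truncated jaeger block is where the Jaeger reporter would be configured.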
golang_job_queue.md
StreamsDSLAndProcessorExample.java
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*