samklr / INSTALL.md
Created October 10, 2021 12:15 — forked from jpillora/INSTALL.md
Headless Transmission on Mac OS X
  1. Go to https://developer.apple.com/downloads/index.action, search for "Command line tools", and choose the one for your version of OS X

  2. Go to http://brew.sh/ and enter the one-liner into the Terminal; you now have Homebrew installed (a better MacPorts)

  3. Install transmission-daemon with

    brew install transmission
    
  4. Symlink the startup config for launchctl with

    ln -sfv /usr/local/opt/transmission/*.plist ~/Library/LaunchAgents
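
  5. (Optional) Load the agent so transmission-daemon starts right away; a minimal sketch, assuming Homebrew names the plist homebrew.mxcl.transmission.plist:

    launchctl load ~/Library/LaunchAgents/homebrew.mxcl.transmission.plist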
    
version: '2'
services:
  minio:
    restart: always
    image: docker.io/bitnami/minio:2021
    ports:
      - '9000:9000'
    environment:
      - MINIO_ROOT_USER=miniokey
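
Once the remaining credentials are filled in, a compose file like this can be brought up with:

    docker-compose up -d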
# suppose my data file name has the format "datafile_YYYY-MM-DD.csv"; this file arrives in S3 every day.
file_suffix = "{{ execution_date.strftime('%Y-%m-%d') }}"
bucket_key_template = 's3://[bucket_name]/datafile_{}.csv'.format(file_suffix)
file_sensor = S3KeySensor(
    task_id='s3_key_sensor_task',
    poke_interval=60 * 30,  # seconds; check for the file every half hour
    timeout=60 * 60 * 12,   # give up after 12 hours
    bucket_key=bucket_key_template,
    bucket_name=None,       # the bucket is already part of the full s3:// key above
    wildcard_match=False)
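
A downstream task can then be chained after the sensor so it only runs once the day's file has landed; a minimal sketch, assuming this lives inside a DAG definition and BashOperator is imported (the task below is illustrative):

    process_file = BashOperator(
        task_id='process_file',
        bash_command='echo "processing {{ ds }}"')
    file_sensor >> process_file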
samklr / s3_sensor.py
Created February 12, 2021 20:03 — forked from msumit/s3_sensor.py
Airflow file sensor example
from airflow import DAG
from airflow.operators.sensors import S3KeySensor
from airflow.operators import BashOperator
from datetime import datetime, timedelta

yday = datetime.combine(datetime.today() - timedelta(1),
                        datetime.min.time())

default_args = {
    'owner': 'msumit',
with DAG(**dag_config) as dag:
    # Declare pipeline start and end tasks
    start_task = DummyOperator(task_id='pipeline_start')
    end_task = DummyOperator(task_id='pipeline_end')

    for account_details in pipeline_config['task_details']['accounts']:
        # Declare account start and end tasks
        if account_details['runable']:
            acct_start_task = DummyOperator(task_id=account_details['account_id'] + '_start')
            acct_start_task.set_upstream(start_task)
samklr / parquet_tools.md
Last active January 22, 2021 19:50
Parquet Tools

Setup Parquet-tools: `brew install parquet-tools`

Help: `parquet-tools -h`

Count the rows in a file:

    parquet-tools rowcount part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show the first record:

    parquet-tools head -n 1 part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet

Show the file metadata:

    parquet-tools meta part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet
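
The `schema` subcommand is also handy for a quick look at the column names and types of the same file:

    parquet-tools schema part-00000-fc34f237-c985-4ebc-822b-87fa446f6f70.c000.snappy.parquet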

samklr / gist:743d927dd0a5f5671c64b1d346e7b318
Created November 26, 2020 21:00
Data Engineering assignment

Context: The Integration team has deployed a cron job that dumps a CSV file containing all the new Shopify configurations daily at 2 AM UTC. The task is to build a daily pipeline that will:

- download the CSV file from https://alg-data-public.s3.amazonaws.com/[YYYY-MM-DD].csv
- filter out each row with an empty application_id
- add a has_specific_prefix column set to true if the value of index_prefix differs from shopify_, else false
- load the valid rows into a PostgreSQL instance

The pipeline should process files from 2019-04-01 to 2019-04-07; a sketch of one day's run is shown below.
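
A minimal sketch of the daily run, assuming pandas and SQLAlchemy; the target table name and connection string are illustrative, not part of the assignment:

    import pandas as pd
    from sqlalchemy import create_engine

    def process_day(day: str) -> None:
        # Download the day's CSV straight from the public bucket.
        df = pd.read_csv(f"https://alg-data-public.s3.amazonaws.com/{day}.csv")

        # Filter out rows with an empty application_id.
        df = df[df["application_id"].notna() & (df["application_id"] != "")]

        # has_specific_prefix is true when index_prefix differs from "shopify_".
        df["has_specific_prefix"] = df["index_prefix"] != "shopify_"

        # Load the valid rows into PostgreSQL (connection details are assumptions).
        engine = create_engine("postgresql://user:password@localhost:5432/shopify")
        df.to_sql("shopify_configurations", engine, if_exists="append", index=False)

    for day in pd.date_range("2019-04-01", "2019-04-07").strftime("%Y-%m-%d"):
        process_day(day)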

play.modules.enabled += "com.samklr.KamonModule"

kamon {
  environment {
    service = "my-svc"
  }
  jaeger {
samklr / golang_job_queue.md
Created November 9, 2019 10:02 — forked from harlow/golang_job_queue.md
Job queues in Golang
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*