@dstandish
dstandish / synthesize_data.py
Created June 13, 2024 21:59
task instance data generator
"""
Airflow task instance data generator.
Takes two arguments:
* unit: days / weeks / hours / minutes etc
* num: a number used only to repeat the same thing under a differently named dag
Example:
python /Users/dstandish/code/async-ssh-operator/synthesize_data.py days 1
* This will create one dag run per day for the time period. (hardcoded at 2.5 yrs)
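The "one dag run per unit" idea described above can be sketched without Airflow at all. This is only an illustration of the date arithmetic, not the gist's actual code: the names `run_dates` and `dag_id_for`, the DAG naming scheme, and the explicit start/end arguments are all assumptions.

```python
from datetime import datetime, timedelta

# Units accepted here are exactly keyword arguments of datetime.timedelta.
VALID_UNITS = {"weeks", "days", "hours", "minutes"}


def run_dates(unit, start, end):
    """Yield one timestamp per `unit` in the half-open range [start, end)."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    step = timedelta(**{unit: 1})
    current = start
    while current < end:
        yield current
        current += step


def dag_id_for(unit, num):
    """Hypothetical naming scheme: same data, different DAG name per `num`."""
    return f"synthetic_{unit}_{num}"


if __name__ == "__main__":
    # Roughly 2.5 years, mirroring the hardcoded period mentioned above.
    end = datetime(2024, 6, 13)
    start = end - timedelta(days=int(365 * 2.5))
    dates = list(run_dates("days", start, end))
    print(dag_id_for("days", 1), len(dates))
```

Each yielded timestamp would correspond to one dag run; how those runs and their task instances are actually written to the Airflow database is not shown in the preview and is left out here.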
from __future__ import annotations
from aocd import lines, numbers
class Node:
    def __init__(self, name, type, size=0):
        self.name = name
        self.type = type
        self.size = size
@dstandish
dstandish / extract_lines_from_large_gzip.sh
Last active January 18, 2020 22:15
get specific lines from large compressed file with nonstandard row separator
a=$'\001'
b=$'\002'
c=$'\003'
d=$'\004'
ft="$a$b$c$c" # odd field separator
rt="$a$b$c$d" # odd row separator
zcat very_large_file.csv.gz | awk -v rt="$rt" 'BEGIN { RS = rt } NR>=946241736&&NR<=946241740' > out.txt
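For comparison, the same record-range extraction can be sketched in Python, which treats a multi-byte separator the same way regardless of which awk is installed. This is a minimal sketch, not part of the gist; the function names `iter_records` and `records_in_range` and the chunk size are assumptions.

```python
import gzip
import io


def iter_records(stream, sep, chunk_size=1 << 16):
    """Yield byte records from a binary stream, split on a multi-byte separator."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        parts = buf.split(sep)
        # The last piece may be an incomplete record (or a partial separator),
        # so keep it in the buffer until more data arrives.
        buf = parts.pop()
        yield from parts
    if buf:
        yield buf  # trailing record with no terminating separator


def records_in_range(stream, sep, first, last):
    """Return records numbered first..last (1-based, inclusive), like awk's NR."""
    out = []
    for nr, record in enumerate(iter_records(stream, sep), start=1):
        if nr > last:
            break  # stop reading early once the range is complete
        if nr >= first:
            out.append(record)
    return out


if __name__ == "__main__":
    row_sep = b"\x01\x02\x03\x04"
    raw = row_sep.join([b"r1", b"r2", b"r3", b"r4"]) + row_sep
    with gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(raw))) as f:
        print(records_in_range(f, row_sep, 2, 3))
```

For a real file, `gzip.open("very_large_file.csv.gz", "rb")` would take the place of the in-memory stream. The whole file up to the last requested record still has to be decompressed, as in the awk pipeline.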
@dstandish
dstandish / happy_git_on_mac_os.md
Last active July 24, 2018 18:14 — forked from trey/happy_git_on_osx.md
Creating a Happy Git Environment on macOS

Install git and bash-completion on your machine

Install Git

brew install git bash-completion

Basic config:

@dstandish
dstandish / WrappedStreamingBody
Created April 17, 2018 06:47 — forked from debedb/WrappedStreamingBody
Wrap boto3's StreamingBody object to provide enough Python fileobj functionality so that GzipFile is satisfied.
class WrappedStreamingBody:
    """
    Wrap boto3's StreamingBody object to provide enough
    fileobj functionality so that GzipFile is
    satisfied. Sometimes duck typing is awesome.
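The preview stops before the class body, so the following is only a sketch of the duck-typing idea, not the gist's implementation. The class name `StreamWrapper` and its behaviour (position tracking plus a `seek()` that accepts only no-op moves) are assumptions; it relies only on the wrapped object having a `read(amt=None)` method, as `StreamingBody` does.

```python
import gzip


class StreamWrapper:
    """Hypothetical wrapper adding tell() and a restricted seek() to a read-only stream."""

    def __init__(self, body):
        self._body = body
        self._pos = 0  # number of bytes handed out so far

    def read(self, amt=-1):
        if amt is None or amt < 0:
            data = self._body.read()
        else:
            data = self._body.read(amt)
        self._pos += len(data)
        return data

    def tell(self):
        return self._pos

    def seek(self, offset, whence=0):
        # A network stream cannot be rewound, so only seeks that resolve
        # to the current position are allowed.
        if whence == 0:
            target = offset
        elif whence == 1:
            target = self._pos + offset
        else:
            raise OSError("seeking from the end is not supported")
        if target != self._pos:
            raise OSError("underlying stream cannot be repositioned")
        return self._pos

    def __getattr__(self, name):
        # Anything not defined here is delegated to the wrapped object.
        return getattr(self._body, name)
```

A wrapped body can then be passed as `gzip.GzipFile(fileobj=StreamWrapper(body), mode="rb")` and read like a local file. Whether a wrapper is needed at all depends on the Python and botocore versions in use.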
@dstandish
dstandish / s3_client_wrapper.py
Last active April 14, 2018 21:56
python s3 client wrapper to simplify list, delete, copy, and upload operations; example extending boto3
"""
This module provides a boto3 s3 client factory get_client(), which returns an s3 client that has been augmented by some
additional functionality defined in the ClientWrap class, also present in this module.
ClientWrap adds a few wrapper methods that simplify basic list / delete / copy operations by (1) handling paging and
batching and (2) dealing only with keys instead of more detailed object metadata.
get_client() also makes it easy to specify a default bucket for the client, so that you don't need to specify the
bucket in each call.
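The paging and batching described above can be sketched as two standalone helpers. This is not the ClientWrap code, which the preview does not show; the names `list_keys` and `delete_keys` are assumptions. The boto3 calls used are `get_paginator("list_objects_v2")` and `delete_objects`, and the batch size reflects the S3 limit of 1000 keys per delete request.

```python
from itertools import islice


def list_keys(client, bucket, prefix=""):
    """Yield every key under `prefix`, following pagination transparently."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def delete_keys(client, bucket, keys, batch_size=1000):
    """Delete keys in batches; return the number of keys submitted for deletion."""
    iterator = iter(keys)
    submitted = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return submitted
        client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch]},
        )
        submitted += len(batch)
```

With a real client this would be called as `delete_keys(client, "my-bucket", list_keys(client, "my-bucket", "tmp/"))`, where the bucket name and prefix are placeholders. The return value counts keys sent to S3, not deletions confirmed in the response.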