
@gfranxman
gfranxman / parallel_execution_serial_return_demo.py
Last active April 30, 2024 17:12
Parallel execution of tasks with results returned in submitted order. Execution model for STT (speech-to-text) and TTS (text-to-speech).
import asyncio
import random
tasks = []
DEBUG = False
async def mock_event_generator():
    """
    Mock event generator for parallel processing.
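The execution model in the description can be sketched as follows, assuming an async event source and an async worker. The names ordered_pipeline, mock_events, and slow_double are hypothetical stand-ins, not the gist's code. Each event's work starts as soon as the event arrives, but results are yielded strictly in arrival order, each one as soon as it and everything before it have finished.

```python
import asyncio
import random


async def ordered_pipeline(events, worker):
    """Process events from an async iterator concurrently; yield results in arrival order."""
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of the event stream

    async def submit():
        async for event in events:
            # Start the work immediately; the queue only records arrival order.
            await queue.put(asyncio.create_task(worker(event)))
        await queue.put(done)

    producer = asyncio.create_task(submit())
    while (task := await queue.get()) is not done:
        # Await the oldest task first; later tasks keep running meanwhile.
        yield await task
    await producer


async def mock_events(n):
    # Hypothetical event source: emits 0..n-1 with small random gaps.
    for i in range(n):
        await asyncio.sleep(random.uniform(0, 0.01))
        yield i


async def slow_double(x):
    # Hypothetical worker (think STT/TTS call) whose random latency shuffles completion order.
    await asyncio.sleep(random.uniform(0, 0.05))
    return x * 2


async def main():
    return [result async for result in ordered_pipeline(mock_events(10), slow_double)]


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

Queueing the tasks rather than the results is what preserves order without serializing the work. A slow early task delays the delivery of later results, but not their execution. If a worker raises, the error surfaces at that task's position in the output.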
gfranxman / prepare-commit-msg
Last active August 18, 2023 21:14
Git hook that uses llm to prepare commit messages as release notes.
#!/bin/sh
# https://gist.github.com/gfranxman/e9d4a523397535c6dd82d1445c246b8d/edit
# 2023-08-18
COMMIT_MSG_FILE=$1
COMMIT_SOURCE=$2
SHA1=$3
REL_NOTES_RAW=$(git diff --staged | llm -s "release notes" 2>/dev/null)
REL_NOTES_RAW=$(echo "$REL_NOTES_RAW" | sed 's/^#/* /')
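The preview stops before the drafted notes reach the commit message. Because git runs any executable file as a hook, the same idea can be sketched in Python. This is a hypothetical port, not the gist's script. It assumes the llm CLI is on PATH and that the notes should go above whatever message git has already prepared.

```python
#!/usr/bin/env python3
import re
import subprocess
import sys


def format_notes(raw):
    # Same as sed 's/^#/* /': git strips lines starting with '#' as comments,
    # so each leading '#' becomes a bullet instead.
    return re.sub(r"^#", "* ", raw, flags=re.MULTILINE)


def draft_notes():
    # Equivalent of: git diff --staged | llm -s "release notes" 2>/dev/null
    try:
        diff = subprocess.run(["git", "diff", "--staged"],
                              capture_output=True, text=True).stdout
        result = subprocess.run(["llm", "-s", "release notes"], input=diff,
                                capture_output=True, text=True)
    except OSError:  # git or llm is not installed
        return ""
    return result.stdout if result.returncode == 0 else ""


def main(argv):
    msg_file = argv[1]
    source = argv[2] if len(argv) > 2 else ""
    # Assumption: only draft notes when git passes no source, i.e. a plain
    # `git commit` (not -m/-F, template, merge, squash, or amend).
    if source:
        return 0
    notes = format_notes(draft_notes())
    if notes.strip():
        with open(msg_file, encoding="utf-8") as fh:
            existing = fh.read()
        with open(msg_file, "w", encoding="utf-8") as fh:
            fh.write(notes.rstrip("\n") + "\n" + existing)
    return 0


# git always passes the message file path as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```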
gfranxman / README
Created July 12, 2023 13:40
Airflow: Fix for DAG not found in serialized_dag table
While rapidly starting, stopping, and changing DAGs during development, you may run into errors like these for one or more of the DAGs:
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:54,767] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:55,800] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-full' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:55,802] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:56,577] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-incremental' not found in serialized_dag table
dag_talk_examples-airflow-scheduler-1 | [2023-07-11 21:07:56,579] {scheduler_job.py:1063} ERROR - DAG 'zimmerman-garcia-and-henry-full' not found in serialized_dag table
This
gfranxman / clone_objects_example.py
Created March 1, 2023 21:26
Cloning Django objects pattern
def model_to_dict(instance, exclude: list = None, modify: dict = None):
    excluded_fields = ["id", "pk"]
    if exclude:
        excluded_fields.extend(exclude)
    defaults = dict(
        [
            (fld.name, getattr(instance, fld.name))
            for fld in instance._meta.fields
            if fld.name not in excluded_fields
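Since the preview ends mid-function, here is a minimal sketch of the complete pattern. It rests on two assumptions: modify holds field values that override the copied ones, and the clone is saved through the model's default manager. The clone helper is hypothetical.

```python
def model_to_dict(instance, exclude=None, modify=None):
    """Copy a model instance's concrete field values, leaving out its primary key."""
    excluded = {"id", "pk", instance._meta.pk.name, *(exclude or [])}
    values = {
        fld.name: getattr(instance, fld.name)
        for fld in instance._meta.fields
        if fld.name not in excluded
    }
    # Assumption: `modify` supplies values that override the copied ones.
    values.update(modify or {})
    return values


def clone(instance, exclude=None, modify=None):
    """Hypothetical helper: save a copy of a Django model instance under a new primary key."""
    model = type(instance)
    return model._default_manager.create(**model_to_dict(instance, exclude, modify))
```

Excluding _meta.pk.name also covers models whose primary key is not called id. _meta.fields omits many-to-many fields, so those relations are not copied and would have to be set on the clone afterwards. Fields with unique constraints need a new value passed through modify.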
gfranxman / gist:109f3e1df0916c155a6b0ce49c848a6a
Created June 17, 2022 19:24
Skip all Airflow catchup runs.
def abort_on_catchup(**context):
    """
    This function determines whether to continue to the `next_task` or skip to 'end'
    using the "next" schedule interval.
    """
    # "Catchups" during this window are allowed.
    # This is just to cover for late-starting jobs.
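The decision the docstring describes can be sketched as a pure function. This assumes the caller already knows when the next scheduled run is due; reading that from the Airflow context is left out, and choose_branch and its one-hour grace default are hypothetical. The returned string is the task_id to follow, which is what a BranchPythonOperator callable returns.

```python
from datetime import datetime, timedelta, timezone


def choose_branch(now, next_run_at, grace=timedelta(hours=1)):
    """Return 'next_task' for a current run, or 'end' for a catchup run.

    A run counts as a catchup when the next scheduled run was already due more
    than `grace` ago; the grace window lets a run that merely started late work.
    """
    if now > next_run_at + grace:
        return "end"
    return "next_task"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(choose_branch(now, now - timedelta(days=1)))  # prints: end
    print(choose_branch(now, now + timedelta(hours=23)))  # prints: next_task
```

Airflow also ships a LatestOnlyOperator, which skips the downstream tasks of scheduled runs that fall outside the most recent schedule interval. It may cover the same need when no grace window is required.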
gfranxman / credset
Last active September 9, 2021 19:55
AWS credential juggling: the credset command I've been using for years, and awsenv, which keeps everything off the filesystem and in memory only.
#! /bin/bash
CRED=~/.aws/${1}.credentials
if [ -f "$CRED" ]
then
    echo "setting aws creds to $1"
    ln -f -s "$CRED" ~/.aws/credentials
else
    echo sorry, choose from
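The awsenv half isn't shown. Here is a minimal sketch of the in-memory approach. It assumes the credentials arrive as credentials-file INI text on stdin (for example, piped from a password manager) rather than from ~/.aws. The keys are exported only into a child process's environment, using the standard AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN variables. The function names are hypothetical, not the author's awsenv.

```python
import configparser
import os
import subprocess
import sys


def creds_to_env(ini_text, profile="default"):
    """Turn one profile of AWS credentials-style INI text into environment variables."""
    parser = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    parser.read_string(ini_text)
    section = parser[profile]
    env = {
        "AWS_ACCESS_KEY_ID": section["aws_access_key_id"],
        "AWS_SECRET_ACCESS_KEY": section["aws_secret_access_key"],
    }
    if "aws_session_token" in section:
        env["AWS_SESSION_TOKEN"] = section["aws_session_token"]
    return env


def run_with_creds(cmd, ini_text, profile="default"):
    # The keys live only in this process and the child's environment;
    # nothing is written to ~/.aws.
    env = {**os.environ, **creds_to_env(ini_text, profile)}
    return subprocess.run(cmd, env=env).returncode


# Hypothetical usage: some-secret-source | python awsenv_sketch.py aws s3 ls
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_with_creds(sys.argv[1:], sys.stdin.read()))
```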
gfranxman / sec_policy_middleware.py
Last active February 12, 2021 15:41
POC suggestion for coarse-grained view security policies: best practice?
import re
from logging import getLogger
from django.conf import settings
from django.http.response import HttpResponseForbidden
logger = getLogger(__name__)
def is_authenticated(r):
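The preview stops at the first predicate. One possible shape for the coarse-grained idea, as a sketch only, is an ordered table of path patterns mapped to predicates, where the first match wins and unlisted paths are denied. The patterns, is_staff, and is_allowed are hypothetical. The framework is left out so that the logic stands alone.

```python
import re


def allow_any(request):
    return True


def is_authenticated(request):
    return request.user.is_authenticated


def is_staff(request):
    return request.user.is_authenticated and request.user.is_staff


# Hypothetical policy table, checked top to bottom; the first matching pattern wins.
POLICIES = [
    (re.compile(r"^/admin/"), is_staff),
    (re.compile(r"^/account/"), is_authenticated),
    (re.compile(r"^/(static|public)/"), allow_any),
]


def is_allowed(request, policies=POLICIES):
    """Apply the first policy whose pattern matches request.path; deny if none match."""
    for pattern, check in policies:
        if pattern.match(request.path):
            return bool(check(request))
    return False
```

In a Django middleware, __call__ would return HttpResponseForbidden() when is_allowed(request) is false, and would otherwise call get_response(request). Denying by default means a newly added URL stays closed until someone writes a policy for it.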
gfranxman / pyspark_tricks.py
Created October 16, 2020 17:46
PySpark / Databricks DataFrame size estimation
from pyspark.serializers import PickleSerializer, AutoBatchedSerializer
def _to_java_object_rdd(rdd):
""" Return a JavaRDD of Object by unpickling
It will convert each Python object into Java object by Pyrolite, whenever the
RDD is serialized in batc h or not.
"""
rdd = rdd._reserialize(AutoBatchedSerializer(PickleSerializer()))
return rdd.ctx._jvm.org.apache.spark.mllib.api.python.SerDe.pythonToJava(rdd._jrdd, True)
def estimate_df_size(df):
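import shlex


def request_as_curl(request):
    """
    Construct a curl command from a (failed) request, e.g. a requests
    PreparedRequest such as response.request.
    """
    parts = ["curl", "-v", "-X", request.method]
    for name, value in request.headers.items():
        parts += ["-H", f"{name}: {value}"]
    body = request.body
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    if body:
        parts += ["-d", body]
    parts.append(request.url)
    # shlex.join quotes each argument so the command can be pasted into a shell.
    return shlex.join(parts)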
gfranxman / safepath.py
Last active February 18, 2022 23:28
Python safepath to replace os.path.join when you don't want path components to step outside a root path, preventing path traversal.
import os


def safepath_join(head, *tail):
    """
    Combines path parts like os.path.join, but ensures the resulting path
    doesn't step outside of the path given as the root (head).
    """
    root = os.path.abspath(head)
    # Join against the absolute root so a relative head still compares correctly.
    p = os.path.normpath(os.path.join(root, *tail))
    # commonpath also accepts p == root and works when root is "/".
    if os.path.commonpath([root, p]) != root:
        raise ValueError(f"{p} steps outside {root}")
    return p
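The check in safepath_join compares lexical paths, so a symlink inside the root that points elsewhere still passes. A variant that resolves links with os.path.realpath before comparing closes that gap. It is sketched here as a hypothetical safepath_join_real. It still cannot rule out a link being swapped between the check and the later open.

```python
import os
import tempfile


def safepath_join_real(root, *parts):
    """Join parts onto root, resolving symlinks, and refuse any result outside root."""
    base = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(base, *parts))
    # commonpath equals base only when candidate is base itself or lies beneath it.
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"{candidate} steps outside {base}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as outside, tempfile.TemporaryDirectory() as root:
        # A link inside root that points at a sibling directory.
        os.symlink(outside, os.path.join(root, "escape"))
        try:
            safepath_join_real(root, "escape", "secret.txt")
        except ValueError as exc:
            print("rejected:", exc)
```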