@cupdike
cupdike / DeleteAllDagruns.py
Created September 20, 2018 16:07
Use Airflow's ORM to delete all DagRuns. Could also use sqlalchemy filtering if desired. This was with Airflow 1.8.
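from airflow import settings
from airflow.models import DagRun

# Open a session against the Airflow metadata database.
session = settings.Session()
try:
    # Bulk-delete every row in the dag_run table.
    session.query(DagRun).delete()
    session.commit()
finally:
    session.close()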
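The description mentions SQLAlchemy filtering as an alternative to deleting everything. Below is a minimal sketch of that idea, assuming the model passed in has a `dag_id` column, as `DagRun` does. The helper name `delete_dag_runs` and its parameters are hypothetical and not part of Airflow.

```python
def delete_dag_runs(session, model, dag_id=None):
    """Delete rows of `model`, optionally restricted to one DAG.

    `session` is a SQLAlchemy session and `model` an ORM class with a
    `dag_id` column (for example airflow.models.DagRun).
    Hypothetical helper, not an Airflow API.
    """
    query = session.query(model)
    if dag_id is not None:
        # Restrict the bulk delete to the runs of a single DAG.
        query = query.filter(model.dag_id == dag_id)
    # Bulk delete without synchronizing objects already loaded in the session.
    deleted = query.delete(synchronize_session=False)
    session.commit()
    return deleted  # number of rows matched by the delete
```

With Airflow installed, this could be called as `delete_dag_runs(settings.Session(), DagRun, dag_id="my_dag")`, where `my_dag` is a placeholder DAG id.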
@cupdike
cupdike / ConnectionSetup.txt
Last active August 19, 2020 16:05
Airflow Connection to Remote Kerberized Hive Metastore
# Let's say this is your kerberos ticket (likely from a keytab used for the remote service):
Ticket cache: FILE:/tmp/airflow_krb5_ccache
Default principal: hive/myserver.myrealm@myrealm
Valid starting       Expires              Service principal
06/14/2018 17:52:05  06/15/2018 17:49:35  krbtgt/myrealm@myrealm
        renew until 06/17/2018 05:49:33
@cupdike
cupdike / AirflowBeelineConnectionSample
Created June 13, 2018 16:39
Airflow Beeline Connection Using Kerberos via CLI
### There aren't many good examples of how to do this when also using kerberos
(venv) [airflow@cray01 dags]$ airflow connections --add \
--conn_id beeline_hive \
--conn_type 'beeline' \
--conn_host 'myserver.mydomain.com' \
--conn_port 10000 \
--conn_extra '{"use_beeline": true, "auth":"kerberos;principal=mysvcname/myservicehost@MYDOMAIN.COM;"}'
### Then, a sample DAG to use it
@cupdike
cupdike / BeelineJarDependencyFinder
Created July 12, 2017 20:12
Bash commands that will provide the list of jars needed to run beeline without installing hive
# If you want to run Beeline without installing Hive...
# This will help you find the jars that you need:
# Ref: https://pvillaflores.wordpress.com/2017/04/30/installing-and-running-beeline-client/
# Turn on verbose classloading
$ export _JAVA_OPTIONS=-verbose:class
# Run beeline and process out the needed jars.
# Below assumes the hadoop jars are under a 'cloudera' path (adjust accordingly)
$ /usr/bin/beeline | tr '[' '\n' | tr ']' ' ' | grep jar | grep cloudera | grep -v checksum | awk '{last=split($0,a,"/"); print a[last]}' | sort | uniq
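The same filtering can be sketched in Python for cases where the shell pipeline is awkward to maintain. This assumes the `-verbose:class` output has already been captured as text. The function name `needed_jars` and its `path_filter` parameter are illustrative, not part of any tool.

```python
def needed_jars(verbose_output, path_filter="cloudera"):
    """Return the sorted, unique jar file names found in class-loading output."""
    # Mirror the two tr stages: break records at '[' and blank out ']'.
    text = verbose_output.replace("[", "\n").replace("]", " ")
    jars = set()
    for line in text.splitlines():
        # Same filters as the grep stages of the pipeline.
        if "jar" not in line or path_filter not in line or "checksum" in line:
            continue
        # Keep only the text after the last "/" (the awk stage), minus padding.
        jars.add(line.rsplit("/", 1)[-1].strip())
    return sorted(jars)
```

Unlike the awk stage, this strips the trailing space left behind by replacing `]`, so the names can be used directly as file names.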
@cupdike
cupdike / PollingFileDownloader.py
Created October 6, 2015 17:45
Polls a file hosted at a URL and downloads it initially and if it changes.
"""Polls a file hosted at a URL and downloads it initially and if it changes."""
# Should be fairly robust to web server issues (in fact, it would only
# be a handful of lines were it not for error handling)
import requests
import time
import sys
FILE_URL = "http://<mywebserver>/<myfile>"
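A minimal sketch of a poll-and-compare loop of the kind the description outlines, assuming changes are detected by comparing a SHA-256 digest of the body between polls (the original script may detect changes differently). The names `poll_for_changes`, `fetch`, `on_change`, `max_polls`, and `sleep` are hypothetical. `fetch` is any callable returning the file's bytes, so the loop can be exercised without a web server.

```python
import hashlib
import time


def poll_for_changes(fetch, on_change, interval_s=60, max_polls=None,
                     sleep=time.sleep):
    """Call on_change(content) on the first successful fetch and on each change."""
    last_digest = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            content = fetch()
        except Exception as exc:
            # Keep polling through transient server or network errors.
            print("fetch failed, will retry: %s" % exc)
        else:
            digest = hashlib.sha256(content).hexdigest()
            if digest != last_digest:
                on_change(content)
                last_digest = digest
        # Wait before the next poll, but not after the final one.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
```

With `requests`, `fetch` could be as simple as `lambda: requests.get(FILE_URL, timeout=10).content`, and `on_change` a function that writes the bytes to disk.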
@cupdike
cupdike / quicksort.py
Last active October 6, 2015 17:46
Basic quicksort impl, inspired by http://me.dt.in.th/page/Quicksort/
# Inspired by: http://me.dt.in.th/page/Quicksort/
def quicksort(l):
    if len(l) < 2:
        return l
    iSwap = 1
    pivot = l[0]  # left most value is the pivot
    for i, val in enumerate(l[1:], start=1):  # Skip the pivot cell
        if val < pivot: