cupdike

cupdike / shErrorCode255Tip.txt
Created Mar 27, 2019
sh.ErrorReturnCode_255 using Python sh package
If you are trying to run a script like this:
import sh
myScriptCommand = sh.Command("/path/to/script")
myScriptCommand("my arg")
and you see this error:
sh.ErrorReturnCode_255
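The preview cuts off before the tip itself, so without guessing at it: the exception name simply encodes the command's exit status (sh raises ErrorReturnCode_<N> for exit code N; 255 is what an exit of -1 wraps to, and is also what ssh returns on connection errors). A stdlib sketch of the same failure, using subprocess instead of sh so it runs without the package:

```python
import subprocess

# Stand-in for the failing script: any process that exits with status 255.
# Under the sh package this exact status is what raises ErrorReturnCode_255.
result = subprocess.run(["sh", "-c", "exit 255"])
print(result.returncode)  # 255
```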
cupdike / gist:c5554233e1dd6b233a9b6ec6adb05c5a
Created Nov 1, 2018
Python function to round down minutes to a user specified resolution
from datetime import datetime, timedelta

def round_minutes(dt, resolutionInMinutes):
    """round_minutes(datetime, resolutionInMinutes) => datetime rounded to lower interval

    Works for minute resolution up to a day (e.g. cannot round to nearest week).
    """
    # First zero out seconds and micros
    dtTrunc = dt.replace(second=0, microsecond=0)
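The preview stops after the truncation step. One way the rounding could be finished, runnable on its own (the modulo arithmetic below is my guess at the elided body, not necessarily the gist's exact code):

```python
from datetime import datetime, timedelta

def round_minutes(dt, resolutionInMinutes):
    """Round dt down to the nearest resolutionInMinutes interval.

    A sketch completing the truncated gist; the modulo step is assumed.
    """
    dtTrunc = dt.replace(second=0, microsecond=0)
    # Minutes elapsed since midnight, snapped down to the requested resolution
    minutesPastMidnight = dtTrunc.hour * 60 + dtTrunc.minute
    return dtTrunc - timedelta(minutes=minutesPastMidnight % resolutionInMinutes)

print(round_minutes(datetime(2018, 11, 1, 14, 37, 12), 15))  # 2018-11-01 14:30:00
```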
cupdike / DeleteAllDagruns.py
Created Sep 20, 2018
Use Airflow's ORM to delete all DagRuns. Could also use sqlalchemy filtering if desired. This was with Airflow 1.8.
from airflow.models import DagRun
from sqlalchemy import *
from airflow import settings
session = settings.Session()
session.query(DagRun).delete()
session.commit()
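The description mentions that sqlalchemy filtering would also work; here is that variant sketched against an in-memory SQLite stand-in (the DagRun model below is hypothetical and only mimics Airflow's real airflow.models.DagRun):

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import sessionmaker
try:
    from sqlalchemy.orm import declarative_base              # SQLAlchemy >= 1.4
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()

class DagRun(Base):
    """Hypothetical stand-in for airflow.models.DagRun."""
    __tablename__ = 'dag_run'
    id = Column(Integer, primary_key=True)
    dag_id = Column(String)

engine = create_engine('sqlite://')  # throwaway in-memory database
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all([DagRun(dag_id='etl'), DagRun(dag_id='etl'), DagRun(dag_id='other')])
session.commit()

# Delete only the runs of one DAG instead of all of them
session.query(DagRun).filter(DagRun.dag_id == 'etl').delete()
session.commit()
print(session.query(DagRun).count())  # 1
```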
cupdike / ConnectionSetup.txt
Last active Jun 14, 2018
Airflow Connection to Remote Kerberized Hive Metastore
# Let's say this is your kerberos ticket (likely from a keytab used for the remote service):
Ticket cache: FILE:/tmp/airflow_krb5_ccache
Default principal: hive/myserver.myrealm@myrealm
Valid starting       Expires              Service principal
06/14/2018 17:52:05  06/15/2018 17:49:35  krbtgt/myrealm@myrealm
        renew until 06/17/2018 05:49:33
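The preview ends before the connection setup itself. One step implied by the ticket cache above, though it is an assumption about this deployment rather than something the gist states, is making that cache visible to the Airflow process:

```shell
# KRB5CCNAME is the standard MIT Kerberos variable naming the credential cache.
# The path comes from the klist output above; whether you export this in
# Airflow's service environment or a wrapper script depends on your deployment.
export KRB5CCNAME=FILE:/tmp/airflow_krb5_ccache
echo "$KRB5CCNAME"
```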
cupdike / AirflowBeelineConnectionSample
Created Jun 13, 2018
Airflow Beeline Connection Using Kerberos via CLI
### There aren't many good examples of how to do this when also using Kerberos
(venv) [airflow@cray01 dags]$ airflow connections --add \
--conn_id beeline_hive \
--conn_type 'beeline' \
--conn_host 'myserver.mydomain.com' \
--conn_port 10000 \
--conn_extra '{"use_beeline": true, "auth":"kerberos;principal=mysvcname/myservicehost@MYDOMAIN.COM;"}'
### Then, a sample DAG to use it
cupdike / BeelineJarDependencyFinder
Created Jul 12, 2017
Bash commands that will provide the list of jars needed to run beeline without installing hive
# If you want to run Beeline without installing Hive...
# This will help you find the jars that you need:
# Ref: https://pvillaflores.wordpress.com/2017/04/30/installing-and-running-beeline-client/
# Turn on verbose classloading
$ export _JAVA_OPTIONS=-verbose:class
# Run beeline and process out the needed jars.
# Below assumes the hadoop jars are under a 'cloudera' path (adjust accordingly)
$ /usr/bin/beeline | tr '[' '\n' | tr ']' ' ' | grep jar | grep cloudera | grep -v checksum | awk '{last=split($0,a,"/"); print a[last]}' | sort | uniq
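To see what the pipeline does, here it is run on a single fabricated -verbose:class line (the jar path below is made up for illustration):

```shell
# One hypothetical line of -verbose:class output:
sample='[Loaded org.apache.hive.jdbc.HiveDriver from file:/opt/cloudera/parcels/CDH/jars/hive-jdbc-1.1.0.jar]'
# Same processing as above: split on brackets, keep cloudera jar lines,
# and let awk print the last path component (the jar's basename)
jars=$(printf '%s\n' "$sample" | tr '[' '\n' | tr ']' ' ' \
    | grep jar | grep cloudera | grep -v checksum \
    | awk '{last=split($0,a,"/"); print a[last]}' | sort | uniq)
echo "$jars"
```

Note that the tr ']' ' ' step leaves a trailing space on each basename; harmless for eyeballing the list, but worth knowing if you script against it.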
cupdike / PollingFileDownloader.py
Created Oct 6, 2015
Polls a file hosted at a URL and downloads it initially and if it changes.
"""Polls a file hosted at a URL and downloads it initially and if it changes."""
# Should be fairly robust to web server issues (in fact, it would only
# be a handful of lines were it not for error handling)
import requests
import time
import sys
FILE_URL = "http://<mywebserver>/<myfile>"
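The preview stops after the URL constant. The core "download only if it changed" idea can be sketched with a content hash; this illustrates the concept only, since the gist itself may rely on HTTP headers such as Last-Modified or ETag instead:

```python
import hashlib

def content_changed(previous_digest, body):
    """Return (changed, new_digest) for a freshly downloaded payload.

    Hypothetical helper, not from the gist: compare a SHA-256 of the body
    against the digest remembered from the previous poll.
    """
    digest = hashlib.sha256(body).hexdigest()
    return digest != previous_digest, digest

# First poll always counts as changed; identical payloads after that do not.
is_new, d1 = content_changed(None, b"contents v1")
same, _ = content_changed(d1, b"contents v1")
print(is_new, same)  # True False
```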
cupdike / quicksort.py
# Inspired by: http://me.dt.in.th/page/Quicksort/
def quicksort(l):
    if len(l) < 2:
        return l
    iSwap = 1
    pivot = l[0]  # left most value is the pivot
    for i, val in enumerate(l[1:], start=1):  # Skip the pivot cell
        if val < pivot:
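The loop body is cut off in the preview. Below is one completion of this Lomuto-style partition; the swap bookkeeping and the recursive recombination are my reading of the linked reference, not necessarily the gist's exact code:

```python
def quicksort(l):
    if len(l) < 2:
        return l
    iSwap = 1      # boundary of the "less than pivot" region
    pivot = l[0]   # left most value is the pivot
    for i, val in enumerate(l[1:], start=1):  # Skip the pivot cell
        if val < pivot:
            l[i], l[iSwap] = l[iSwap], l[i]   # grow the less-than region
            iSwap += 1
    # Put the pivot between the two regions, then sort each side
    l[0], l[iSwap - 1] = l[iSwap - 1], l[0]
    return quicksort(l[:iSwap - 1]) + [pivot] + quicksort(l[iSwap:])

print(quicksort([3, 6, 1, 5, 2, 4]))  # [1, 2, 3, 4, 5, 6]
```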