# Inspired by: http://me.dt.in.th/page/Quicksort/
def quicksort(l):
    if len(l) < 2:
        return l
    iSwap = 1
    pivot = l[0]  # left most value is the pivot
    for i, val in enumerate(l[1:], start=1):  # Skip the pivot cell
        if val < pivot:
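The preview above stops inside the partition loop. As a rough, self-contained sketch of where a leftmost-pivot quicksort like this usually goes (the function name `quicksort_sketch`, the variable names, and the choice to return a sorted copy are assumptions, not the original author's code):

```python
def quicksort_sketch(items):
    """Return a sorted copy of items using a leftmost-pivot partition."""
    if len(items) < 2:
        return list(items)
    work = list(items)  # copy so the caller's list is left untouched
    pivot = work[0]     # leftmost value is the pivot
    swap_at = 1         # next slot for a value smaller than the pivot
    for i in range(1, len(work)):  # skip the pivot cell
        if work[i] < pivot:
            work[i], work[swap_at] = work[swap_at], work[i]
            swap_at += 1
    # Put the pivot between the smaller values and the larger-or-equal values.
    work[0], work[swap_at - 1] = work[swap_at - 1], work[0]
    left = quicksort_sketch(work[:swap_at - 1])
    right = quicksort_sketch(work[swap_at:])
    return left + [pivot] + right
```

Slicing for the recursive calls keeps the sketch short at the cost of extra copies; a fully in-place version would pass index bounds instead.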
"""Polls a file hosted at a URL and downloads it initially and if it changes."""
# Should be fairly robust to web server issues (in fact, it would only
# be a handful of lines were it not for error handling)
import requests
import time
import sys
FILE_URL = "http://<mywebserver>/<myfile>" |
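The rest of the polling script is not shown. One way the "download initially and if it changes" check could be sketched is by comparing content hashes between polls. Everything below is an assumption for illustration: the helper name `check_for_change`, the hypothetical `fetch` callable (in the script above that role would presumably be played by something like `requests.get(FILE_URL).content`), and the use of SHA-256 rather than, say, HTTP `ETag` or `Last-Modified` headers.

```python
import hashlib


def check_for_change(fetch, last_digest):
    """Fetch the file once and report whether its content changed.

    fetch is a hypothetical zero-argument callable returning the file's bytes.
    Returns (changed, digest, body). Any exception raised by fetch is treated
    as "no change" so that a flaky server does not stop the polling loop.
    """
    try:
        body = fetch()
    except Exception:
        return False, last_digest, None
    digest = hashlib.sha256(body).hexdigest()
    if digest == last_digest:
        return False, last_digest, None
    return True, digest, body
```

Starting with `last_digest=None` makes the first successful fetch count as a change, which covers the initial download; a caller would then sleep between calls with `time.sleep`.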
# If you want to run Beeline without installing Hive...
# This will help you find the jars that you need:
# Ref: https://pvillaflores.wordpress.com/2017/04/30/installing-and-running-beeline-client/
# Turn on verbose classloading
$ export _JAVA_OPTIONS=-verbose:class
# Run beeline and process out the needed jars.
# Below assumes the hadoop jars are under a 'cloudera' path (adjust accordingly)
$ /usr/bin/beeline | tr '[' '\n' | tr ']' ' ' | grep jar | grep cloudera | grep -v checksum | awk '{last=split($0,a,"/"); print a[last]}' | sort | uniq |
### There aren't many good examples of how to do this when also using kerberos
(venv) [airflow@cray01 dags]$ airflow connections --add \
    --conn_id beeline_hive \
    --conn_type 'beeline' \
    --conn_host 'myserver.mydomain.com' \
    --conn_port 10000 \
    --conn_extra '{"use_beeline": true, "auth":"kerberos;principal=mysvcname/myservicehost@MYDOMAIN.COM;"}'
### Then, a sample DAG to use it |
# Let's say this is your kerberos ticket (likely from a keytab used for the remote service):
Ticket cache: FILE:/tmp/airflow_krb5_ccache
Default principal: hive/myserver.myrealm@myrealm

Valid starting       Expires              Service principal
06/14/2018 17:52:05  06/15/2018 17:49:35  krbtgt/myrealm@myrealm
        renew until 06/17/2018 05:49:33
from airflow.models import DagRun
from sqlalchemy import *
from airflow import settings

# Delete every DagRun row from the Airflow metadata database, then commit.
session = settings.Session()
session.query(DagRun).delete()
session.commit()
from datetime import datetime, timedelta

def round_minutes(dt, resolutionInMinutes):
    """round_minutes(datetime, resolutionInMinutes) => datetime rounded to lower interval
    Works for minute resolution up to a day (e.g. cannot round to nearest week).
    """
    # First zero out seconds and micros
    dtTrunc = dt.replace(second=0, microsecond=0)
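The remainder of `round_minutes` is cut off in this preview. A minimal sketch of how rounding down to a minute interval could be finished, following the docstring's description, is below. The name `round_minutes_sketch` and the minutes-since-midnight approach are assumptions, not necessarily what the original does.

```python
from datetime import datetime, timedelta


def round_minutes_sketch(dt, resolution_in_minutes):
    """Round dt down to the nearest multiple of resolution_in_minutes.

    Intervals are counted from midnight, so this only makes sense for
    resolutions of up to one day (1440 minutes).
    """
    # First zero out seconds and microseconds.
    dt_trunc = dt.replace(second=0, microsecond=0)
    # Minutes past the most recent interval boundary, counted from midnight.
    minutes_since_midnight = dt_trunc.hour * 60 + dt_trunc.minute
    excess = minutes_since_midnight % resolution_in_minutes
    return dt_trunc - timedelta(minutes=excess)
```

For example, `round_minutes_sketch(datetime(2018, 6, 14, 17, 52, 5), 15)` gives `datetime(2018, 6, 14, 17, 45)`.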
If you are trying to run a script like this:
    import sh
    myScriptCommand = sh.Command("/path/to/script")
    myScriptCommand("my arg")
and you see this error:
    sh.ErrorReturnCode_255
>>> def genX():
...     for i in range(3):
...         yield i
...
>>> for i in genX(): print(i)
...
0
1
2
>>> def genY(): |
import pyarrow
import os
import sh
# Get obscure error without this: pyarrow.lib.ArrowIOError: HDFS list directory failed, errno: 2 (No such file or directory)
os.environ['CLASSPATH'] = str(sh.hadoop('classpath','--glob'))
# Not needed
#os.environ['HADOOP_HOME'] = '/opt/cloudera/parcels/CDH-<your version>/' |