Cory Brickner vitalibertas

  • Virginia
@vitalibertas
vitalibertas / gist:732c0b2a251f480c287ca6418ab1be65
Created August 13, 2017 03:45
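Bash snippet that stays polite during work hours: between 4:00 am and 5:00 pm it sleeps until 5:00 pm, otherwise it sleeps for 5 minutes.
# Requires GNU date (%-H, %-M) and GNU sleep (the "m" suffix).
HOUR=$(date +"%-H")
MINUTE=$(date +"%-M")
if [ "$HOUR" -ge 4 ] && [ "$HOUR" -lt 17 ]; then
    # Minutes remaining until 17:00, counting the minutes already elapsed this hour.
    sleep $(( (17 - HOUR) * 60 - MINUTE ))m
else
    sleep 5m
fi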
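The hour math above can also be sketched in Python. This is a minimal illustration, not the gist's own code: the function names `seconds_until` and `polite_delay` are hypothetical, the 4-to-17 window comes from the script's condition, and the 300-second fallback mirrors `sleep 5m`.

```python
from datetime import datetime, timedelta


def seconds_until(target_hour: int, now: datetime) -> int:
    """Whole seconds from `now` until the next occurrence of target_hour:00."""
    target = now.replace(hour=target_hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # The target time has already passed today, so aim for tomorrow.
        target += timedelta(days=1)
    return int((target - now).total_seconds())


def polite_delay(now: datetime, start_hour: int = 4, end_hour: int = 17,
                 fallback_seconds: int = 300) -> int:
    """Seconds to wait: until end_hour inside the window, else a short fallback."""
    if start_hour <= now.hour < end_hour:
        return seconds_until(end_hour, now)
    return fallback_seconds


if __name__ == "__main__":
    print(polite_delay(datetime.now()))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the calculation easy to check against fixed times.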
vitalibertas / gist:d17bba1219ed22e42f6b018608b85b96
Created August 13, 2017 03:51
Check HDFS for a specific file a set number of times before erroring out, so that dependent code does not run without it.
CHECK_HDFS="/some/path/to/file"
function hdfsCheck {
    RETRY=0
    while [ "$RETRY" -lt 9 ]; do
        # Redirect stderr inside the substitution so hdfs errors land in stderr.txt.
        COUNT=$(hdfs dfs -ls "${CHECK_HDFS}" 2> stderr.txt | wc -l)
        if [ "$COUNT" -lt 1 ]; then
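The preview above is cut off before the retry logic finishes, so the following is only a generic sketch of the bounded-retry idea in Python, not a reconstruction of the gist. The name `wait_for`, the `delay_seconds` parameter, and the `TimeoutError` on exhaustion are all assumptions; the default of 9 attempts mirrors the `-lt 9` loop bound. The `check` callable stands in for whatever test is needed, such as the `hdfs dfs -ls` call.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], retries: int = 9,
             delay_seconds: float = 60.0,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() up to `retries` times.

    Returns the 1-based attempt number that succeeded, or raises
    TimeoutError once every attempt has failed.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        if attempt < retries:
            # Wait before the next attempt, but not after the last one.
            sleep(delay_seconds)
    raise TimeoutError("dependency not available after %d attempts" % retries)


if __name__ == "__main__":
    # Hypothetical check that succeeds on the third call.
    state = {"calls": 0}

    def ready() -> bool:
        state["calls"] += 1
        return state["calls"] >= 3

    print(wait_for(ready, delay_seconds=0))
```

Raising on exhaustion makes the caller stop before running anything that depends on the missing file.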
vitalibertas / gist:3653fcf459647ca533dc81e8edf69dd5
Last active August 13, 2017 04:10
Hive query that uses arrays, aggregates, and windowing to determine customer onboarding category.
WITH Landing AS (
    SELECT
        visit_id
        ,COLLECT_SET(shopper_id) AS shopper_array
        ,MIN(sequence) AS min_sequence
    FROM
        visits
    WHERE
        page_type = 'landing'
    GROUP BY
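To make the two aggregates in the visible part of this query concrete, here is a small Python sketch of what `COLLECT_SET` and `MIN` produce per group. It assumes the truncated `GROUP BY` groups on `visit_id`, the only non-aggregated column shown. The function name `landing_summary` and the sample rows are hypothetical, and sorting the set is a convenience for display (Hive does not guarantee element order).

```python
from collections import defaultdict


def landing_summary(rows):
    """Group landing-page rows by visit_id (assumed grouping column).

    Returns {visit_id: {"shopper_array": [...], "min_sequence": n}}.
    """
    shoppers = defaultdict(set)
    min_sequence = {}
    for row in rows:
        if row["page_type"] != "landing":
            continue  # WHERE page_type = 'landing'
        visit_id = row["visit_id"]
        if row["shopper_id"] is not None:
            # COLLECT_SET keeps distinct non-NULL values.
            shoppers[visit_id].add(row["shopper_id"])
        else:
            shoppers[visit_id]  # touch the key so the group still exists
        current = min_sequence.get(visit_id, row["sequence"])
        min_sequence[visit_id] = min(current, row["sequence"])
    return {
        visit_id: {
            "shopper_array": sorted(shoppers[visit_id]),
            "min_sequence": min_sequence[visit_id],
        }
        for visit_id in shoppers
    }


if __name__ == "__main__":
    sample = [
        {"visit_id": "v1", "shopper_id": "s1", "sequence": 3, "page_type": "landing"},
        {"visit_id": "v1", "shopper_id": "s1", "sequence": 1, "page_type": "landing"},
        {"visit_id": "v1", "shopper_id": "s2", "sequence": 2, "page_type": "cart"},
    ]
    print(landing_summary(sample))
```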
vitalibertas / gist:4eff16e088aca0122d8c167c7977c4c4
Created September 28, 2017 19:27
Hive row_number() in place of aggregate to determine the maximum event time for each ID per day.
SET hive.execution.engine = mr;
SET hive.support.concurrency = false;
SET hive.exec.parallel = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
USE hosting_stats;
WITH Rank AS (
    SELECT
        cid
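The query body is truncated here, so the sketch below only illustrates the general technique the description names: number the rows within each (ID, day) partition from newest to oldest and keep row number 1, which yields the maximum event time without an aggregate. Only `cid` appears in the preview. The `(cid, event_time)` tuple layout and the function name `latest_event_per_day` are assumptions made for illustration.

```python
from datetime import datetime
from itertools import groupby


def latest_event_per_day(events):
    """events: iterable of (cid, event_time) tuples (assumed layout).

    Returns {(cid, date): latest event_time} by keeping row number 1
    within each partition, ordered by event_time descending.
    """
    def partition(event):
        return (event[0], event[1].date())

    ordered = sorted(events, key=lambda e: (partition(e), e[1]))
    result = {}
    for part, rows in groupby(ordered, key=partition):
        newest_first = sorted(rows, key=lambda e: e[1], reverse=True)
        for row_number, row in enumerate(newest_first, start=1):
            if row_number == 1:  # same role as WHERE row_number = 1
                result[part] = row[1]
    return result


if __name__ == "__main__":
    sample = [
        ("a", datetime(2017, 9, 28, 8, 0)),
        ("a", datetime(2017, 9, 28, 19, 27)),
        ("a", datetime(2017, 9, 29, 1, 0)),
    ]
    for key, value in latest_event_per_day(sample).items():
        print(key, value)
```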
vitalibertas / gist:b16ed8a13d7d2d0516ef2d1b57b60402
Last active August 30, 2018 17:15
Readability: Python List Comprehension vs. Not
# List Comprehension:
process_dict = dict([(attributes.filename, attributes.st_size) for attributes in file_list if attributes.filename.startswith('solcon')])
# Whitespace Generous:
process_dict = {}
for attributes in file_list:
    if attributes.filename.startswith('solcon'):
        process_dict[attributes.filename] = attributes.st_size
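A third option worth comparing is a dict comprehension, which builds the mapping directly without the intermediate list of tuples. The sketch below is illustrative only: the `SimpleNamespace` objects are stand-ins for whatever `file_list` really contains, and the file names and sizes are invented sample data.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for entries exposing .filename and .st_size.
file_list = [
    SimpleNamespace(filename="solcon_2018_08.csv", st_size=1024),
    SimpleNamespace(filename="readme.txt", st_size=12),
    SimpleNamespace(filename="solcon_2018_09.csv", st_size=2048),
]

# Dict comprehension: same filter, same key and value as both versions above.
process_dict = {
    attributes.filename: attributes.st_size
    for attributes in file_list
    if attributes.filename.startswith("solcon")
}

# The explicit loop produces the same mapping.
loop_dict = {}
for attributes in file_list:
    if attributes.filename.startswith("solcon"):
        loop_dict[attributes.filename] = attributes.st_size

print(process_dict == loop_dict)
```

Splitting the comprehension across lines keeps it compact while recovering most of the readability of the loop.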
vitalibertas / python_venv.md
Last active November 4, 2019 17:57
Python3 Virtualenv Setup


Requirements
  • Python 3
  • Pip 3
$ brew install python3
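The remaining setup steps are not visible in this preview. As a rough alternative sketch, once Python 3 is installed an isolated environment can also be created with the standard-library `venv` module instead of the separate `virtualenv` package. This is a swapped-in technique, not the gist's procedure; the function name `make_env` and the directory name `demo-env` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "demo-env") -> Path:
    """Create a virtual environment under `parent` and return its path.

    with_pip=False keeps the sketch fast and offline; pass with_pip=True
    to bootstrap pip inside the environment.
    """
    env_dir = parent / name
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_path = make_env(Path(tmp))
        # pyvenv.cfg marks the directory as a virtual environment.
        print((env_path / "pyvenv.cfg").exists())
```

The command-line equivalent is `python3 -m venv <directory>`.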
vitalibertas / getApiResults.py
Last active March 11, 2020 16:23
Python API Download Zipped JSON file, Unzip and Format for Redshift, Upload to S3 as GZip.
import json
import zipfile
from io import BytesIO, StringIO

import requests

gz_buffer = BytesIO()
json_buffer = StringIO()
download_url = "{0}{1}/file".format(request_url, file_id)
request_download = requests.request("GET", download_url, headers=json_header, stream=True)
with zipfile.ZipFile(BytesIO(request_download.content), mode='r') as z:
    unzip_file = StringIO(z.read(z.infolist()[0]).decode('utf-8'))
json_responses = json.load(unzip_file)['responses']
for response in json_responses:
    json_buffer.write(json.dumps(response))
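The preview stops before the gzip step, so here is a self-contained sketch of the unzip, reformat, and gzip stages using only the standard library. It is an illustration under stated assumptions, not the gist's code: the function name `zipped_json_to_gzip` is hypothetical, writing one JSON object per line is an assumption about the target layout, and the download and S3 upload steps are left out.

```python
import gzip
import json
import zipfile
from io import BytesIO


def zipped_json_to_gzip(zip_bytes: bytes, key: str = "responses") -> bytes:
    """Read the first member of a zip archive as JSON, then return the
    records under `key` as gzip-compressed, newline-delimited JSON."""
    with zipfile.ZipFile(BytesIO(zip_bytes), mode="r") as archive:
        first_member = archive.infolist()[0]
        payload = json.loads(archive.read(first_member).decode("utf-8"))
    # One JSON object per line (assumed layout for the load step).
    lines = "".join(json.dumps(record) + "\n" for record in payload[key])
    return gzip.compress(lines.encode("utf-8"))


if __name__ == "__main__":
    # Build a small in-memory zip to stand in for the downloaded file.
    sample = BytesIO()
    with zipfile.ZipFile(sample, mode="w") as archive:
        archive.writestr("export.json", json.dumps({"responses": [{"id": 1}, {"id": 2}]}))
    compressed = zipped_json_to_gzip(sample.getvalue())
    print(gzip.decompress(compressed).decode("utf-8"))
```

Everything stays in memory, as in the original snippet, so the resulting bytes can be handed directly to an upload call.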
# http://docs.wand-py.org/en/0.5.9/
# http://www.imagemagick.org/script/formats.php
# brew install freetype imagemagick
# pip3 install Pillow
# brew install tesseract
# pip3 install wand
# pip3 install pyocr
import pyocr.builders
import requests
from io import BytesIO