May Kim yammik

yammik / multipageTiffSplitter.py
Created July 7, 2018 01:15
script to split multipage TIFF for TFM
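import os
import subprocess

# Directory containing the multipage TIFFs
path = input("path : ")
for filename in os.listdir(path):
    if filename.endswith('.tif'):
        print(filename)
        src = os.path.join(path, filename)
        # -scene 1 starts page numbering at 1; output is <name>.tif-1.tif, <name>.tif-2.tif, ...
        # Passing a list (no shell) keeps paths with spaces intact.
        cmd = ["/usr/local/bin/convert", "-quiet", "-scene", "1", src, src + "-%d.tif"]
        subprocess.call(cmd)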
yammik / perms.js
Created January 28, 2019 22:30
Creating permutations of a string
function perm(str) {
  const results = [];
  // Base case: return an array so every call yields a list of permutations
  if (str.length === 1) return [str];
  for (let i = 0; i < str.length; i++) {
    const currentChar = str[i];
    const remainingChars = str.slice(0, i) + str.slice(i + 1);
    const innerPermutations = perm(remainingChars);
    for (let j = 0; j < innerPermutations.length; j++) {
      results.push(currentChar + innerPermutations[j]);
    }
  }
  return results;
}
yammik / findSubStrPerm.js
Created January 28, 2019 22:30
How to find permutations of substr in a string (problem from CtCI)
function excise(str, index) { // helper function to remove char from substr
  return str.slice(0, index) + str.slice(index + 1);
}
function compare(template, x) { // compares two str of equal length
  if (template.length !== x.length) return false;
  if (template === x) {
    return true;
  } else if (template.length === 1) {
yammik / petango.css
Created September 28, 2019 13:43
i hate petango
div {
  border: 2px solid red;
}

Chosen approach

AWS Glue

AWS Glue is a serverless ETL service for data analysis:

With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes. Also, you can immediately search and query cataloged data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

Since our non-partitioned data is already in S3, we can point Glue at the bucket and read from it directly using a predefined schema.
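To make "partitioned" concrete, here is a minimal sketch of the Hive-style `key=value` object layout that Glue and Athena recognize as partitions. It assumes partitioning by date, which the text above does not state. The function name `partitioned_key`, the `events` prefix, and the file name are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone


def partitioned_key(prefix: str, filename: str, ts: datetime) -> str:
    """Build a Hive-style S3 key (year=/month=/day=) for one object.

    Assumption: data is partitioned by the UTC date of `ts`.
    """
    ts = ts.astimezone(timezone.utc)
    return (
        f"{prefix.rstrip('/')}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
        f"{filename}"
    )


# Hypothetical object: a flat key "events/part-0000.json" moves under date partitions.
example = partitioned_key(
    "events", "part-0000.json", datetime(2019, 9, 28, 13, 43, tzinfo=timezone.utc)
)
print(example)  # events/year=2019/month=09/day=28/part-0000.json
```

With a layout like this, a query that filters on `year`, `month`, or `day` only needs to read the objects under the matching prefixes instead of scanning the whole bucket.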

We can use AWS Glue to repartition: