


Timothy turtlemonvh

@turtlemonvh
turtlemonvh / stitch_transcript.py
Created May 10, 2019 18:52
Stitch together AWS Transcribe transcripts, including speaker labels
#!/usr/bin/env python
"""
Stitch together multiple files' worth of AWS Transcribe transcripts.
Does not attempt to match speakers across files, but does label all speaker changes.
Usage:
python stitch_transcript.py *.mp3.json -o out.txt
See blog post: http://turtlemonvh.github.io/aws-transcribe-for-long-zoom-meetings.html
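The gist's own implementation is not shown in this preview. Below is a minimal sketch of the same idea, assuming the standard AWS Transcribe JSON layout (`results.items` for words and punctuation, `results.speaker_labels.segments[].items` for per-word speaker timing). The names `load_lines` and `stitch` are hypothetical. Because speakers are not matched across files, each speaker label is prefixed with the file label, so every file boundary also starts a new line.

```python
def load_lines(transcript, file_label):
    """Turn one parsed AWS Transcribe result into (speaker, text) runs.

    Assumes the usual Transcribe output layout; punctuation items carry no
    start_time and are glued onto the preceding word.
    """
    results = transcript["results"]
    # Map each word's start_time to the speaker label that segment assigns it.
    speaker_at = {}
    for segment in results.get("speaker_labels", {}).get("segments", []):
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    runs = []  # list of [speaker, [tokens]]
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if runs:
                runs[-1][1][-1] += content  # attach to the previous word
            continue
        # Speakers are not matched across files, so namespace them per file.
        label = speaker_at.get(item["start_time"], "unknown")
        speaker = f"{file_label}:{label}"
        if not runs or runs[-1][0] != speaker:
            runs.append([speaker, []])  # speaker changed: start a new line
        runs[-1][1].append(content)
    return [(spk, " ".join(words)) for spk, words in runs]


def stitch(transcripts):
    """transcripts: iterable of (file_label, parsed_json), in playback order."""
    lines = []
    for file_label, data in transcripts:
        for speaker, text in load_lines(data, file_label):
            lines.append(f"[{speaker}] {text}")
    return "\n".join(lines)
```

To mirror the usage line above, load each `*.mp3.json` file with `json.load`, pass `(path, data)` pairs to `stitch` in sorted order, and write the result to `out.txt`.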
turtlemonvh / aws_prices.md
Last active August 2, 2019 02:33
AWS compute price analysis

AWS provides a lot of different options for running compute.

In the context of a data pipeline, I was wondering how these stacked up, especially in terms of price.

Options

Glue

Links

Keybase proof

I hereby claim:

  • I am turtlemonvh on github.
  • I am turtlemonvh (https://keybase.io/turtlemonvh) on keybase.
  • I have a public key ASAXQE1mPcXc-76QugSJ_tYCaGLGKdoJsahh_dVztN57Cgo

To claim this, I am signing this object:

turtlemonvh / README.md
Created March 7, 2019 20:01
Different `cp -r` behavior on Linux and Mac

Different cp -r behavior on Linux and Mac

When running the script above, we see different behavior on Mac and Linux.

On Linux, directory a is copied into directory b:

$ bash dircptest.sh
total 0
drwxrwxr-x. 2 vagrant vagrant 21 Mar 7 19:46 a
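The test script itself is not shown here. Assuming it copies a source directory with a trailing slash (`cp -r a/ b`), the difference is documented: the BSD `cp` shipped with macOS copies the directory's contents when the source ends in `/`, while GNU `cp` on Linux copies the directory itself into an existing destination. One way to sidestep the ambiguity in scripts is to make the intent explicit. This is a minimal Python sketch; the helper names `copy_dir_into` and `copy_dir_contents` are hypothetical.

```python
import os
import shutil


def copy_dir_into(src, dst):
    """GNU-style `cp -r src dst` with an existing dst: creates dst/<basename(src)>."""
    # normpath drops a trailing slash so basename("a/") resolves to "a".
    target = os.path.join(dst, os.path.basename(os.path.normpath(src)))
    shutil.copytree(src, target)
    return target


def copy_dir_contents(src, dst):
    """BSD/macOS-style `cp -R src/ dst`: merges src's contents into dst."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    return dst
```

Choosing the function by name, rather than relying on trailing-slash semantics, gives the same result on both platforms.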
turtlemonvh / s3_nested_data_counts.py
Created January 22, 2019 22:28
Counts for nested data in AWS S3
"""
Count objects under each nested prefix when your S3 keys use "/" as a directory-like separator.
Similar to `tree -L2 prefix/` in *nix.
"""
import boto3
from collections import Counter

s3 = boto3.client('s3')
bucket_name = "XXX"  # S3 bucket name
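The rest of the gist is cut off here. A minimal sketch of the counting step, assuming keys are listed with the boto3 `list_objects_v2` paginator; `count_prefixes` and `list_keys` are hypothetical helpers, and boto3 is imported lazily so the counting logic works without AWS access.

```python
from collections import Counter


def count_prefixes(keys, depth=2):
    """Count keys by their first `depth` "/"-separated components (like tree -L)."""
    counts = Counter()
    for key in keys:
        counts["/".join(key.split("/")[:depth])] += 1
    return counts


def list_keys(bucket, prefix=""):
    """Yield every object key under prefix; needs boto3 and AWS credentials."""
    import boto3  # imported here so count_prefixes is usable without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):  # empty pages have no "Contents"
            yield obj["Key"]
```

For example, `count_prefixes(list_keys(bucket_name, "logs/"), depth=2).most_common()` lists the busiest second-level prefixes. Note that `depth` counts from the bucket root, not from `prefix`.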
turtlemonvh / ionic_secrets_storage.py
Last active February 13, 2019 19:59
Store secrets as protected attributes on Ionic keys
import ionicsdk
import subprocess
import os
"""
Use Ionic to store secrets (e.g. application credentials).
Shows how to merge keys containing different sets of secrets, so an application can be granted access to different sets of secrets, managed by different access policies.
Note that if multiple keys are created with the same external id, the newest will be fetched, which makes secret rotation easier.
Inspired by AWS ParamStore:
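The merge and rotation rules described above can be illustrated without any SDK calls. This sketch deliberately does not use `ionicsdk`: `FetchedKey` is a hypothetical stand-in for a fetched key and its protected attributes, and raising on a duplicate secret name across key sets is an assumed policy, not necessarily what the gist does.

```python
from dataclasses import dataclass, field


@dataclass
class FetchedKey:
    """Hypothetical stand-in for a fetched key (NOT an ionicsdk type)."""
    external_id: str
    created: int  # e.g. epoch seconds; larger means newer
    secrets: dict = field(default_factory=dict)


def newest_per_external_id(keys):
    """Keep only the newest key per external id, so rotation is 'create a new key'."""
    latest = {}
    for key in keys:
        current = latest.get(key.external_id)
        if current is None or key.created > current.created:
            latest[key.external_id] = key
    return latest


def merge_secrets(keys):
    """Merge secret sets from different external ids into one dict."""
    merged = {}
    latest = newest_per_external_id(keys)
    for external_id in sorted(latest):
        for name, value in latest[external_id].secrets.items():
            if name in merged:  # assumed policy: refuse ambiguous overlaps
                raise ValueError(f"secret {name!r} defined in more than one key set")
            merged[name] = value
    return merged
```

Each external id can sit behind its own access policy, so an application only receives the secret sets it is allowed to fetch.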
turtlemonvh / README.md
Created September 19, 2018 12:07
Selenium survey voting

Selenium tests

Experiments with Selenium to vote on a SurveyMonkey survey on OSX. Uses PhantomJS; headless Chrome may be a better option now.

You should have:

  • Java installed (you can download the .dmg from Oracle)
  • phantomjs installed: http://phantomjs.org/download.html
    • it needs to be on your path
  • assuming you downloaded and unzipped version 2.1.1 into the current directory, you can source setpath.sh (. setpath.sh) to fix your path
turtlemonvh / CartesianProduct.scala
Last active March 5, 2020 10:27
Scala cartesian product
// Based on: http://thushw.blogspot.com/2015/10/cartesian-product-in-scala.html
package com.github.turtlemonvh.helpers
object SequenceHelpers {
/* Take a list of lists and return a list of lists that is the cartesian product of the members of each list.
val seqs = List(List("1", "2", "3"), List("a", "b", "c", "d"), List("true", "false"))
// 24 = (3 * 4 * 2)
cartesianProduct[String](seqs).length
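For comparison, here is the same operation sketched in Python, using the same example lists. It folds over the input lists and extends every partial combination with each member of the next list. This is an illustrative translation, not the gist's Scala implementation (which is cut off here), and it produces the same ordering as `itertools.product`.

```python
from functools import reduce


def cartesian_product(seqs):
    """Fold over seqs, extending every partial combination with each next member."""
    return reduce(
        lambda acc, seq: [prefix + [x] for prefix in acc for x in seq],
        seqs,
        [[]],  # one empty combination to start from
    )


seqs = [["1", "2", "3"], ["a", "b", "c", "d"], ["true", "false"]]
print(len(cartesian_product(seqs)))  # 24 = (3 * 4 * 2)
```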

CloudTrail log search

Download CloudTrail logs from S3 and search through them. Downloaded files are cached in _search_downloads/ so repeated searches are faster. Output is JSON; use jq for further processing and filtering (example: https://gist.github.com/pcn/f98c7852b0558b847784).
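A minimal sketch of the search step over the local cache, assuming the files in _search_downloads/ are CloudTrail log files as delivered to S3 (gzip-compressed JSON with a top-level "Records" array). The download step is omitted, and `iter_records`, `search`, and `print_json_lines` are hypothetical helpers.

```python
import gzip
import json
import os


def iter_records(download_dir="_search_downloads"):
    """Yield CloudTrail records from cached *.json.gz files (download not shown)."""
    for root, _dirs, files in os.walk(download_dir):
        for name in sorted(files):
            if name.endswith(".json.gz"):
                with gzip.open(os.path.join(root, name), "rt") as fh:
                    yield from json.load(fh).get("Records", [])


def search(records, **filters):
    """Keep records whose top-level fields equal every given filter value."""
    for record in records:
        if all(record.get(k) == v for k, v in filters.items()):
            yield record


def print_json_lines(records):
    """One JSON object per line, which is easy to pipe into jq."""
    for record in records:
        print(json.dumps(record))
```

For example, `print_json_lines(search(iter_records(), eventName="ConsoleLogin"))` can be piped to `jq` for further filtering.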

turtlemonvh / RDS_markup.md
Created February 3, 2018 17:54
Compare the hourly price for RDS vs EC2 instances

RDS Pricing Markup

A look at the markup on the per-hour price of on-demand RDS instances relative to on-demand EC2 instances.

See the Python code for the analysis.

In the columns below:

  • ec2 = base EC2 instances from us-east-1
  • p = RDS Postgres flavor
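The markup calculation itself is simple arithmetic: markup % = (RDS hourly / EC2 hourly - 1) * 100. This sketch joins two price maps on instance type and computes that markup. It relies on the fact that RDS instance classes carry a "db." prefix (db.m5.large versus m5.large). The function names are hypothetical, and the prices in the test are placeholders, not real AWS prices.

```python
def markup_pct(ec2_hourly, rds_hourly):
    """Percent by which the RDS on-demand hourly price exceeds the EC2 price."""
    return (rds_hourly / ec2_hourly - 1.0) * 100.0


def markup_table(ec2_prices, rds_prices):
    """Join {instance_type: hourly_usd} maps and return (type, ec2, rds, markup %) rows."""
    rows = []
    for rds_type, rds_price in sorted(rds_prices.items()):
        # RDS classes are named like "db.m5.large"; strip the prefix to match EC2.
        base = rds_type[3:] if rds_type.startswith("db.") else rds_type
        if base in ec2_prices:
            ec2_price = ec2_prices[base]
            rows.append((base, ec2_price, rds_price, round(markup_pct(ec2_price, rds_price), 1)))
    return rows
```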