Theo iconara

@iconara
iconara / run-query.sh
Created November 10, 2020 13:37
Run Athena queries with aws-cli
#!/usr/bin/env bash
region=us-east-1
query='SELECT NOW()'
output_location="s3://aws-athena-query-results-1234567890-$region/"
query_execution_id=$(aws athena start-query-execution \
  --region "$region" \
  --query-string "$query" \
  --result-configuration "OutputLocation=$output_location" \
iconara / InputStreamResponseTransformer.java
Last active November 5, 2020 16:51
S3 GetObject InputStreamResponseTransformer using AWS SDK for Java v2
// this is an attempt to create a synchronous InputStream from a call to
// S3AsyncClient#getObject using a blocking queue.
//
// the purpose is to be able to make many S3 operations asynchronously, but
// at the same time be able to pass off some results to threads and into
// code that expects an InputStream or Reader, such as Commons CSV.
public class InputStreamResponseTransformer extends InputStream implements AsyncResponseTransformer<GetObjectResponse, InputStream>, Subscriber<ByteBuffer> {
  private static final ByteBuffer END_MARKER = ByteBuffer.allocate(0);
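The preview above is cut off, so the following is only a rough sketch of the general idea it describes: a producer hands chunks to a blocking queue, and a consumer reads them until it sees an end marker. It is written in Ruby to match most of the other snippets on this page, and it is not a port of the Java class. The class name `QueueBackedReader` and its methods are hypothetical.

```ruby
# A reader that blocks on a queue until a producer has delivered all chunks.
# Hypothetical illustration of the blocking-queue-plus-end-marker idea.
class QueueBackedReader
  # Unique sentinel object, compared by identity so that no real chunk
  # can be mistaken for it.
  END_MARKER = Object.new.freeze

  def initialize
    @queue = Queue.new
  end

  # Called by the producer for every chunk of data.
  def push(chunk)
    @queue << chunk
  end

  # Called by the producer once, after the last chunk.
  def finish
    @queue << END_MARKER
  end

  # Blocks until every chunk up to the end marker has been consumed.
  def read_all
    result = +''
    loop do
      chunk = @queue.pop # blocks while the queue is empty
      break if chunk.equal?(END_MARKER)
      result << chunk
    end
    result
  end
end

reader = QueueBackedReader.new
producer = Thread.new do
  ["a,b\n", "1,2\n"].each { |chunk| reader.push(chunk) }
  reader.finish
end
puts reader.read_all
producer.join
```

The sentinel is what lets the consumer tell "no data yet" apart from "no more data", which a plain blocking `pop` cannot do on its own.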
iconara / athena-metadata-parser.rb
Last active September 4, 2023 13:39
Athena metadata file parser
#!/usr/bin/env ruby
# This code parses the .csv.metadata files written by Athena and produces a
# structure similar to what you get from the GetQueryResults API call.
#
# I have reverse engineered the format and I'm not sure about all the details,
# but it seems to correspond to the GetQueryResults API call well. Some things,
# like nullability, the difference between name and label, and the schema_name
# and table_name fields, I haven't been able to figure out because they seem
# not to be used, or never take any other values.
iconara / pg.sql
Last active October 1, 2018 14:48
Useful PostgreSQL queries
-- Connection limits by role
SELECT rolname, rolconnlimit
FROM pg_roles
WHERE rolconnlimit <> -1;
-- Change connection limit for a role
ALTER USER $role WITH CONNECTION LIMIT 64;
-- Current activity
SELECT *
iconara / validate-table.rb
Last active June 14, 2018 13:23
Quick and dirty script to find spurious files in the prefix of a Glue table
require 'aws-sdk-glue'
require 'aws-sdk-s3'
def split_s3_uri(s3_uri)
  s3_uri.match(%r{\As3://(.+?)/(.+)\z}).to_a.drop(1)
end
database, table_name = ARGV.take(2)
glue = Aws::Glue::Client.new
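As a quick illustration of what `split_s3_uri` returns, the sketch below repeats the helper and applies it to made-up URIs. The bucket and prefix are placeholders and do not come from the original script.

```ruby
# Same helper as in the snippet above, repeated so this example runs on its own.
def split_s3_uri(s3_uri)
  s3_uri.match(%r{\As3://(.+?)/(.+)\z}).to_a.drop(1)
end

# Hypothetical table location; the bucket and prefix are placeholders.
bucket, prefix = split_s3_uri('s3://example-bucket/warehouse/events/')

puts bucket # prints "example-bucket"
puts prefix # prints "warehouse/events/"

# A string that is not an S3 URI does not match, so the result is an empty
# array and a destructuring assignment would yield nil for both parts.
p split_s3_uri('https://example.com/') # prints []
```

Note that the pattern requires at least one character after the bucket's slash, so a bare `s3://example-bucket/` also yields an empty array.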
iconara / create-external-schema.sql
Last active September 7, 2017 14:57
Redshift Spectrum cheat sheet
-- this creates a schema called "name_of_schema_in_redshift" in Redshift,
-- that works as an alias for the Athena/Glue database "name_of_database_in_glue".
CREATE EXTERNAL SCHEMA name_of_schema_in_redshift
FROM DATA CATALOG
DATABASE 'name_of_database_in_glue'
REGION 'us-east-1'
IAM_ROLE 'arn:aws:iam::456064453472:role/xyz';
iconara / dot-generator.rb
Last active September 14, 2017 07:13
Visualize EC2 security group dependencies
require 'aws-sdk-ec2'
ec2 = Aws::EC2::Client.new
response = ec2.describe_security_groups
puts('digraph securitygroups {')
loop do
  response.security_groups.each do |security_group|
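The preview is cut off before any edges are emitted, so the following is only a sketch of how a DOT graph could be built from group references. The input hash, the `to_dot` helper, the group IDs, and the edge direction are all assumptions. The original script reads its data from `describe_security_groups` instead.

```ruby
# Hypothetical input: each security group ID maps to the IDs of the groups
# that its ingress rules reference. Not the data shape of the original script.
def to_dot(dependencies)
  lines = ['digraph securitygroups {']
  dependencies.each do |group_id, referenced_ids|
    referenced_ids.each do |referenced_id|
      # Assumed direction: an edge points from the referenced (source) group
      # to the group that allows traffic from it.
      lines << %(  "#{referenced_id}" -> "#{group_id}";)
    end
  end
  lines << '}'
  lines.join("\n")
end

puts to_dot({ 'sg-web' => ['sg-lb'], 'sg-db' => ['sg-web'] })
# prints:
#   digraph securitygroups {
#     "sg-lb" -> "sg-web";
#     "sg-web" -> "sg-db";
#   }
```

The output can be piped to Graphviz, for example `dot -Tpng`, to render the graph.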
iconara / auto-add-partitions.sql
Created September 6, 2017 13:59
Athena cheat sheet
-- Discovers all partitions of a table if they use Hive's partitioning format (e.g. partition0=abc/partition1=def)
MSCK REPAIR TABLE tablename;
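To make the `key=value` path convention concrete, here is a small Ruby sketch that extracts partition keys and values from a prefix. The helper name `parse_hive_partitions` and the example prefix are hypothetical. Athena performs the actual discovery itself, and this only illustrates the naming format.

```ruby
# Splits a key prefix into its Hive-style partition keys and values.
# Segments without an "=" (such as the table's base prefix) are ignored.
def parse_hive_partitions(prefix)
  prefix.split('/').each_with_object({}) do |segment, partitions|
    key, value = segment.split('=', 2)
    partitions[key] = value if value && !key.empty?
  end
end

partitions = parse_hive_partitions('tablename/partition0=abc/partition1=def/')
partitions.each { |key, value| puts "#{key} = #{value}" }
# prints:
#   partition0 = abc
#   partition1 = def
```

A prefix such as `tablename/abc/def/` yields an empty hash here, which mirrors why such layouts are not picked up by `MSCK REPAIR TABLE`.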
iconara / cbck.sh
Last active December 6, 2016 09:49
Check Cassandra backup integrity
#!/bin/bash
function log() {
  logger -st "cbck[$$]" "$@"
}
function check_failed() {
  log -p user.err "Check failed: $1"
  exit 1
}
iconara / jruby-openssl-110.rb
Last active October 26, 2016 08:35
Warnings from jruby-openssl (jruby/jruby-openssl#110)
# Running this script with `ruby -w` will print these warnings:
# .../lib/ruby/1.9/webrick/https.rb:26 warning: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
# .../lib/ruby/1.9/webrick/https.rb:27 warning: javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated
require 'webrick'
require 'webrick/https'
require 'net/https'
require 'logger'
def create_root_ca(cn)