vrivellino / ProcessKinesisRecords.js
Last active December 30, 2017 15:35
Lambda: archive kinesis stream to S3
console.log('Loading function');
var AWS = require('aws-sdk'),
s3 = new AWS.S3(),
s3Bucket = 'archive-bucket',
s3Prefix = 'kinesis-archive-test',
s3Partitions = 2;
exports.handler = function (event, context) {
//console.log(JSON.stringify(event, null, 2));
epiphani /
Last active July 5, 2018 14:13
Getting Tez enabled on CDH5.4+

Hive in CDH is horribly, painfully slow. Cloudera ships Hive 1.1, which is actually moderately modern, but it is very badly configured out of the box and patched with custom code from Cloudera. With a bit of effort, we managed to improve Hive performance considerably. We really shouldn't have to do this, but Cloudera is actively working against supporting a performant Hive.

First, building Tez was fairly straightforward. Using the instructions at, the only change was to use the Hadoop version string "2.6.0" for the build, which I believe was the default. Don't use the CDH version string; it won't work.

The bottom of the installation instructions mentions that to use the local Hadoop jars (rather than those packaged with Tez), you must unpack the jars in HDFS rather than using the tarball. In this case, unpack the tez-minimal tarball and upload the contents to /apps/tez-0.7.0 (or whatever path you prefer). Don't fo
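
The unpack-and-upload step might look roughly like the following. This is a minimal sketch, not the author's procedure: the tarball file name and the local scratch directory are assumptions, and the function only builds the shell commands so they can be reviewed before anything touches HDFS.

```python
import shlex

# Hypothetical names: adjust to match your own Tez build output and scratch space.
TEZ_MINIMAL_TARBALL = "tez-0.7.0-minimal.tar.gz"  # assumed tarball file name
LOCAL_UNPACK_DIR = "/tmp/tez-0.7.0-minimal"       # assumed local scratch directory
HDFS_TEZ_DIR = "/apps/tez-0.7.0"                  # HDFS path named in the text


def build_tez_upload_commands(tarball, unpack_dir, hdfs_dir):
    """Return shell command strings that unpack the tarball and upload its contents."""
    tarball_q = shlex.quote(tarball)
    unpack_q = shlex.quote(unpack_dir)
    hdfs_q = shlex.quote(hdfs_dir)
    return [
        f"mkdir -p {unpack_q}",
        f"tar -xzf {tarball_q} -C {unpack_q}",
        f"hdfs dfs -mkdir -p {hdfs_q}",
        # The glob is left unquoted on purpose so the shell expands it and the
        # *contents* of the directory are uploaded, not the directory itself.
        f"hdfs dfs -put {unpack_q}/* {hdfs_q}",
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for command in build_tez_upload_commands(
        TEZ_MINIMAL_TARBALL, LOCAL_UNPACK_DIR, HDFS_TEZ_DIR
    ):
        print(command)
```

Printing the commands first makes it easy to check the target path before running them by hand on a node that has the `hdfs` client configured.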

eridal / dp2dot.js
Created July 28, 2017 22:24
Build a dot model from an AWS Data Pipeline JSON definition
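function merge (into, object) {
  // Walk the own keys of `object` and merge each value into `into`
  Object.keys(object).forEach(k => {
    let val = object[k]
    if (val){
      if (Array.isArray(val)) {
        // Arrays are concatenated onto whatever is already stored under the key
        into[k] = [].concat(into[k] || [], val)
      }
      else if (typeof val === 'object') {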
9b /
Created May 19, 2013 20:01
Uses the Google Drive API to upload a file, convert it to a file format, download it locally and delete it from Drive.
def poorMansConvert(di, inPath, outType, outPath):
from apiclient.http import MediaFileUpload
valid_output = [
ambakshi /
Last active October 25, 2021 15:50
Assume an IAM role. An interesting way of doing IAM roles is to give the instance permission to assume another role, but no actual permissions by default. I got this idea while setting up Security Monkey:
# Assume the given role, and print out a set of environment variables
# for use with aws cli.
# To use:
# $ eval $(./
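
The script body is not shown here, so the following is only a sketch of its last step: turning temporary credentials into `export` lines that `eval` can consume. It assumes the input is shaped like the `Credentials` object of an STS AssumeRole response; the function name `credentials_to_exports` is hypothetical, and the original is a shell script rather than Python.

```python
import shlex


def credentials_to_exports(credentials):
    """Format temporary credentials as shell `export` lines for the AWS CLI.

    `credentials` is assumed to carry the AccessKeyId, SecretAccessKey and
    SessionToken fields found in an STS AssumeRole response.
    """
    mapping = [
        ("AWS_ACCESS_KEY_ID", "AccessKeyId"),
        ("AWS_SECRET_ACCESS_KEY", "SecretAccessKey"),
        ("AWS_SESSION_TOKEN", "SessionToken"),
    ]
    # shlex.quote keeps the output safe to pass through `eval`.
    return "\n".join(
        f"export {env_name}={shlex.quote(credentials[field])}"
        for env_name, field in mapping
    )


if __name__ == "__main__":
    # Placeholder values only; real ones would come from the AssumeRole call.
    example = {
        "AccessKeyId": "EXAMPLEKEYID",
        "SecretAccessKey": "exampleSecret",
        "SessionToken": "exampleToken",
    }
    print(credentials_to_exports(example))
```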
linar-jether /
Last active May 22, 2022 12:26
Grafana Python datasource, using pandas for time series and table data. Inspired by and compatible with the Simple JSON datasource ---- Up-to-date version maintained @
from flask import Flask, request, jsonify, json, abort
from flask_cors import CORS, cross_origin
import pandas as pd
app = Flask(__name__)
cors = CORS(app)
app.config['CORS_HEADERS'] = 'Content-Type'
provegard /
Created December 5, 2011 21:52
Small SSDP server/client test in Python
# Python program that can send out M-SEARCH messages using SSDP (in server
# mode), or listen for SSDP messages (in client mode).
import sys
from twisted.internet import reactor, task
from twisted.internet.protocol import DatagramProtocol
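
For orientation, an SSDP M-SEARCH request is a small HTTP-over-UDP message sent to the multicast address 239.255.255.250 on port 1900. The sketch below only builds the message bytes with the standard library; it is not the gist's Twisted code, and the `build_msearch` name and its defaults are assumptions.

```python
SSDP_ADDR = "239.255.255.250"  # standard SSDP multicast address
SSDP_PORT = 1900               # standard SSDP port


def build_msearch(search_target="ssdp:all", mx=2):
    """Build the bytes of an SSDP M-SEARCH discovery request.

    `search_target` fills the ST header and `mx` is the maximum number of
    seconds a device may wait before responding.
    """
    lines = [
        "M-SEARCH * HTTP/1.1",
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {search_target}",
        "",  # the two empty entries produce the blank line that ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_msearch().decode("ascii"))
```

The resulting bytes could then be written to a UDP socket or a Twisted `DatagramProtocol` transport addressed to `(SSDP_ADDR, SSDP_PORT)`.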
alfredkrohmer /
Created November 23, 2016 21:52
XBox One Wireless Controller Protocol

Physical layer

The dongle itself transmits data using 802.11a (5 GHz Wi-Fi) with OFDM at a 6 Mbit/s data rate:

Radiotap Header v0, Length 38
    Header revision: 0
    Header pad: 0
    Header length: 38
    Present flags