### Keybase proof
I hereby claim:
* I am kerinin on github.
* I am kerinin (https://keybase.io/kerinin) on keybase.
* I have a public key ASCG5umtk3kRCyOUp2VwMM_doaZca1MkGktyOJrhm5Xhfwo
To claim this, I am signing this object:
// Source data, generated by some preprocessing pipeline or read from Kafka
var examples: DataStream[(Input,Target)] = null
// Algorithms to train against. An algorithm defines most of the values passed to SageMaker
// when creating training jobs and models. Algorithms can be defined statically or read from
// a live stream (e.g. Kafka). Algorithms have an associated "id" that can be used to train
// multiple algorithms against a single dataset.
var algorithms: DataStream[AlgorithmEvent] = null
// The `split_examples` function uses a `Splitter` to partition an input stream into Training & Test datasets.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows
import org.apache.flink.streaming.api.windowing.time.Time

object Main {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Read raw click events and sessionize them with a 30-minute gap
    val clicks = env.readTextFile("myclicks.txt")
    val result = clicks
      .map(new ClickTransformer)
      .keyBy(0)
      .window(EventTimeSessionWindows.withGap(Time.minutes(30L)))
  }
}
var parsedEvent = JsonParser.Default.Parse<ParsedDataEventWebhook>(json);
// Verify the signature to validate that the webhook came from RP
var key = "CIO_SECRET"; // this should be your CIO auth secret
var encoding = BinaryStringEncoding.Utf8;
var algorithmProvider = MacAlgorithmProvider.OpenAlgorithm(MacAlgorithmNames.HmacSha256);
// HMAC the raw payload (not the checksum) with the secret, then compare the
// result against the checksum carried in the event (assumed hex-encoded here)
var contentBuffer = CryptographicBuffer.ConvertStringToBinary(json, encoding);
var keyBuffer = CryptographicBuffer.ConvertStringToBinary(key, encoding);
var hmacKey = algorithmProvider.CreateKey(keyBuffer);
var signature = CryptographicEngine.Sign(hmacKey, contentBuffer);
var valid = CryptographicBuffer.EncodeToHexString(signature) == parsedEvent.checksum;
// Code generated by protoc-gen-go.
// source: src/demo/demo.proto
// DO NOT EDIT!
/*
Package demo is a generated protocol buffer package.
It is generated from these files:
src/demo/demo.proto
*/
package main
import (
"crypto/tls"
"crypto/x509"
"flag"
"fmt"
"log"
"time"
kerinin / Accounts_Kinesis.md
Created March 22, 2016 20:28

Accounts Kinesis

We currently need to solve a few problems in the mailservice:

  1. We need to react to account deletions (i.e. terminating ongoing processes and preventing retries of failed processes)
  2. We need to increase processing capacity as the number of accounts being handled increases
  3. To reduce the probability of k8s scheduling conflicts, we need to minimize the resources requested by mailservice pods

The current approach to event notification is to write SQS events to signal state changes. For instance, when a new account is created, the process creating the account writes an event to one of 64 "created" queues, which is consumed by the sync process. We could create a partitioned set of "deleted" queues and write deletion events into them. This approach feels brittle: it relies on SQS events being created anywhere accounts are modified, and we're already doing this in at least two places (the API and the CIO Kafka topic watcher).
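For illustration, a minimal sketch of the hash-partitioning idea behind those queues, applied to a hypothetical set of "deleted" queues. The queue naming convention and the FNV hash are assumptions for the sketch, not taken from the mailservice:

package main

import (
    "fmt"
    "hash/fnv"
)

// numPartitions mirrors the 64-way partitioning described above.
const numPartitions = 64

// deletionQueueFor maps an account ID to one of the partitioned "deleted"
// queues, so every event for a given account lands on the same queue.
// The queue name format here is hypothetical.
func deletionQueueFor(accountID string) string {
    h := fnv.New32a()
    h.Write([]byte(accountID))
    return fmt.Sprintf("accounts-deleted-%d", h.Sum32()%numPartitions)
}

func main() {
    fmt.Println(deletionQueueFor("acct-1234"))
}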

Ryan Michael

My work emphasizes horizontal scalability, simple and well-defined organizational boundaries, and pervasive introspection through logging, metrics & alerting. My tools of choice are usually streams of immutable data, distributed data stores, canonical & unambiguous interface description languages, and containerized runtimes.

tl;dr

  • Enjoys: building performant, scalable infrastructure focused on robustness and maintainability
  • Speaks: Go, Ruby, Rust, Clojure, Python
  • Uses: Kafka, DynamoDB, Hadoop, Storm, Consul/Serf
package event
// Events returns the set of events describing the change from prev -> next.
//
// This is a fairly complex operation, and needs to handle the following
// edge cases:
// * A message could be copied from one folder to another, which we can only
//   determine based on message ID (which isn't present in the snapshots)
// * A message could be moved from one folder to another, which also relies on
//   message ID
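For illustration, a minimal sketch of what such a diff could look like, using hypothetical Snapshot and Event types (the real types aren't shown in this excerpt). It reports per-folder adds and removes; as noted above, a cross-folder copy or move can't be distinguished without message IDs:

// Hypothetical types for illustration only; the real package's types are
// not visible in this excerpt.
type Snapshot map[string]map[string]bool // folder -> message UID -> present

type Event struct {
    Kind   string // "added" or "removed"
    Folder string
    UID    string
}

func eventsSketch(prev, next Snapshot) []Event {
    var out []Event
    // UIDs present in next but not prev were added to that folder.
    for folder, uids := range next {
        for uid := range uids {
            if !prev[folder][uid] {
                out = append(out, Event{"added", folder, uid})
            }
        }
    }
    // UIDs present in prev but not next were removed from that folder.
    for folder, uids := range prev {
        for uid := range uids {
            if !next[folder][uid] {
                out = append(out, Event{"removed", folder, uid})
            }
        }
    }
    return out
}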

Ryan Michael

I'm looking for a new set of challenges to take on. I want to address interesting problems and produce robust, elegant, composable solutions.

I'm interested in distributed systems, machine learning, and how to organize large volumes of data.

tl;dr