@calonso
calonso / getLiveSchemas.scala
Created April 26, 2018 19:05
Apache Beam function to get an updating side input
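import com.spotify.scio.ScioContext
import org.apache.beam.sdk.io.GenerateSequence
import org.joda.time.Duration

// Emits one tick per refreshFreq, re-reads every table schema on each tick,
// and exposes each window's (table name -> schema) map as a side input.
// `bq` is not defined in this snippet; it is assumed to be a BigQuery client in scope.
def updatingSchemas(sc: ScioContext, refreshFreq: Duration, tableNames: List[String]) = {
  sc.customInput("Tick", GenerateSequence.from(0).withRate(1, refreshFreq))
    .withName("Retrieve schemas")
    .flatMap { _ =>
      tableNames.map(t => (t, bq.getTableSchema(t)))
    }
    .withName("Set windowing")
    .withFixedWindows(refreshFreq)
    .withName("To side input")
    .asMapSideInput
}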

The old days

  • An hourly rotated log system
  • Issues:
    • EOF generation and propagation
    • Manual intervention required to continue after errors

First streaming architecture

https://labs.spotify.com/2016/02/25/spotifys-event-delivery-the-road-to-the-cloud-part-i/

  • Key requirement: deliver complete data with predictable latency and make it available to our developers through a well-defined interface.
  • An event (structured data) is the unit of streamed information.

Keybase proof

I hereby claim:

  • I am calonso on github.
  • I am calonso (https://keybase.io/calonso) on keybase.
  • I have a public key whose fingerprint is 410F FC29 4601 E5AC 6D23 F2ED 0EFE 63CF E7FC 403F

To claim this, I am signing this object:

calonso / gist:17b5dcc48071a55538c9
Created May 20, 2015 10:26
Cassandra Summit CFP Submission
Presentation Title: Case Study: Troubleshooting Production Issues as a Developer.
Presentation Abstract: Step by step walkthrough of a developer troubleshooting a real performance issue we had at MyDrive. From the very first steps diagnosing the symptoms, through looking at metric charts down to CQL queries, the Ruby CQL driver, and Ruby code profiling.
calonso / bins_as_cols_cassandra_model_benchmark.rb
Last active August 29, 2015 14:21
Ruby script to benchmark a candidate Cassandra model
#!/usr/bin/env ruby
require 'rubygems'
require 'bundler'
Bundler.setup
Bundler.require
require 'yaml'
require 'csv'
require 'logger'
calonso / bins_as_list_cassandra_model_benchmark.rb
Last active August 29, 2015 14:21
Ruby script to benchmark a candidate Cassandra model
#!/usr/bin/env ruby
require 'rubygems'
require 'bundler'
Bundler.setup
Bundler.require
require 'yaml'
require 'csv'
require 'logger'
calonso / cassandra-insert-profiling.rb
Last active August 29, 2015 14:20
Ruby program to profile an insert in Cassandra
#!/usr/bin/env bundle exec ruby
Bundler.setup
Bundler.require
require 'yaml'
def config
YAML.load_file File.expand_path('../../cassandra.yml', __FILE__)
end
calonso / gist:ea8323e953648fed0541
Created April 1, 2015 16:08
Distributed C* stress 3vs3 prepared
--------------------------------------------------
"First client's output"
--------------------------------------------------
total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
48524,4852,4852,5.7,20.1,198.2,10
96718,4819,4819,6.8,23.2,204.0,20
140758,4404,4404,7.5,23.6,204.0,30
183611,4285,4285,7.3,25.2,230.9,40
236244,5263,5263,7.2,25.8,292.4,50
287249,5100,5100,6.8,24.0,229.7,60
calonso / gist:e23dfadda3e9a5cca016
Created April 1, 2015 16:02
Distributed C* stress 3vs3 non prepared
--------------------------------------------------
"First client's output"
--------------------------------------------------
total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
45565,4556,4556,6.2,16.2,125.3,10
96277,5071,5071,7.0,19.2,187.9,20
153274,5699,5699,6.9,18.3,187.9,30
208380,5510,5510,6.8,17.9,270.8,40
258342,4996,4996,7.0,19.2,270.8,50
310647,5230,5230,7.0,20.0,270.8,60
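One way to compare the two stress outputs is to average the interval_op_rate column of each. The sketch below is only an illustration: it uses just the six rows visible in each preview (first client, first 60 seconds), and the constant and method names (PREPARED, NON_PREPARED, mean_op_rate) are made up for this example, so the numbers say nothing about the full runs.

```ruby
require 'csv'

# Rows copied from the "prepared" preview above (first client only).
PREPARED = <<~CSV
  total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
  48524,4852,4852,5.7,20.1,198.2,10
  96718,4819,4819,6.8,23.2,204.0,20
  140758,4404,4404,7.5,23.6,204.0,30
  183611,4285,4285,7.3,25.2,230.9,40
  236244,5263,5263,7.2,25.8,292.4,50
  287249,5100,5100,6.8,24.0,229.7,60
CSV

# Rows copied from the "non prepared" preview above (first client only).
NON_PREPARED = <<~CSV
  total,interval_op_rate,interval_key_rate,latency,95th,99.9th,elapsed_time
  45565,4556,4556,6.2,16.2,125.3,10
  96277,5071,5071,7.0,19.2,187.9,20
  153274,5699,5699,6.9,18.3,187.9,30
  208380,5510,5510,6.8,17.9,270.8,40
  258342,4996,4996,7.0,19.2,270.8,50
  310647,5230,5230,7.0,20.0,270.8,60
CSV

# Mean of the interval_op_rate column for one stress output.
def mean_op_rate(csv_text)
  rows = CSV.parse(csv_text, headers: true)
  rates = rows.map { |row| row['interval_op_rate'].to_f }
  rates.sum / rates.size
end

puts format('prepared:     %.1f ops/s', mean_op_rate(PREPARED))
puts format('non prepared: %.1f ops/s', mean_op_rate(NON_PREPARED))
```

The same method could be pointed at the complete output files with File.read once they are available locally.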
@calonso
calonso / gist:43ea2c5e7a61f831a86c
Last active August 29, 2015 14:17
Custom C* configs comparison
This Ruby benchmark uses the latest cassandra-driver gem and inserts 50,000 entries into the cluster using several methods.
The model is like this:
CREATE TABLE events (
id varchar,
timestamp timestamp,
var1 float,
var2 float,
var3 varchar,
var4 int,
PRIMARY KEY (id, timestamp)