jstaffans/distributed_matters_2015.markdown

## distributed_matters_2015.markdown

      
            
          
Notes from the distributed matters 2015 conference.
Jepsen V (Kyle Kingsbury)

Covers how Jepsen works and which systems have been tested so far, among others:

Riak: default is last-write-wins, but you should not be using it
ZK: works as advertised (!)

... But all of this is also on his blog, not anything really new.
Do test your failure modes!

Terminate AWS nodes
faketime: lie about time to a process
network partitions: iptables -j DROP

Lisp in the Machine (Joe Nash)


Experience report from Braintree

same token-based payment interface to PayPal, credit cards etc
mostly Ruby shop
Clojure


Primary DB + Data Warehouse for backup/replication/SSOT

Primary DB not slowed down by transactions
Primary DB sharded by merchant
Amazon Redshift for DW

DW needs to exactly replicate data structure of Postgres DB


Previously: batch processing, transaction searches in primary DB

slow, unpredictable
burden of knowledge: users of DW need to be aware of limitations


Options discussed for DW sync:

Postgres replication -> DW: now available in Postgres but not back then
PGQ: potential data loss
Kafka: PubSub msg system at massive scale

Replays, strongly ordered ..


Ended up using Kafka with PGQ for getting events from Postgres and putting them in Kafka
Elasticsearch for transaction searches

Needs to know domain-specific events ("what happened")


Clojure: using many JVM technologies. Best-maintained client libs are JVM libs. Data structures are a win!

Infinite lazy streams:

(try
  (doseq [msg (get-message ..)]
    (process-msg msg))
  (finally (shutdown)))

Issue with testing: pass in stream to function instead, so you can pass a finite stream for testing.
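The testing point above can be sketched in Python (the snippet in the notes is Clojure; this is only an illustration of the same idea). The names `consume`, `process` and `shutdown` are hypothetical and not taken from the talk. The function receives the stream as an argument, so a test can hand it a short list instead of an endless source.

```python
from typing import Callable, Iterable, List


def consume(messages: Iterable[str],
            process: Callable[[str], None],
            shutdown: Callable[[], None]) -> None:
    """Process every message from the given stream, then always shut down."""
    try:
        for msg in messages:
            process(msg)
    finally:
        # Runs whether the stream ends normally or processing raises.
        shutdown()


# In a test, a plain finite list stands in for the infinite stream.
seen: List[str] = []
events: List[str] = []
consume(["a", "b", "c"], seen.append, lambda: events.append("shutdown"))
```

In production the same function would be given a generator that never ends; the function body does not need to know the difference.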
Pulsar: handle shutdown, actor model fits well with Elasticsearch
G1GC: works well with small heaps. With Clojure, i.e. large numbers of objects, you want small heaps (why though?)

Win: real-time data available in DW for real-time reports and fraud analysis.
Takeaway: big win for analytics to have real-time DW.
A tale of queues — from ActiveMQ over Hazelcast to Disque (Philipp Krenn)


Business domain: ordering, working with lots of legacy systems. Customers pay per message.
Messages initially land in a MySQL database.
Apache Camel for routing messages to connectors. What is Camel?
Queues:

The case against queues
Blocking consumers: antipattern b/c you seem to want synchronicity
Systems complexity


Order: http://book.mixu.net/distsys/time.html. In many cases, ordering is not important.
Exactly once: conflicts with removal/ack. Job processor may crash.
At least once: easier to distribute than at-most-once b/c no communication overhead.

Idempotent consumption (ie set to 12 instead of add 3).


Goal is exactly-once.
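The "set to 12 instead of add 3" remark can be made concrete with a small sketch. It simulates at-least-once delivery by applying the same message twice; the account name and helper functions are made up for illustration.

```python
from typing import Dict


def apply_increment(balances: Dict[str, int], account: str, delta: int) -> None:
    """Non-idempotent: every redelivery changes the result again."""
    balances[account] = balances.get(account, 0) + delta


def apply_set(balances: Dict[str, int], account: str, value: int) -> None:
    """Idempotent: applying the same message again leaves the state unchanged."""
    balances[account] = value


# The same message is delivered twice, as can happen with at-least-once.
incremented = {"acct": 9}
for _ in range(2):
    apply_increment(incremented, "acct", 3)

absolute = {"acct": 9}
for _ in range(2):
    apply_set(absolute, "acct", 12)
```

After the duplicate delivery, `incremented["acct"]` has drifted to 15, while `absolute["acct"]` is still the intended 12.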

SQS: At-least-once, small-ish payload (256 kB).
ActiveMQ with MySQL (RDS) storage.

Tuned JDBC connection params to allow around one hour of RDS downtime.
Hard to test RDS downtime, instance size can be changed to trigger failover.
RDS is the single point of failure.


Hazelcast

"Eierlegende Wollmilchsau" (German idiom, literally "egg-laying wool-milk-sow": one tool that tries to do everything)
Seems nobody is using the queuing feature though, undocumented
Very tightly coupled with your application


Kafka

Not interested in Kafka's real-time focus
At-least-once


RabbitMQ

too much middleware, messages can be duplicated - not acceptable for an ordering system


Disque (pronounced dis-queue)

Exactly what they wanted: great doc, does-one-thing, multi-master HA
configurable ACK: at-least-once or at-most-once
similar to Redis
More details at Salvatore's talk
Currently being used in billing system, which has lots of individual pricing schemes.


QA


Why use a queue at all with such a small number of messages? Could use MySQL.

aim is to grow
Camel wants a queue
avoids large amounts of inserts


Distributed transactions?

Consensus is: avoid
Can re-read message from source database if something goes wrong.


Takeaway: should think about ACK semantics carefully. Disque seems like the new hotness.
Microservices at SoundCloud (Phil Calcado)


SoundCloud is the biggest audio platform on the web.

How we ended up with microservices


Microservice prerequisites

rapid provisioning
basic monitoring (netflix-level monitoring not needed)
rapid app development


12 factor app was a good guideline in the beginning
Slow to roll out microservice platform, industry moved on
Provisioning, deployment: first iteration was home-grown, now moving to Docker + Chef + Kubernetes. Inspired by ex-Google engineers.
Telemetry: was basic StatsD + Graphite + Nagios, now Prometheus (SoundCloud 20% time project) + Icinga

standardised dashboards: easier to compare services


Standardised ops: see the twitter-server template. Each server has an /admin endpoint with common tasks like restart, ping etc.
Deployment: Each team has its own Jenkins instance with elastic slaves. Jenkins builds the container/package and pushes it to Kubernetes.

containers enable mini-SoundClouds for development

launch one container that acts as DNS
Containers register themselves with this DNS (wrapper around docker run?)
dev container launched using the DNS container to find services


Best things: Prometheus, 12 Factor App, Finagle
QA


Shared codebases?

Avoided, but shared libraries are used a lot
Ownership important
Shared codebases is an interesting topic! Google et al are doing it.


Takeaway: ultimate bikeshedding weapon in ops discussion is twitter-server, standardise admin access across services.
SimCity BuildIt – Building Highly Scalable and Cost Efficient Server Architecture (Matti Palosuo)


8M transactions per day
Redis as main DB, sharded per user

all players from past two months
win to use Redis as main db, not only key-value storage in the sidelines!
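The notes do not say how a user is mapped to a shard. One common approach, shown here purely as an assumption, is to hash the user id with a stable hash and take it modulo the shard count. The function name `shard_for` and the shard count are hypothetical.

```python
import hashlib


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user id to a shard index that is stable across processes.

    Python's built-in hash() is randomised per process for strings,
    so a cryptographic digest is used instead.
    """
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shard_a = shard_for("player-42", 8)
shard_b = shard_for("player-42", 8)  # same user always lands on the same shard
```

Plain modulo sharding moves most keys when the shard count changes, so systems that reshard often tend to use consistent hashing or a lookup table instead.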


Anteater - own Java tool on top of Redis

transactions
distributed config
sharding, failover


MongoDB is archive db - archive daemon moves data from Redis

sharded like redis
all-time players
backups, recovery


High availability: ELB + (HAProxy, Tomcat, Redis, MongoDB) x 3 AZs
Optimizations

Patched Redis for multiple actions, optimistic locking, transactions across shards
Protobuf, private / public split. Normally only public data is restored from MongoDB. If the player logs in again, private data is restored as well.
Traffic: batched communication client-server, minimize cross-AZ traffic

Netflix OSS and Spring Cloud


Spring Cloud

not Mesos competitor, not just for cloud systems
valid for traditional apps too
framework for distributed app development
part of your app, along with e.g. Netflix OSS -> containerized


Netflix stack - integrated into Spring Cloud:

Eureka Service Registry
Hystrix (circuit breaker)
Ribbon (client load balancer - no extra component for LB)
(Zuul [proxy])


Spring Cloud has annotations for e.g. registering a service in Eureka.


Frontend app can do HTTP requests and hostname gets automatically resolved by Eureka (Ribbon)


The Eureka URL is the only URL needed in app config


Interesting demo: launched several servers listening on different ports from inside IDEA.
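The registry plus client-side load balancing idea can be sketched without any Netflix libraries. This is not the Eureka or Ribbon API; `Registry` and `RoundRobinClient` are hypothetical names, and round-robin is just one possible balancing rule.

```python
from typing import Dict, List


class Registry:
    """Toy service registry: service name -> list of instance addresses."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)

    def instances(self, service: str) -> List[str]:
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """Client-side balancer: picks the next instance itself, no LB component."""

    def __init__(self, registry: Registry) -> None:
        self.registry = registry
        self._counters: Dict[str, int] = {}

    def resolve(self, service: str) -> str:
        instances = self.registry.instances(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = self._counters.get(service, 0) % len(instances)
        self._counters[service] = index + 1
        return instances[index]


registry = Registry()
registry.register("orders", "localhost:8081")
registry.register("orders", "localhost:8082")
client = RoundRobinClient(registry)
picks = [client.resolve("orders") for _ in range(3)]
```

The caller only knows the logical name `"orders"`; the addresses come from the registry, which mirrors the point that the registry URL is the only one the app needs to configure.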

Hystrix

also annotation-based in Spring Cloud
happens before load balancing
nice dashboard, integrated or separate
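For readers unfamiliar with the circuit breaker pattern, here is a minimal sketch. It is not Hystrix's API or implementation; the class, its thresholds and the injectable `clock` are assumptions chosen to keep the example small and testable.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()      # open: fail fast, do not call func
            self.opened_at = None      # half-open: allow one trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0              # success closes the circuit again
        return result


now = [0.0]                            # fake clock so the demo is deterministic
attempts = []


def failing() -> str:
    attempts.append(1)
    raise RuntimeError("backend down")


breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])
results = [breaker.call(failing, lambda: "fallback") for _ in range(3)]
now[0] = 11.0                          # reset window has passed
recovered = breaker.call(lambda: "ok", lambda: "fallback")
```

The third call never reaches the failing backend because the breaker is already open; once the reset window has passed, a successful trial call closes it again.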


Takeaways:

Using Netflix OSS is really easy with Spring Cloud
Netflix OSS: battle-tested, mature libraries

Microservices (Uwe Friedrichsen)


When do microservices make sense?

Need for development speed
other reasons: devops, polyglot, scalability ..


Consequences of microservices

Design, implementation more challenging
Lookup, liveness, partitioning, latency, consistency ..


Are we doing microservices just to achieve the encapsulation that programming languages already give us?
Path to microservices:

Master modularization first

otherwise you'll end up with tightly coupled services
Bounded Contexts as jump-off point for modularization


Forget about layers
Re-think DRY - avoid deployment dependencies


Interfaces:

Should be able to update services independently - plan for interface evolution
Postel's Law: "Be liberal in the things you accept, be strict in the things you offer"

validate what you get from other services, don't blindly accept responses
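A small sketch of what this can look like at a service boundary. The `parse_order` function and its fields are hypothetical. It ignores fields it does not know (liberal in what it accepts), still validates the fields it relies on, and returns only a fixed, known shape (strict in what it offers).

```python
from typing import Any, Dict


def parse_order(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Tolerate unknown or missing optional fields, validate the rest."""
    order_id = payload.get("id")
    if not isinstance(order_id, str) or not order_id:
        raise ValueError("order id must be a non-empty string")

    quantity = payload.get("quantity", 1)  # optional field with a default
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        raise ValueError("quantity must be a positive integer")

    # Only the known shape is passed on; extra fields are dropped.
    return {"id": order_id, "quantity": quantity}


parsed = parse_order({"id": "o-1", "quantity": 2, "new_field": "ignored"})

try:
    parse_order({"id": "", "quantity": 2})
    rejected = False
except ValueError:
    rejected = True
```

Ignoring unknown fields is what lets the upstream service add fields without forcing every consumer to redeploy at the same time.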


Can wrap a bounded context (cluster of services) with an API gateway
Synch vs asynch (request/response vs event-driven):

synch not necessarily simpler, b/c timeout management
ends up with wildly different designs


Datastores:

avoid single, big DB
avoid distributed transactions - re-think design if you find you need them
relax temporal constraints, aim for eventual consistency - make actions idempotent
...


Production readiness:

Need to solve: configuration, orchestration, discovery, routing, observability, resilience
Error handling: throwing an exception doesn't work anymore. Separate error (escalation) and control flows.


Wrap-up: focus on functional design and production readiness!
Disque (Salvatore Sanfilippo)


Redis roots: in-memory, optional persistence, same protocol, same license
Developed b/c Redis was used as a queue, which it was never designed for
Use cases

async job execution
microservice bus
distributed scheduler


API - same structure as Redis

ADDJOB queue job <timeout> ..
GETJOB q1 q2 ...
ACKJOB id1 id2 ...


At least once (default)

Liveness: eventually msg will be delivered
Safety: msg not yet delivered at least once will never be evicted

but TTL


Tries hard to be exactly once


At most once

No liveness guarantees
Safety: dequeued msg will never be queued again
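The difference between the two delivery modes can be illustrated with a toy in-memory queue. This is not how Disque is implemented; the method names only echo ADDJOB, GETJOB and ACKJOB, and `requeue_unacked` is a hypothetical stand-in for a retry timer.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ToyQueue:
    def __init__(self, at_most_once: bool = False) -> None:
        self.at_most_once = at_most_once
        self.pending: Deque[Tuple[int, str]] = deque()
        self.unacked: Dict[int, str] = {}
        self.next_id = 0

    def add_job(self, body: str) -> int:
        job_id = self.next_id
        self.next_id += 1
        self.pending.append((job_id, body))
        return job_id

    def get_job(self) -> Optional[Tuple[int, str]]:
        if not self.pending:
            return None
        job_id, body = self.pending.popleft()
        if not self.at_most_once:
            self.unacked[job_id] = body  # keep it until acknowledged
        return job_id, body

    def ack_job(self, job_id: int) -> None:
        self.unacked.pop(job_id, None)

    def requeue_unacked(self) -> None:
        """Stand-in for a retry timer: put every unacknowledged job back."""
        for job_id, body in list(self.unacked.items()):
            self.pending.append((job_id, body))
        self.unacked.clear()


# At-least-once: the worker "crashes" before acking, so the job comes back.
alo = ToyQueue()
alo.add_job("charge")
first = alo.get_job()
alo.requeue_unacked()
second = alo.get_job()
alo.ack_job(0)
alo.requeue_unacked()
third = alo.get_job()

# At-most-once: the job is forgotten as soon as it is handed out.
amo = ToyQueue(at_most_once=True)
amo.add_job("charge")
amo.get_job()
amo.requeue_unacked()
lost = amo.get_job()
```

In the at-least-once run the same job is delivered twice and only disappears after the ack; in the at-most-once run a crash after `get_job` loses the job for good.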


TTL: some messages make no sense to deliver after some period of time. Can also be used for high-throughput events with a short TTL (drop events if not handled)
Supports delays
Replication (synchronous): default cluster factor three
ASYNC: asynchronous replication
Persistent (optionally): append-only file. When restarted, can load state
CAP
Main sacrifice: best effort ordering

Disque can break ordering by re-delivering msg


Bad messages that cause workers to crash

NACK: alternative to a dead letter queue
GETJOB exposes two counters - worker can decide if a msg is kosher or should be dropped based on earlier events
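A worker-side decision based on such counters might look like the sketch below. The counter names `nacks` and `additional_deliveries`, the thresholds and the returned action strings are all assumptions for illustration; no Disque client is called here.

```python
from typing import Callable

MAX_NACKS = 3                   # hypothetical threshold
MAX_ADDITIONAL_DELIVERIES = 5   # hypothetical threshold


def should_drop(nacks: int, additional_deliveries: int) -> bool:
    """Treat a job as poisonous once it has failed or been redelivered too often."""
    return nacks >= MAX_NACKS or additional_deliveries >= MAX_ADDITIONAL_DELIVERIES


def handle(body: str, nacks: int, additional_deliveries: int,
           process: Callable[[str], None]) -> str:
    """Return the action the worker should take: 'ack', 'nack' or 'dropped'."""
    if should_drop(nacks, additional_deliveries):
        return "dropped"        # give up instead of crashing yet another worker
    try:
        process(body)
    except Exception:
        return "nack"           # ask the queue to redeliver the job
    return "ack"


def explode(body: str) -> None:
    raise ValueError(f"cannot process {body!r}")


outcome_ok = handle("good", 0, 0, lambda body: None)
outcome_retry = handle("bad", 1, 0, explode)
outcome_drop = handle("bad", 3, 0, explode)
```

A dropped job would typically be logged or parked somewhere for inspection, which is the role a dead letter queue plays in other systems.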


Talk about internals, how cluster decides who delivers messages.

Clients that are smart can check if connected node is one that it has not seen many messages delivered from and move to another node that is "closer" in the topology
Demo: consumers adjust automatically to speed of producer