Create a gist now

Instantly share code, notes, and snippets.

What would you like to do?
distributed matters 2015

Notes from the distributed matters 2015 conference.

Jepsen V (Kyle Kingsbury)

Talks about how Jepsen works and what systems have been tested until now, among others:

  • Riak: default is last-write-wins, but you should not be using it
  • ZK: works as advertised (!)

... But all of this is also on his blog, not anything really new.

Do test your failure modes!

  • Terminate AWS nodes
  • faketime: lie about time to a process
  • network partitions: iptables -j DROP

Lips in the Machine (Joe Nash)

Or: A tale of queues — from ActiveMQ over Hazelcast to Disque

  • Experience report from Braintree
    • same token-based payment interface to PayPal, credit cards etc
    • mostly Ruby shop
    • Clojure
  • Primary DB + Data Warehouse for backup/replication/SSOT
    • Primary DB not slowed down by transactions
    • Primary DB sharded by merchant
    • Amazon Redshift for DW
      • DW needs to exactly replicate data structure of Postgres DB
  • Previously: batch processing, transaction searches in primary DB
    • slow, unpredictable
    • burden of knowledge: users of DW need to be aware of limitations
  • Options discussed for DW sync:
    • Postgres replication -> DW: now available in Postgres but not back then
    • PGQ: potential data loss
    • Kafka: PubSub msg system at massive scale
      • Replays, strongly ordered ..
  • Ended up using Kafka with PGQ for getting events from Postgres and putting them in Kafka
  • Elasticsearch for transaction searches
    • Needs to know domain-specific events ("what happened")

Clojure: using many JVM technologies. Best-maintained client libs are JVM libs. Data structures are a win!

  • Infinite lazy streams:
(try 
  (doseq [msg (get-message ..)] 
    (process-msg msg)
  (finally (shutdown))))
  • Issue with testing: pass in stream to function instead, so you can pass a finite stream for testing.
  • Pulsar: handle shutdown, actor model fits well with Elasticsearch
  • G1GC: works well with small heaps. With Clojure ie large amounts of objects, you want small heaps (why though?)

Win: real-time data available in DW for real-time reports and fraud analysis.

Takeaway: big win for analytics to have real-time DW.

A tale of queues — from ActiveMQ over Hazelcast to Disque (Philipp Krenn)

  • Business domain: ordering, working with lots of legacy systems. Customers pay per message.
  • Messages initially land in a MySQL database.
  • Apache Camel for routing messages to connectors. What is Camel?
  • Queues:
  • Order: http://book.mixu.net/distsys/time.html. In many cases, ordering is not important.
  • Exactly once: conflicts with removal/ack. Job processor may crash.
  • At least once: easier to distribute than at-most-once b/c no communication overhead.
    • Idempotent consumption (ie set to 12 instead of add 3).

Goal is exactly-once.

  • SQS: At-least-once, small-ish payload (256 kB).
  • ActiveMQ with MySQL (RDS) storage.
    • Tuned JDBC connection params to allow around one hour of RDS downtime.
    • Hard to test RDS downtime, instance size can be changed to trigger failover.
    • RDS is the single point of failure.
  • HazelCast
    • "Eierlegende Wollmilchsau"
    • Seems nobody is using the queuing feature though, undocumented
    • Very tightly coupled with your application
  • Kafka
    • Not interested in Kafka's real-time focus
    • At-least-once
  • RabbitMQ
    • too much middleware, messages can be duplicated - not acceptable for an ordering system
  • Disque (pronounced dis-queue)
    • Exactly what they wanted: great doc, does-one-thing, multi-master HA
    • configurable ACK: at-least-once or at-most-once
    • similar to Redis
    • More details at Salvatore's talk
    • Currently being used in billing system, which has lots of individual pricing schemes.

QA

  • Why use a queue at all with so small amount of messages? Could use MySQL.
    • aim is to grow
    • Camel wants a queue
    • avoids large amounts of inserts
  • Distributed transactions?
    • Consensus is: avoid
    • Can re-read message from source database if something goes wrong.

Takeaway: should think about ACK semantics carefully. Disque seems like the new hotness.

Microservices at SoundCloud: Phil Calcado

  • SoundCloud is the biggest audio platform on the web.
  • Microservice prerequisites
    • rapid provisioning
    • basic monitoring (netflix-level monitoring not needed)
    • rapid app development
  • 12 factor app was a good guideline in the beginning
  • Slow to roll out microservice platform, industry moved on
  • Provisioning, deployment: first iteration was home-grown, now moving to Docker + Chef + kubernetes. Inspired by ex-Google engineers.
  • Telemetry: was basic StatsD + Graphite + Nagios, now Prometheus (SoundCloud 20% time project) + Icinga
    • standardised dashboards: easier to compare services
  • Standardised ops: see the twitter-server template. Each server has an /admin endpoint with common tasks like restart, ping etc.
  • Deployment: Each team has its own Jenkins instance with elastic slaves. Jenkins builds container/package, shoves to kubernetes.
    • containers enable mini-SoundClouds for development
      • launch one container that acts as DNS
      • Containers register themselves with this DNS (wrapper around docker run?)
      • dev container launched using the DNS container to find services

Best things: Prometheus, 12 Factor App, Finagle

QA

  • Shared codebases?
    • Avoided, but shared libraries are used a lot
    • Ownership important
    • Shared codebases is an interesting topic! Google et al are doing it.

Takeaway: ultimate bikeshedding weapon in ops discussion is twitter-server, standardise admin access across services.

SimCity BuildIt – Building Highly Scalable and Cost Efficient Server Architecture (Matti Palosuo)

  • 8M transactions per day
  • Redis as main DB, sharded per user
    • all players from past two months
    • win to use Redis as main db, not only key-value storage in the sidelines!
  • Anteater - own Java tool atop of Redis
    • transactions
    • distributed config
    • sharding, failover
  • MongoDB is archive db - archive daemon moves data from Redis
    • sharded like redis
    • all-time players
    • backups, recovery

High availability: ELB + (HAProxy, Tomcat, Redis, MongoDB) x 3 AZs

Optimizations

  • Patched Redis for multiple actions, optimistic locking, transactions across shards
  • Protobuf, private / public split. Normally only public data is restored from MongoDB. If player himself logs in again, private data is restored as well.
  • Traffic: batched communication client-server, minimize cross-AZ traffic

Netflix OSS and Spring Cloud

  • Spring Cloud

    • not Mesos competitor, not just for cloud systems
    • valid for traditional apps too
    • framework for distributed app development
    • part of your app, along with e.g. Netflix OSS -> containerized
  • Netflix stack - integrated into Spring Cloud:

    • Eureka Service Registry
    • Hystrix (circuit breaker)
    • Ribbon (client load balancer - no extra component for LB)
    • (Zuul [proxy])
  • Spring cloud has annotations for e.g. registering service in Eureka.

  • Frontend app can do HTTP requests and hostname gets automatically resolved by Eureka (Ribbon)

  • Eureka URL only URL needed in app config

Interesting demo: launched several servers listening on different ports from inside IDEA.

  • Hystrix
    • also annotation-based in Spring Cloud
    • happens before load balancing
    • nice dashboard, integrated or separate

Takeaways:

  • Using Netflix OSS is really easy with Spring Cloud
  • Netflix OSS: battle-tested, mature libraries

Microservices (Uwe Friedrichsen)

  • When do microservices make sense?
    • Need for development speed
    • other reasons: devops, polyglot, scalability ..
  • Consequences of microservices
    • Design, implementation more challenging
    • Lookup, liveliness, partitioning, latency, consistency ..
  • Are we doing microservices just to achieve the encapsulation that programming languages already gives us?
  • Path to microservices:
    • Master modularization first
      • otherwise you'll end up with tightly coupled services
      • Bounded Contexts as jump-off point for modularization
    • Forget about layers
    • Re-think DRY - avoid deployment dependencies
  • Interfaces:
    • Should be able to update services independently - plan for interface evolution
    • Postel's Law: "Be liberal in the tnings you accept, be strict in the tnings you offer"
      • validate what you get from other services, don't blindly accept responses
    • Can wrap a bounded context (cluster of services) with an API gateway
    • Synch vs asynch (request/response vs event-driven):
      • synch not necessarily simpler, b/c timeout management
      • ends up with wildly different designs
  • Datastores:
    • avoid single, big DB
    • avoid distributed transactions - re-think design if you find you need them
    • relax temporal constraints, aim for eventual consistency - make actions idempotent ...
  • Production readiness:
    • Need to solve: configuration, orchestration, discovery, routing, observability, resilience
    • Error handling: throwing an exception doesn't work anymore. Separate error (escalation) and control flows.

Wrap-up: focus on functional design and production readiness!

Disque

  • Redis roots: in-memory, optional persistence, same protocol, same license
  • Developed b/c Redis was used as queue, which is was never designed for
  • Use cases
    • async job execution
    • microservice bus
    • distributed scheduler
  • API - same structure as Redis
    • ADDJOB queue job <timeout> ..
    • GETJOB q1 q2 ...
    • ACKJOB id1 id2 ...
  • At least once (default)
    • Liveness: eventually msg will be delivered
    • Safety: msg not yet delivered at least once will never be evicted
      • but TTL
    • Tries hard to be exactly once
  • At most once
    • No liveness guarantees
    • Safety: dequeued msg will never be queued again
  • TTL: some messages make no sense to deliver after some period of time. Can also be used for high-throughput events with a short TTL (drop events if not handled)
  • Supports delays
  • Replication (synchronous): default cluster factor three
  • ASYNC: asynchronous replication
  • Persistent (optionally): append-only file. When restarted, can load state
  • CAP
  • Main sacrifice: best effort ordering
    • Disque can break ordering by re-delivering msg
  • Bad messages that cause workers to crash
    • NACK: alternative to a dead letter queue
    • GETJOB exposes two counters - worker can decide if a msg is kosher or should be dropped based on earlier events

Talk about internals, how cluster decides who delivers messages.

  • Clients that are smart can check if connected node is one that it has not seen many messages delivered from and move to another node that is "closer" in the topology
  • Demo: consumers adjust automatically to speed of producer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment