@timblair
Created November 23, 2012 12:56
Notes from All Your Base 2012, 2012-11-23

AYB12

Alvin Richards: MongoDB

  • Trade-off: scale vs. functionality. MongoDB tries to have good functionality and good scalability.
  • Auto-sharding to maintain equilibrium between shards
  • Scalable datastore != scalable application: use of datastore may still be non-scalable (e.g. many queries across all shards)
  • Get low latency by ensuring shard data is always in memory: datastore then becomes a cache with persistence
  • Replica sets: auto-election of new primary node on failure, plus automatic recovery once failed node is back online
  • Async replication between nodes in a replica set (eventual consistency)
  • Auto TTL for messages, and can update on read operations
  • Tunable data consistency before a write is "complete", from "none" (fire and forget: assume it will get there eventually) to "full" (includes remote replication to other geographies)
  • Data model of an RDBMS enforces the relational model, which can limit the ability to scale that system. Data locality ("which server is my record on?") becomes an issue
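
The auto-sharding point above can be sketched as range-based chunk routing: each chunk covers a range of shard-key values and lives on one shard. The chunk boundaries and shard names below are invented for illustration (MongoDB's balancer would normally split and migrate chunks to keep shards in equilibrium):

```python
import bisect

# Hypothetical chunk layout: (upper-bound-exclusive, shard name) pairs,
# sorted by upper bound. Not MongoDB's actual metadata format.
CHUNKS = [
    ("g", "shard0"),       # keys < "g"
    ("n", "shard1"),       # "g" <= keys < "n"
    ("\uffff", "shard2"),  # everything else
]

def route(shard_key):
    """Return the shard whose chunk range contains shard_key."""
    bounds = [upper for upper, _ in CHUNKS]
    return CHUNKS[bisect.bisect_right(bounds, shard_key)][1]
```

This is also why "scalable datastore != scalable application": a query that cannot be routed by shard key has to fan out to every shard.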

Luca Garulli: OrientDB

  • Biggest issue with switching from RDBMS: what about the data model?
  • KV, column-based, document DBs ... and graph DBs
  • Property graph model: vertices and edges can have properties, edges are directional, edges connect vertices, vertices can have one or more incoming + outgoing edges
  • In RDBMS, every time you traverse a relationship, you perform an expensive JOIN. Indexes can speed up reads, but slow down writes
  • Index lookups are generally based on balanced trees. More entries == more lookup steps == slower JOIN
  • "A graph DB is any storage system that provides index-free adjacency"
  • A graph DB treats relationships as physical links assigned to the record when the edge is created; RDBMS computes the same relationship every time you perform a JOIN
  • Lookup time moves from O(log N) to new O(1), and does not increase with DB size
  • NuvolaBase.com: REST-based graph DB service
  • Difficult to create distributed graph DBs. Scaling is basically a case of using client-side hashing.
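
The "index-free adjacency" idea above can be shown in a few lines: the edge is stored as a direct reference on the vertex when it is created, so traversal is a pointer dereference rather than a B-tree lookup. Class and attribute names here are invented for illustration:

```python
# Minimal model of index-free adjacency: relationships are materialised
# once, at write time, instead of being recomputed by a JOIN on every read.
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out = []  # outgoing edges: (label, target vertex)

    def link(self, label, other):
        # store the edge as a physical link on the record
        self.out.append((label, other))

    def neighbours(self):
        # O(1) per edge regardless of database size: no index lookup
        return [v for _, v in self.out]

alice, bob = Vertex("alice"), Vertex("bob")
alice.link("knows", bob)
```

This is the move from O(log N) index lookups to O(1) traversal mentioned above: the cost no longer depends on the number of records in the database.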

Dale Harvey: PouchDB

  • CouchDB for JavaScript environments, mainly for browsers (but also works in Node.js)
  • Multi-master replication, supports disconnected sync
  • "Ground computing" -- like cloud computing, but provides offline behaviour with on-demand sync
  • Designed for building applications that need to work well offline, and that need to sync data
  • Would simplify something like multi-app SimpleNote-type system?
  • Offline is a fact: the more mobile devices, the more people are offline. No reception data limits, slow / unstable connections etc
  • Sync is hard: the Things app took 2 years to develop sync
  • Bad connections + retries, transfer overhead and moving deltas (mobile access might not want total sync), master-master scenarios, conflict resolution
  • [CP]ouchDB has good, simple conflict resolution, but sometimes you need to tell it what to resolve (based on your app usage)
  • Requires CouchDB on the server for sync
  • Safari + Opera support in progress, so not production-ready yet
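
The "good, simple conflict resolution" above rests on CouchDB's deterministic-winner rule: revisions look like "N-hash", the longest edit history wins, and ties are broken by comparing the hash, so every replica picks the same winner without coordination. A rough sketch of that rule (a simplification, not PouchDB's actual implementation):

```python
# Pick the winning revision among conflicting siblings, CouchDB-style.
# "2-def" beats "2-abc" (same history length, higher hash), and both
# beat "1-zzz" (shorter history). Every replica computes the same answer.
def winning_rev(revs):
    def key(rev):
        n, _, h = rev.partition("-")
        return (int(n), h)
    return max(revs, key=key)
```

The losing revisions are kept, which is why you can still "tell it what to resolve" based on your app's usage.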

Matt Heitzenroder: Eventual Consistency

  • Brewer's Conjecture (2000): CAP -- you can only have two
  • "Life is full of tradeoffs" as is engineering
  • Amazon's Dynamo paper: tradeoff between C & A -- they chose A
  • Financial systems already dealing with eventual consistency: trading banks closing and reconciling, network partitions between cash point and centralised bank etc
  • Riak uses vnodes in a ring topology (ketama-style)
  • Writes go to hashed node + the next two (i.e. three copies on separate nodes)
  • Read Repair: handle out-of-date copies of data on vnodes automatically on read, and update out-of-date nodes to logical descendants (e.g. v1 -> v2)
  • Read Repair etc. means internally three objects are requested and checked for consistency. This can be tuned via quorum, single-read for speed etc.
  • There can be divergent object versions, a.k.a. siblings: after a network partition, two operations can have altered object state at the same time. Riak returns both versions
  • Per-application, can define a "conflict resolver": as part of the Riak client to define how to handle sibling resolution
  • Common use-cases are: pick one based on some property, or perform a set union of the data
  • Probabilistically Bounded Staleness
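
The ring topology and replica placement above can be sketched as a toy consistent-hash ring: each node owns a position on the ring, a key hashes to a point, and the write goes to the owning node plus the next two, giving three replicas. Real Riak uses many vnodes per physical node; one position per node keeps this short, and all names are invented:

```python
import hashlib
from bisect import bisect

class Ring:
    """Toy ketama-style ring: a key's preference list is the node at the
    hashed position plus the next n_replicas - 1 positions clockwise."""
    def __init__(self, nodes, n_replicas=3):
        self.n_replicas = n_replicas
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def preference_list(self, key):
        idx = bisect(self.ring, (self._hash(key),))
        return [self.ring[(idx + i) % len(self.ring)][1]
                for i in range(self.n_replicas)]

ring = Ring(["node-a", "node-b", "node-c", "node-d"])
replicas = ring.preference_list("user:42")
```

The quorum tuning mentioned above then falls out naturally: a read can wait for one, a quorum, or all of these replicas to respond.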

Monty Widenius: MySQL-MariaDB

  • MySQL named after Monty's daughter, My (MaxDB released later, named after his son, Max)
  • Original MySQL devs started focussing on MariaDB in 2009 with the impending purchase of Sun by Oracle
  • Chose to use dual-license to be able to work full-time on MySQL: took 2 months to become profitable
  • Don't go to investors when you need their money. Wait for them to come to you when you don't need their money, and you won't have to give up so much of your company
  • Monty Program Ab: new company (using Hacker Business Model) to focus on MariaDB, with most of the original MySQL developers
  • Aim to keep MySQL dev talent together, always have an open-source version of MySQL. More important after Oracle purchase of Sun
  • MariaDB is a drop-in replacement for MySQL. "No reason to use MySQL anymore: MariaDB is better in all cases"
  • Big JOIN and subquery performance is an order of magnitude (or more) faster than MySQL
  • "SQL doesn't solve all common problems" e.g. arbitrary attributes (shop item sizes, colours etc). Dynamic columns introduced in MariaDB 5.3. As a POC, created a storage engine for Cassandra with MariaDB 10
  • Any close-sourced features that Oracle has added to MySQL have been added to MariaDB as open-source features
  • 5.5 introduces a new thread pool (instead of thread-per-connection)
  • Full merge of MySQL 5.6 into MariaDB 5.6 is a year-long project due to broken features and new bugs, over-complicated code, lack of understanding of existing code etc
  • Did such a good job of getting the MySQL name out there, changing everyone over to MariaDB is going to be a tough job!
  • Though creating a dev community is easier as Oracle is not working with the community
  • Aim of MariaDB: make MySQL obsolete
  • Free MariaDB + MySQL knowledgebase available at askmonty.org

Brandon Keepers: Git: the NoSQL DB

  • Let's start with "Git's amazing ... what else can we do with it?"
  • "NoSQL is marketing bollocks" -- people mean non-relational and schemaless, and anything else gets lumped in to NoSQL
  • git calls itself "the stupid content tracker" (see the man page)
  • git has three "object types": blobs, trees and commits, plus symbolic "references" on top, all managed by the git command line tool
  • There are libraries to work with this (Grit, libgit2), plus ORMs built on top, such as ToyStore
  • NoSQL allows us to question RDBMS design, including big design up-front: schemaless allows us to be much more agile with our data model
  • git can handle transactions in both short-lived (one commit with multiple changes) and long-lived (branches) forms
  • Replication handled by the fact that all git repos are full clones
  • git doesn't have any of the features that make a great DB: querying, concurrency (it's filesystem based), merge conflict resolution, scale
  • Scale: filesystem based, and problems with git at scale. Someone tested with a very large repo: 4m commits, 1.3m files, 15GB repo ... git-add took 7 seconds etc...
  • Think about how you can abuse your tools to get more out of them
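
The blob part of git's object model is simple enough to reproduce exactly: a blob is stored as the header "blob <size>\0" followed by the raw bytes, and its name is the SHA-1 of that whole buffer:

```python
import hashlib

def git_blob_sha(content: bytes) -> str:
    # git prefixes the content with "blob <byte length>\0" and names
    # the object by the SHA-1 of header + content
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()
```

Running `git hash-object` on the same content produces the same digest, which is what lets every full clone act as a verifiable replica.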

Peter Cooper: Redis, Steady, Go!

  • Peter's a Rubyist, and wants his languages and tools to be "beautiful," which he considers Redis to be
  • Redis: remote [data structure] server -- no tables, no SQL, no enforced relationships, lots of working with primitives. The Redis manifesto calls it a DSL for abstract data types
  • Like memcached but with more commands, more persistence, more data types
  • Three big use cases: database, messaging (pub/sub, queueing), or as a cache. Also: fast live stats logging (why Redis was created in the first place), rate limiting (using automatic key expiry), scoreboarding (using sorted sets), IPC, session storage
  • YouP*rn.com uses Redis as their primary datastore (~100 Alexa ranking)
  • Redis is single-threaded and event-driven (apart from background saving etc). Single-threading means individual operations are atomic
  • Python library redis_wrap means you can use normal Python data types, backed by Redis
  • Recent additions: scripting with Lua, plus PostgreSQL data wrapper
  • 5 data types: strings, lists, sets, sorted sets, hashes
  • Abstract data type example: queueing using a list, with LPOP and RPUSH. Priority queues implemented by using a BLPOP with multiple list names
  • Set operations are available such as intersection, union, difference. Also provides the ability to store intermediary results in new keys.
  • Hashes don't allow storage of other data types: strings only
  • Supports transactions using MULTI ... EXEC to run all queued commands in one go
  • Master/slave replication is simple with the SLAVEOF command
  • Other updates and versions include Redis Sentinel (in development to provide automated failover), Redis Cluster (in development for fault tolerance of a subset of Redis commands) and a Windows version
  • Have a play with a "live demo" within the redis.io documentation
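
The list-based queueing pattern above can be illustrated without a server: the sketch below is an in-memory stand-in for the Redis semantics only (RPUSH appends at the tail, LPOP takes from the head, and BLPOP with several keys pops from the first non-empty list in the order the keys are given, which is what makes the priority trick work). Key names are invented:

```python
from collections import deque

lists = {"jobs:high": deque(), "jobs:low": deque()}

def rpush(key, value):
    # producers push onto the tail of the list
    lists[key].append(value)

def blpop(*keys):
    # consumers pop from the head of the first non-empty list;
    # listing keys in priority order yields a priority queue
    for key in keys:
        if lists[key]:
            return key, lists[key].popleft()
    return None  # a real BLPOP would block here until an item arrives

rpush("jobs:low", "send-digest")
rpush("jobs:high", "charge-card")
```

With a real Redis client the calls look the same, but the blocking form means idle workers consume no CPU while waiting.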

Lisa Phillips: MySQL @Twitter

  • MySQL plus friends has enabled Twitter to still use MySQL (5.0 and 5.5) as its primary datastore, with an average of 400 million tweets per day: 4,629/s on average, with a peak at 25,088/s (during a Japanese anime film!)
  • 8 full-time DBAs (recently up from 6) managing thousands of MySQL instances, supporting 100s of developers. All DBAs have at-scale experience, and most developers are familiar with MySQL
  • The Twitter DBAs manage from the bare-metal up, including operating system, software, monitoring etc
  • Engineering in Twitter is about pragmatism: use commodity hardware and software, queues and async processing, eventual consistency, some delay tolerance (measured in seconds)
  • "Build new awesome tools (and open source them) if you need to"
  • They use "deciders": feature flags to enable roll-out to small volumes of people to gauge impact on the DB servers (plus other parts of the infrastructure)
  • Twitter don't roll back, either code or DB changes: they roll out slowly and iterate on any fixes
  • Replication (usually) works. Have seen replication break in so many different ways, so many times, that they can now quickly fix any problems
  • Bad points of MySQL: at-scale ID generation, graphs, replication inefficiencies and lag
  • "If you're using replication, make sure you can tolerate lag in your code. If you can't tolerate lag, don't use MySQL"
  • MySQL great for HA, "smaller" datasets (<1.5TB)
  • Challenges: MySQL version diversity, single DBA, upgrades without HA solution, no load-balancing for reads
  • In 2012, they used a sharded master-slave setup using temporal sharding. New shards were hot, old shards not. New DB clusters being built every week, and DBA time became limiting factor
  • Snowflake used for unique ID generation
  • Gizzard created for sharding as a replacement for the temporal sharding (stores and replicates tweets, interest, social graph) and replaces native MySQL replication (disabling native replication improves performance)
  • Gizzard handles 6m SELECTs per second at peak, and creates more than 3b records per day
  • Other apps built on top of Gizzard: Flock, TBird, TFlock -- all of these are backed by MySQL
  • Still using traditional master-slave clusters (3-100 machines in a cluster) for non-tweet data such as user metadata, old Rails models
  • One Twitter employee is an ex-MySQL developer who now just works on MySQL for Twitter
  • Working on better logging and auditing support, real-time monitoring, performance and response metrics, row-based replication pre-fetching
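
Snowflake, mentioned above for at-scale ID generation, packs a timestamp, a worker id and a sequence number into one 64-bit integer so independent machines can mint unique, roughly time-ordered IDs without coordinating. The field widths below match the published Snowflake layout (41 bits of milliseconds, 10 bits of worker, 12 bits of sequence); everything else is a simplified sketch, not Twitter's code:

```python
import threading
import time

class Snowflake:
    EPOCH = 1288834974657  # an arbitrary fixed point in time, in ms

    def __init__(self, worker_id):
        assert 0 <= worker_id < 1024  # must fit in 10 bits
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:  # 4096 ids this millisecond: spin to next
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            # 41 bits timestamp | 10 bits worker | 12 bits sequence
            return ((now - self.EPOCH) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, IDs sort by creation time, which plays well with the temporal sharding described above.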

Brian LeRoux: Mobile Web Persistence

  • Lawnchair: like CouchDB but smaller and outside
  • "Xcode is Eclipse that looks like iTunes, and it's just as slow"
  • Cookies: need to be online, 4Kb storage, but there's a handy hack for serving up responsive images
  • Can I use Web SQL Database? (oh, and JS is dumb)
  • SQL in the browser: SQLite (probably). Started off as Google Gears, but now improved. http://caniuse.com/sql-storage SQLite is an implementation, not a standard, and Mozilla had issues with it. Isn't everywhere and doesn't necessarily work.
  • LocalStorage: quite nice, can store up to 5Mb, but has a synchronous (blocking) API, plus misses complex types, and you can't query it. Supported almost everywhere (even Opera Mini)
  • WebSimpleDB: solve ALL the problems! Renamed to IndexedDB. Has querying etc, but is heavy on the code required because it's a versioned DB. Not supported in lots of places (yet), but could be polyfilled.
  • Lawnchair wraps up all the above in one sane API
  • Hack: store unlimited data on all browsers, accessible from any domain, using window.name
  • WebSockets mean we could have a web page open a database connection
  • WebRTC PeerChannel and DataConnection APIs are also around
  • Strong indication of first-class File APIs coming to browsers. Currently split into two specs: File API and Directories and System API. filer.js tries to make this saner
  • Mozilla working on Archive API in Firefox OS

Craig Kerstiens: Postgres Demystified

  • postgresapp.com -- simplified running of Postgres in OS X
  • "It's the emacs of databases": more of an OS for your data
  • psql is powerful command-line client
  • 30+ datatypes including IPs, MAC addresses, geospatial, arrays
  • Native arrays give the power of custom fields without a join
  • Loads of extensions such as hstore: a KVP store inside a column, with queryability
  • Simple native JSON type recently added, but PLV8 allows embedding of a JS engine (can open up JS-injection attacks!)
  • Range types include from+to within a single column, and can have checks on those (e.g. ensuring no two entries overlap in time)
  • Light geospatial stuff is built-in; PostGIS provides full geospatial capabilities
  • Sequential scans are bad (most of the time). Indexes are good (most of the time)
  • Postgres has multiple types of indexes. B-Tree is the default, and you usually want it; GIN used with multiple values in one column (arrays, hstore); GiST for full-text search and GIS
  • Aim for all queries being <= 10ms
  • Can create indexes concurrently without locking the whole table
  • Create indexes on certain conditions (e.g. active things only)
  • PG internal metrics can provide things like cache and index hit ratios
  • Window functions permit partitioning (sub-grouping) data while querying
  • Fuzzy string matching using soundex()
  • Move data around using \copy or dblink, not SELECT + INSERT
  • Foreign storage adapters such as Redis: in this case can JOIN across PG and Redis
  • Common table expressions allow naming of common queries, which can then be reused in subsequent queries
  • Extras: Listen/notify (pub/sub within the DB), per-transaction synchronous replication, SELECT for UPDATE
  • Replication introduced in 9.0, multi-master expected for 9.4
  • References: Postgres Guide and a presentation
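
The soundex() mentioned above (exposed in Postgres via the fuzzystrmatch extension) is the classic algorithm: keep the first letter, map the rest to digit classes, collapse runs of the same code (with "h"/"w" transparent and vowels breaking a run), and pad to four characters. A pure-Python version of that same algorithm, as a sketch:

```python
# digit classes of the classic soundex algorithm
_CODES = {c: d for d, letters in
          [("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
           ("4", "l"), ("5", "mn"), ("6", "r")]
          for c in letters}

def soundex(name):
    name = name.lower()
    out, last = [], _CODES.get(name[0])
    for ch in name[1:]:
        if ch in "hw":
            continue           # transparent: previous code stays "adjacent"
        code = _CODES.get(ch)
        if code is None:
            last = None        # vowel: breaks the run of repeated codes
            continue
        if code != last:
            out.append(code)
        last = code
    return (name[0].upper() + "".join(out) + "000")[:4]
```

So "Robert" and "Rupert" both map to R163, which is what makes it useful for fuzzy matching of names.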

Tim Moreton: Apache Cassandra and BASE

  • Facebook took bits of BigTable data model and Dynamo distribution to create Cassandra, used to power their inbox search. Open sourced it in 2008, top-level Apache project as of 2010. Now pretty prevalent
  • Multi-master (no SPOF), tunable consistency (multi-DC aware), optimised for writes (do more up front to gain on select time), atomic counters
  • Data model is a set of nested, sorted dictionaries. Columns are effectively just labels, and can be very wide
  • Reads are fast within a single row (across columns) but much slower between rows, because rows are spread around the cluster
  • Uses timestamp-based reconciliation for conflict resolution across the cluster
  • Tunable consistency for both writes and reads: one, quorum, all
  • Use case: session store. Read dominated, updates to existing items, probably fits in RAM, distribute for availability, challenge: atomicity
  • Use case: real-time analytics. Write dominated, updates rare, read "results" mostly, distribute for availability + performance + capacity, challenge: complex querying
  • Twitter's promoted tweets dashboard just used Cassandra counters, denormalising into buckets on writes, so the grouping etc is already done for reading (no need for separate counting, grouping etc)
  • Relies on up-front knowledge of the use of the data to be able to optimise for reading
  • Acunu Analytics: materialised views of data to provide better queryability on top of Cassandra data
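
The write-time denormalisation described above can be sketched in a few lines: every event increments a pre-aggregated counter for each time bucket the dashboard will want, so a read is a single lookup with no grouping. The key shapes and bucket sizes here are invented for illustration, not Cassandra's API:

```python
from collections import defaultdict

# stand-in for Cassandra's atomic counters: one counter per
# (campaign, bucket-granularity, bucket-start) cell
counters = defaultdict(int)

def record_impression(campaign, ts):
    # do more work up front: bump every bucket this event belongs to
    counters[(campaign, "hour", ts - ts % 3600)] += 1
    counters[(campaign, "day", ts - ts % 86400)] += 1

record_impression("promo-1", 18100)
record_impression("promo-1", 18200)
```

This is the trade the section describes: it relies on knowing the read patterns up front, in exchange for reads that never aggregate.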