@ryankennedy
Created January 10, 2012 06:46
Scalable Application Specific Databases with Berkeley DB Java Edition
Title: Scalable Application Specific Databases with Berkeley DB Java Edition
Short Description (400 chars): In 2011 Yammer replaced a creaking 10B row messaging
database with BDB JE. This improved our availability, simplified scaling and lowered
delivery latency. I will discuss evaluating whether BDB JE is right for your situation,
problems you may run into along the way and useful patterns for leveraging BDB
JE as an embedded solution for building reliable application specific databases at scale.
Full Description (a few paragraphs):
Most people think of Berkeley DB as a simple, on-disk, B-tree-based key/value store.
Few realize there's a Java version. Fewer still realize there are reliable replication
and leader election functions built in. These capabilities make it possible to implement
a simple, reliable and scalable embedded application specific database. Oracle
themselves recently released their Oracle NoSQL Database, which is built on top of
Berkeley DB Java Edition (BDB JE).
In 2011, after ruling out other technologies and hardware upgrades, Yammer replaced a
creaking 10 billion row PostgreSQL messaging database with a home grown distributed
database cluster built atop BDB JE. In the process we improved system availability,
simplified scaling and lowered message delivery latency. Our operations team sleeps better
at night knowing we have N+1 redundancy.
We chose an embedded solution after evaluating several client/server stores. In the end,
our high write fanout on message delivery (imagine an SMTP server attempting to deliver 1
email message with 50,000 addresses in the Cc header) made delivery increasingly expensive
to perform over the network. An embedded solution enables this fanout to occur in-process
with writes dispatched to a local filesystem.
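To make the fanout idea concrete, here is a minimal sketch of in-process delivery. It is
an illustration under stated assumptions, not Yammer's implementation: the class
FanoutSketch, its key layout and its method names are all hypothetical, and a
java.util.TreeMap stands in for an embedded ordered key/value store such as BDB JE, so no
BDB JE API calls appear here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: one message fanned out to many inboxes in-process.
// A TreeMap stands in for an embedded, ordered (B-tree-like) key/value store.
class FanoutSketch {
    private final TreeMap<String, String> inboxIndex = new TreeMap<>();

    // Assumed key layout: zero-padded recipient id, then message id, so all
    // of one recipient's entries sort next to each other.
    static String key(long recipientId, long messageId) {
        return String.format("%019d:%019d", recipientId, messageId);
    }

    // Fanout: one local write per recipient, with no network round trip per write.
    int deliver(long messageId, String body, List<Long> recipients) {
        int writes = 0;
        for (long recipientId : recipients) {
            inboxIndex.put(key(recipientId, messageId), body);
            writes++;
        }
        return writes;
    }

    // Range scan over one recipient's key prefix, in message id order.
    List<String> inbox(long recipientId) {
        String lo = key(recipientId, 0L);
        String hi = key(recipientId, Long.MAX_VALUE);
        return new ArrayList<>(inboxIndex.subMap(lo, true, hi, true).values());
    }

    public static void main(String[] args) {
        FanoutSketch store = new FanoutSketch();
        List<Long> recipients = new ArrayList<>();
        for (long id = 1; id <= 50_000; id++) {
            recipients.add(id);
        }
        int writes = store.deliver(1L, "hello", recipients);
        System.out.println(writes + " local writes");
    }
}
```

With a client/server store, each of those writes (or each batch of them) would cross the
network; with an embedded store the loop body is a local call.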
I will discuss evaluating whether BDB JE is right for your situation, problems you may run
into along the way and useful patterns for leveraging BDB JE as an embedded solution for
building reliable application specific databases at scale. While familiarity with B-trees,
replication and leader election will be helpful, none of what I discuss will dive into
territory that requires extensive knowledge of databases or distributed computing.

ghost commented Jan 11, 2012

Very cool talk. I look forward to hearing about it.

@ryankennedy
Author

I look forward to (hopefully) giving it.
