threepointone/rethinkdb-caches.md

## rethinkdb-caches.md

      
# better caches with rethinkdb

TL;DR - smelly software engineer discusses using rethinkdb changefeeds for building caches, breaks hearts, shaves the cheerleader, shaves the world.
Let's talk about caches.
Imagine that you build UIs for an ecommerce company, possibly in a fancy office with free coffee and whatnot. You've just been asked to build a way for the marketing / sales folks to change landing pages whenever they're running campaigns. After a number of angry discussions involving the ux team about what they can and cannot change, you settle on a 'document' format for these pages. It could be json describing a tree of widgets like banners and carousels, or html, or yaml, or whatever. Maybe you also invent a dsl that marks out parts of the document as dynamic, based on request parameters or something. I dunno, I'm not your boss. You build a little ui over the weekend (with react? maybe!) that lets these folks log in, drag and drop their banners, maybe upload an image or two, and save to a database.
You then get to work building a server that your website/server/app is going to hit through the day. This thing's lean, and does just a few things -

1. exposes a GET endpoint
2. that matches a :key of some sort
3. and pulls the matching document from a database
4. then transforms this document
5. and spits out the value

So far, so good. However, you still need to add a cache to this layer, because having a bajillion hits on your single cheapo database will burn your datacenter down (Or so I'm told). But hey, 'caching', can't be too hard, no?
&^%%#$^%
As a first approach, you decide to just throw varnish on it and call it a day. However, this doesn't work, because in step 4, you have logic that hides/removes some banners based on some session data specific to the user (an onboarding banner perhaps). So regular http caches are not an option. We need to go deeper.
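To make the problem concrete, here's a rough sketch of the fetch-then-transform flow with the HTTP plumbing left out. Everything in it is made up for illustration: the document shape, the `showIf` flag, and the `fetchDocument`, `transform` and `handle` names are assumptions, not anything from a real codebase.

```javascript
// Hypothetical document shape: a flat list of widgets, some gated on session data.
const documents = {
  homepage: {
    key: 'homepage',
    widgets: [
      { type: 'banner', id: 'onboarding', showIf: 'isNewUser' },
      { type: 'carousel', id: 'summer-sale' },
    ],
  },
};

// Stand-in for the real database call.
async function fetchDocument(key) {
  return documents[key];
}

// The transform step: drop widgets whose `showIf` flag is not set on this session.
function transform(doc, session) {
  return {
    ...doc,
    widgets: doc.widgets.filter(w => !w.showIf || session[w.showIf] === true),
  };
}

// Fetch and transform glued together for one request.
async function handle(key, session) {
  const doc = await fetchDocument(key);
  if (!doc) return null; // unknown key
  return transform(doc, session);
}
```

Two users asking for the same `homepage` key get different responses, which is exactly why a shared cache keyed on the URL falls over. The raw document, on the other hand, is the same for everyone, so that's the thing worth caching.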
You then decide to implement a simple cache. Maybe it looks like this -
import LRU from 'LRU';

const cache = new LRU({
  ttl: 10 * 60 * 1000  // 10 minutes
});

async function actuallyFetch(key){
  // talk to database, etc
  return doc;
}

async function client(key){
  if(cache.has(key)){
    return cache.get(key);
  }
  let doc = await actuallyFetch(key);
  cache.set(key, doc); // store on the cache, not on the function
  return doc;
}

// client('homepage').then(layout => /* etc */)
This is much better, and reduces chatter with the database. However, it's still not perfect! For ecommerce companies that start with 'M' and end with 'yntra', at the scale that requests come in, it's very likely that a whole pile of requests for the same key arrive before the first await actuallyFetch(key) has returned and set the cache. Every one of them misses and goes to the database, resulting in spikes in chatter whenever the cache expires. Bah. These are called "cache stampedes", which is a cool phrase to say at conferences and sound important. You take another crack at it -
import LRU from 'LRU';

const cache = new LRU({
  ttl: 10 * 60 * 1000  // 10 minutes
});

const promises = {};

async function client(key){
  if(cache.has(key)){
    return cache.get(key);
  }
  if(promises[key]){
    return await promises[key];
  }
  promises[key] = actuallyFetch(key); // shared mutable state aaaahh!
  try {
    let doc = await promises[key];
    cache.set(key, doc);
    return doc;
  } finally {
    promises[key] = null; // clear even if the fetch failed, so the next call retries
  }
}

async function actuallyFetch(key){
  // talk to database, etc
}
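If you want to see the difference for yourself, here's a small self-contained sketch that counts database hits for both versions. It's a toy under stated assumptions: a plain `Map` stands in for the LRU (so no ttl), `actuallyFetch` is a fake that takes 20ms, and `naiveClient`, `inFlight` and `countHits` are names invented for this demo.

```javascript
// Fake database call that takes 20ms and counts how often it is hit.
let dbHits = 0;
function actuallyFetch(key) {
  dbHits += 1;
  return new Promise(resolve => setTimeout(() => resolve({ key }), 20));
}

// Naive version: check the cache, otherwise fetch.
const naiveCache = new Map();
async function naiveClient(key) {
  if (naiveCache.has(key)) return naiveCache.get(key);
  const doc = await actuallyFetch(key);
  naiveCache.set(key, doc);
  return doc;
}

// Deduplicated version: concurrent misses share one in-flight promise.
const cache = new Map();
const inFlight = new Map();
async function client(key) {
  if (cache.has(key)) return cache.get(key);
  if (!inFlight.has(key)) {
    const pending = actuallyFetch(key)
      .then(doc => { cache.set(key, doc); return doc; })
      .finally(() => inFlight.delete(key)); // runs on success and on failure
    inFlight.set(key, pending);
  }
  return inFlight.get(key);
}

// Fire `n` simultaneous requests at `fn` and report how many reached the "database".
async function countHits(fn, n) {
  dbHits = 0;
  await Promise.all(Array.from({ length: n }, () => fn('homepage')));
  return dbHits;
}
```

Run one after the other, `await countHits(naiveClient, 100)` gives 100 and `await countHits(client, 100)` gives 1: the first call starts the fetch and the other 99 just wait on the same promise.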
Better! You can run a moderately sized ecommerce company with this and maybe get an acquisition offer from a competitor running a php stack which turns out great and they have tons of funding at least till the tech bubble bursts and everyone goes back to writing real letters to their spouses and...
Anyway. A problem arises.
You can't run time-sensitive campaigns, and you constantly fight this 10 minute window (e.g. if you need to take down a banner immediately because you just ran out of inventory). 10 minutes turns out to be an eternity, and a many-zeroes number in terms of revenue loss, if something goes wrong with a page, or even with a page that this one links to.
Even worse, as you add more servers and more keys, you'll end up having to increase the ttl to decrease chatter to the database. Polling, whatever; nothing really fixes this. Not a good way to scale.
As an example, around a year ago we were doing about 0.5 - 1 requests / sec to our free account on Parse. We now routinely cross the 30 requests / sec limit (and have since moved to a paid account). Increasing ttl leads to angry calls from the salesops teams, while decreasing ttl makes for a lot more chatter. Boo.
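A back-of-the-envelope sketch of that tradeoff, assuming the worst case where every server process keeps its own cache and every key is hot enough to be refetched the moment it expires. The function names and all the numbers are made up for illustration; they are not measurements.

```javascript
// Upper bound on database requests per second caused by cache expiry alone.
function dbRequestsPerSecond({ servers, keys, ttlSeconds }) {
  return (servers * keys) / ttlSeconds;
}

// Smallest ttl (in seconds) that keeps expiry traffic at or under `maxRate` requests/sec.
function minTtlSeconds({ servers, keys, maxRate }) {
  return (servers * keys) / maxRate;
}

dbRequestsPerSecond({ servers: 20, keys: 300, ttlSeconds: 600 }); // 10
dbRequestsPerSecond({ servers: 60, keys: 300, ttlSeconds: 600 }); // 30
minTtlSeconds({ servers: 60, keys: 300, maxRate: 10 });           // 1800, i.e. 30 minutes
```

Tripling the servers triples the chatter, and the only knob left is a ttl that the salesops folks already hate.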
Now let's say you were smart and used rethinkdb as your database.
rethinkdb has a pretty killer feature called 'changefeeds' that lets you hook into a database, and have it stream mutations to you as they happen. By maintaining a cache that 'reduces' on every event from this changefeed, you can then maintain the mythical 'always warm, always fresh' cache that your grandaddy spoke of.
// assume that each doc in table `layouts` has a primary `id` and `key`

let cache = {}; // plain old object

// rebuild the cache every time the db changes, so deleted docs drop out too
db(arr => { cache = Object.fromEntries(arr.map(doc => [doc.key, doc])); });

async function client(key){
  return cache[key]; // ???!!
}

// every time the actual db changes, this spits out the whole table
async function db(callback){
  // first fetch the whole table and send it back
  let table = await getTable('layouts');
  callback(table);
  // then on every event, update the table
  getChanges('layouts', diff => callback(table = consume(table, diff)));
}

// reducer that reconstructs the table for every change 
export function consume(arr, diff){
  return diff.old_val ?
    (diff.new_val ?
      arr.map(el => (el.id === diff.old_val.id) ? diff.new_val : el) : // update
      arr.filter(el => el.id !== diff.old_val.id)) :                   // delete
    [...arr, diff.new_val];                                            // insert
}

async function getTable(name){
  // get the whole table
}

function getChanges(name, callback){
  // listen for changes on table, calling callback with each { old_val, new_val } diff
}
ZOMG that's it?! This... works! It doesn't stress out the servers too much (other than the one persistent connection per server process), and you can run campaigns that trigger immediately. There's a bunch of hand waving over when the connection breaks, error handling, implementing a forceful cache invalidation, etc. but in general, much win!
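If the reducer above looks too terse, here's the same idea unrolled and fed a few fake events, no database required. The events use the `{ old_val, new_val }` shape the code above relies on (null `old_val` for an insert, null `new_val` for a delete); the `index` helper, the sample rows and the `snapshots` array are made up for this sketch.

```javascript
// Same three cases as the reducer above: update, delete, insert.
function consume(arr, diff) {
  if (diff.old_val && diff.new_val) {
    return arr.map(el => (el.id === diff.old_val.id ? diff.new_val : el));
  }
  if (diff.old_val) {
    return arr.filter(el => el.id !== diff.old_val.id);
  }
  return [...arr, diff.new_val];
}

// Rebuild the key -> doc lookup from the current table snapshot.
function index(table) {
  return Object.fromEntries(table.map(doc => [doc.key, doc]));
}

// Hypothetical starting table and a made-up sequence of change events.
let table = [{ id: 1, key: 'homepage', banner: 'summer-sale' }];
const events = [
  // insert
  { old_val: null, new_val: { id: 2, key: 'checkout', banner: 'free-shipping' } },
  // update
  { old_val: { id: 1, key: 'homepage', banner: 'summer-sale' },
    new_val: { id: 1, key: 'homepage', banner: 'sold-out' } },
  // delete
  { old_val: { id: 2, key: 'checkout', banner: 'free-shipping' }, new_val: null },
];

// Apply each event in order and keep the cache as it looked after each one.
const snapshots = [];
for (const diff of events) {
  table = consume(table, diff);
  snapshots.push(index(table));
}
```

After the three events, `snapshots[2]` holds only `homepage`, with the `sold-out` banner. Because the lookup is rebuilt from the table each time rather than patched in place, a deleted row can't linger in the cache.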
Also, Bruce Willis was dead all along.