@mrflip
Last active December 12, 2015 08:09
MongoDB talk overview

Outline

  • Cube is awesome

  • Scaling is hard

    • 100 writes/s: easy; 6,000/s: ouch

      • tuning and replication!?
      • no, use it right
    • Most of these problems are not "improve by 20%"

    • these are "no fucking way" vs "eh, works ok"

  • See through the magic

    • measure all the things
      • htop
      • mongostat
    • grok all the things
      • pyramidal aggregations:
        • some are pyramidal (sum, min, max)
        • some aren't (distinct, median)
      • metric / event lifecycle
        • events
          • events come in
          • add corresponding metrics to invalidation queue
          • grow indefinitely
        • metrics
          • query comes in from ui
          • parse into corresponding metrics
          • recurse through pyramidal aggregation
            • calculate missing metrics along the way
            • get events from db, bucket by time, apply aggregation
            • save metrics back to capped collection
          • perform arithmetic to combine metrics
          • return requested metrics
    • test most of the things
  • Now what?

    • Uh oh, memory and cpu use are through the roof

      • Way duplicate calculation of metrics
        • you and I both load a dashboard
        • it independently calculates metrics for us both
      • solution
        • cube patch, yay, easy
        • global queue makes everything coordinated
      • problem
        • non-pyramidal metrics are really inefficient
        • they can't be recalculated using existing metrics
      • answer: drop them for now (we found a solution later)
    • Uh oh, memory and cpu use are still through the roof

      • problem
        • mongo seems to be working, but resident memory is high and performance isn't good enough
        • mem-map files
        • 1 TB of data served by 4 GB of RAM == problem
      • answer
        • swap archival of metrics and events
        • metrics are smaller and more sparse than events
        • our use case can survive on 2 hours of events
        • streaming data doesn't have to be comprehensive for old records
        • implementation - cap events, uncap metrics
          • add warmer
        • maintain calculated metrics
        • continually "watch" existing graphs from dashboards - only need to warm the 'tensecs'
          • add horizons
        • refuse to recalculate old metrics
        • drop old events
          • they should be new anyway
          • store all minimal (10s) metric intervals, so larger intervals (1d) can be calculated
          • works best for seldom updated queries
    • Inserts stop every 5 seconds, huh?

      • problem
        • working working working ANGRY ANGRY ANGRY working working
        • insert insert insert update update update insert insert
      • answer
        • mongo global locking means metric updates can only occur when event inserts aren't
        • oh, so metrics are no longer capped, we'll just remove metrics instead of invalidating (updating)
        • doesn't help
          • at all
          • hmm, locks cause problems
        • database locks just came out
        • split metrics and events into separate databases
        • metrics can update and events can insert without conflicting locks
    • native mongo client

    • So much happier now, but what about non-pyramidal aggregates?

      • problem
        • to use metrics to calculate larger intervals, they need to be pyramidal
      • answer
        • store hash maps (key: observed value, value: its count) in metrics, so they can be vivified and aggregated
        • smaller footprint than full events
        • works well for dense, finite data values
        • not so good for sparse values
        • be careful!
    • more collectors

      • node, while concurrent, is not multithreaded
      • mongo also single-core on writes
      • we just launched four copies of the daemon
      • gave each DDS node its own port to write to
      • the optimal way would have been
        • make an invalidation API (don't invalidate four times)
        • fork child processes instead of just launch four (but then can't use runit)
    • this is where you finally start traditional "tuning"

      • scalability, not performance
      • ordinarily, we'd just scale out. You're in a world where you can double your performance any time you want
  • What didn't work?
    • Mongo-side Aggregation didn’t help! (check for email)
      • it wasn't significantly better
      • would have required a rewrite of
    • different queuing strategies
    • update w/ "ignore" vs. delete even in uncapped
  • More on system metrics

  • In case you're wondering

    • faster cores are better than more cores + more total CPU
    • EBS is fine (TODO: verify) -- provisioned IOPS is probably smart.
    • m2.2xlarge: < $700/mo, 34.2 GB ram, 13 bogo-hertz
    • (other tunables)
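The pyramidal-aggregation points in the outline can be made concrete with a small sketch (illustrative code, not Cube's actual implementation): a 1-day sum really is just the sum of its 10-second sums, while distinct can't be combined that way unless you store the value→count maps described under the non-pyramidal aggregates bullet.

```javascript
// Pyramidal: a 1-day sum is the sum of its 10-second sums.
function sumMetric(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Non-pyramidal: distinct counts of sub-intervals can't be combined --
// {a,b} and {b,c} each have 2 distinct values, but their union has 3, not 4.

// The workaround: store a {value: count} map per 10s bucket instead of the
// raw events. Maps merge cleanly, so larger intervals stay computable.
function mergeCounts(buckets) {
  const merged = {};
  for (const bucket of buckets) {
    for (const [value, count] of Object.entries(bucket)) {
      merged[value] = (merged[value] || 0) + count;
    }
  }
  return merged;
}

// distinct over a merged map is just the number of keys.
function distinct(counts) {
  return Object.keys(counts).length;
}

const tenSecBuckets = [{ a: 2, b: 1 }, { b: 3, c: 1 }];
const day = mergeCounts(tenSecBuckets); // { a: 2, b: 4, c: 1 }
console.log(distinct(day));             // 3, not 4
```

As the outline warns, this works well for dense, finite value domains and poorly for sparse ones: the map's footprint approaches the raw events' as values stop repeating.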

Theme:

Halp! Database on fire!

  • optimize database tuning!
  • replication!
  • more replication!

No:

  • use it right

Business Problem

There is such a thing as a "BI Analytics Tool". This software is not that. It just shows you the data, in flight, as it goes through.

This was an active product question -- we didn't know the answer -- people would look at it and say, "gee, that's ok; it doesn't do X and it won't ever do X, but the thing it does is something we need ...".

Now there was this question: as soon as those monthly payments started (on the enterprise sales model with its huge up-front investment), would they say, "... well, except can you make it also do X?" ... and the next thing you know we are a company building a bad product badly.

Why dashboard

Etsy found out:

  • if there is an unexpected outage people will bitch about it on the forums afterward.

  • if there is an unexpected outage people will bitch about it on the forums BEFOREHAND.

    • so um people who knit can also travel through time?
  • if there is an unexpected outage people will DETECT IT BEFORE YOUR MONITORS WILL.

    • think about a server with a 60-second socket timeout.
    • ... or one that is performing at 1.999 sigma
    • or is flapping, leading to one out of five requests failing, while your monitor only starts complaining if X records go bad in Y time

This is a really really important insight. And it has an obvious business implication:

if forum traffic spikes, page the right people
that way they can get to a computer
by which time the failure will have actually manifested

and it was face slappingly obvious as soon as you drew the one graph next to the other.
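The "page the right people when forum traffic spikes" rule could be wired up as simply as a trailing-mean threshold; this is an illustrative sketch with made-up thresholds, not anything Etsy or we actually ran.

```javascript
// Alert when the current forum-post rate exceeds the trailing mean by
// k standard deviations. Window size and k are arbitrary assumptions.
function spiked(history, current, k = 3) {
  const mean = history.reduce((a, b) => a + b, 0) / history.length;
  const variance =
    history.reduce((a, b) => a + (b - mean) ** 2, 0) / history.length;
  return current > mean + k * Math.sqrt(variance);
}

console.log(spiked([10, 12, 9, 11, 10], 40)); // true: page someone
console.log(spiked([10, 12, 9, 11, 10], 13)); // false: normal chatter
```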

How Cube works

TODO: HH

Events and Metrics

  • "Events"
  • "Metrics"

Pyramidal Aggregation

Path to Most Harmonious Enlightenment

measure all the things

TODO: add http://memegenerator.net image

Much more on these later, but so that we can show you them:

  • htop
  • mongostat

... and grok all the things

  • assemble a model of what the world looks like

    • mem-mapped datastore(*)
    • capped collections

Those have implications for what we see as debugging

  • pyramidal aggregations
  • event/metric lifecycles

Took the example and made a "cromulator", something that could replay a cromulent event stream (i.e. sine wave, random noise, ramp)
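A toy cromulator in that spirit: emit a predictable event stream (sine wave, ramp, or noise) so you can see exactly which metrics get recalculated when you write into a narrow time range. The `{type, time, data}` event shape mimics Cube's events, but the exact fields here are an assumption, not Cube's documented API.

```javascript
// Generate `count` synthetic events, one every `stepMs` milliseconds.
function cromulate(kind, start, count, stepMs = 10000) {
  const events = [];
  for (let i = 0; i < count; i++) {
    let value;
    if (kind === "sine") value = Math.sin((2 * Math.PI * i) / count);
    else if (kind === "ramp") value = i / count;
    else value = Math.random(); // noise
    events.push({
      type: "cromulation",
      time: new Date(start.getTime() + i * stepMs).toISOString(),
      data: { value },
    });
  }
  return events;
}

const stream = cromulate("ramp", new Date("2013-02-01T00:00:00Z"), 6);
console.log(stream.length, stream[5].data.value); // 6 events, last value 5/6
```

Replaying a stream like this against the collector, then watching which metric documents get invalidated, is what made the lifecycle above observable.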

"Ok so if I write events on this narrow time range then I will see only these metrics get re-calculated."

Understand how queries are stored as metrics:

  • How are metrics identified? Queries break down into component metrics based on corresponding events.
  • How are metrics calculated? Metrics are combined with constants using basic arithmetic on the fly.
  • "So what happens if I multiply by a constant in the dashboard?" Show the example of a dashboard that doesn't re-calculate: it's done on the fly, and simple arithmetic is not stored. (HH: "Computers are really good at that")
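The "combined with constants using basic arithmetic on the fly" point can be sketched with a tiny expression tree (illustrative, not Cube's actual parser): only the component metric is stored; the constant is applied at query time, which is why editing it in the dashboard never triggers recalculation.

```javascript
// Evaluate a query expression against already-stored component metrics.
function evaluate(expr, storedMetrics) {
  switch (expr.op) {
    case "metric":   return storedMetrics[expr.name]; // looked up, never recomputed
    case "constant": return expr.value;               // lives only in the query
    case "mul":
      return evaluate(expr.left, storedMetrics) * evaluate(expr.right, storedMetrics);
    case "div":
      return evaluate(expr.left, storedMetrics) / evaluate(expr.right, storedMetrics);
  }
}

// e.g. sum(request.bytes) / 1024 -- only "sum(request.bytes)" lives in mongo
const stored = { "sum(request.bytes)": 10240 };
const query = {
  op: "div",
  left: { op: "metric", name: "sum(request.bytes)" },
  right: { op: "constant", value: 1024 },
};
console.log(evaluate(query, stored)); // 10
```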

"what happens if I go in manually and remove all the records?"

TODO: there was one more, maybe we care.

On being a mem-mapped datastore: I like this theme of "trust the OS". See http://kafka.apache.org/design.html, and:

"This style of pagecache-centric design is described in an article on the design of Varnish here (along with a healthy helping of arrogance)." http://varnish.projects.linpro.no/wiki/ArchitectNotes


database was always unhappy

  • Inserts were fine
    • inserts always go to next slot
  • Reads, not so much.
    • reads had to hit larger range ("last week")
    • we were storing everything

Gee, if you try to serve X TB of data from a machine with X GB ram, you're going to have trouble.

Answer: Capped collections

Now cube already used a capped collection for metrics

  • metrics are simply a persistent cache; they can always be re-created (efficiently) from events
  • events were unbounded
  • request a graph, tweak a graph, etc.; if a metric isn't there (because it aged out or was never made), it will be computed.

Don't solve hard problems -- turn them into an easy problem, solve that instead

So we capped the events table

We gave up the ability to ask new questions of old data. But!! the business need explicitly stated that we don't need that.

  • cap events table
  • uncap metrics
  • But if someone isn't looking at the graph, it won't be requested
    • did we make something that re-calculates?
    • no, we just built a robot to look at every graph (hooks in the same way the HTTP layer does).
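A sketch of that robot: periodically replay the same metric queries a dashboard would, so the uncapped metrics stay warm and never need recomputing from aged-out events. The URL shape and the `/1.0/metric` endpoint are assumptions about Cube's evaluator API, and the host name is made up.

```javascript
// Build the evaluator URLs a dashboard would hit for the given expressions,
// covering the last `horizonSec` seconds at `stepSec` resolution.
function warmerUrls(host, expressions, stepSec = 10, horizonSec = 7200) {
  const stop = new Date();
  const start = new Date(stop.getTime() - horizonSec * 1000);
  return expressions.map((expression) =>
    `http://${host}/1.0/metric?expression=${encodeURIComponent(expression)}` +
    `&step=${stepSec * 1000}&start=${start.toISOString()}&stop=${stop.toISOString()}`);
}

// A real warmer would fetch each URL on a timer, e.g.:
//   setInterval(() => warmerUrls("evaluator:1081", exprs)
//     .forEach((u) => http.get(u)), 10000);
console.log(warmerUrls("evaluator:1081", ["sum(request)"])[0]);
```

Because it hooks in the same way the HTTP layer does, warming only the minimal (10s) intervals suffices: the larger intervals pyramid up from them.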

TODO: diagram here.

worky worky worky ANGRY ANGRY worky worky worky

Here is the mongostat trace. Since you might not stare at these all day: what it implies is the following graph:

worky worky worky ANGRY ANGRY worky worky worky

  • invalidation was causing burst load on the datastore
    • worky worky worky ANGRY ANGRY worky worky worky

Locking during update

Echo Chamber

Fix in cube master

Things to know about DB

Why are capped collections nice?

  • if it's capped,

    • files live contiguously on disk (allocated at same time!)
    • so if I look up element A, I'm likely to look up element B
  • writes just go in order along the collection; not a hair out of place

    • Capped collections work in a way similar to circular buffers: once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection
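For reference, the cap-events/uncap-metrics split looks roughly like this in the mongo shell (collection names and sizes are illustrative, not Cube's actual schema; `use` is a shell helper, not JavaScript):

```javascript
// Events: fixed-size circular buffer; old events fall off the end.
use cube_events
db.createCollection("request_events", { capped: true, size: 4 * 1024 * 1024 * 1024 })

// Metrics: plain (uncapped) collection, kept forever, in a separate
// database so metric updates don't contend with event inserts for a lock.
use cube_metrics
db.createCollection("request_metrics")
```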
