@dblandin
Created June 30, 2016 15:26
MongoDB World 2016 Notes

MongoDB 🌎 2016 Notes

Tuesday, June 28th

Morning Keynote Session

Eliot Horowitz, CTO, MongoDB

Exciting features in MongoDB 3.4

  • Split-storage replica sets
  • Recursive graph lookup
  • Faceted search
  • Compass data exploration tool
  • $lookup (like joins)
  • Read-only views

Ross Mauri, General Manager, z Systems and LinuxONE at IBM

LinuxONE + MongoDB / UK schools: ibm.com/linuxone/mongodb

Containerizing MongoDB with Kubernetes

Dan Worth, Director of Engineering, fuboTV

Brian McNamara, Founder, CloudyOps, LLC

Tag sets. Stateful services.

Mongo data should persist

Label selectors. Persistent volumes in pod definitions / Replication Controllers.

MongoDB service endpoint

FuboTV blog post

Scaling MongoDB w/ Docker and cgroups

Marco Bonezzi, Technical Services Engineer at MongoDB

https://twitter.com/marcobonezzi

Deployment -> Orchestration: using predefined cluster patterns, replicating environments.

Resource Control -> Resource Management: setting limits on key resources.

MongoDB & Docker -> Automate for Scaling: create once, deploy everywhere. Deploy patterns, not processes.

Docker usage survey > Why are companies interested in docker?

Docker Machine / Swarm / Compose

Blog post: Evaluating container platforms at scale

Using affinity filters in Docker Swarm to keep replica set members from landing on the same instance.

Resource control with cgroups

WiredTiger memory is split in two: WT cache + MongoDB memory (connections, aggregations, map reduce, etc.)

The mongod process does not see cgroup limits, so the WiredTiger cache size should be set explicitly.
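
A minimal sketch of setting the cache explicitly, assuming the container's memory limit is known when the container is launched. The 50% fraction and the 1 GB floor are illustrative assumptions, not values from the talk. `--wiredTigerCacheSizeGB` is the mongod option for the cache size. The value is kept to a whole number of gigabytes here.

```python
def wired_tiger_cache_gb(container_limit_bytes, fraction=0.5, floor_gb=1):
    """Pick a WiredTiger cache size (whole GB) from the container memory limit.

    fraction and floor_gb are illustrative assumptions, not official defaults.
    The rest of the limit is left for connections, aggregations, and so on.
    """
    limit_gb = container_limit_bytes / 1024 ** 3
    return max(floor_gb, int(limit_gb * fraction))


def mongod_args(container_limit_bytes):
    """Build a mongod command line that sets the cache size explicitly."""
    cache_gb = wired_tiger_cache_gb(container_limit_bytes)
    return [
        "mongod",
        "--storageEngine", "wiredTiger",
        "--wiredTigerCacheSizeGB", str(cache_gb),
    ]


# A container limited to 4 GB gets a 2 GB cache under these assumptions.
print(" ".join(mongod_args(4 * 1024 ** 3)))
```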

docker top rsa1 / docker stats rs1

Combine docker metrics with mongo metrics

Creating a Swarm cluster on AWS to deploy MongoDB

  • Configure docker-machine with ec2 driver (AWS)
  • Deploy discovery service for Swarm Master
  • Deploy AWS instances for swarm master and swarm worker nodes
  • Define compose file for deployment
  • Define swarm filters and constraints and cgroup limits
  • Connect to the swarm master
  • Deploy the environment with a single command using the compose file
  • Configure our MongoDB sharded cluster using Cloud Manager API
  • Demo!

Smart Strategies for Resilient Applications

A Jesse Jiryu Davis, Staff Engineer at MongoDB

https://twitter.com/jessejiryudavis

Links: screencast, resources

MongoS

Smart retry strategies

SDAM: Service Discovery and Monitoring

Bad retry strategies as they apply to:

  • Network blips
  • Primary failover
  • Network down
  • Command error

Make your operations idempotent.

Non-idempotent update ($inc)

Turn it into two idempotent steps:

  1. Add unique token ($addToSet).
  2. Remove token and increment counter ($pull, $inc).
Eventual consistency after a network outage.

Appropriate for high value, infrequent updates when trade off for additional load/latency is warranted.

Otherwise, just use $inc, and accept possible count misses.

Black Pipe Testing / MockupDB

bit.ly/resilient-applications

Ops Manager API, Puppet & OpenStack - Fully Automated Orchestration from Scratch!

Naama Bamberger, Software Developer at Cisco

MongoDB

  • Easier to perform updates
  • Forwards and backwards compatible

Automated deployment of MongoDB

Ops Manager?

OpenStack, HEAT, Puppet, Python for tooling

OpenStack is an open-source cloud virtualization platform, an alternative to public clouds such as AWS. Begun in 2010 by NASA and Rackspace.

OpenStack services: Compute/Networking/Storage

Orchestration service (HEAT). YAML config describes deployment requirements.

Create VMs for each component.

OpenStack UI / Ops Manager UI

Automation agent

Goal state = ready to use.

Multiple data centers

On secondaries:

  • Deploy agents
  • Create deployer
  • From deployer: access Ops Manager in primary to extend cluster

Third data center with one arbiter.

When a failover occurs, create a new machine with an arbiter so a new primary can be elected.

Ceph / ScaleIO storage layer

Evergreen: The Life of a MongoDB GitHub Commit

Kyle Erf, Software Engineer, MongoDB

Shraya Ramani, Software Engineer, MongoDB

https://twitter.com/KyleErf

https://github.com/evergreen-ci/evergreen

Evergreen / In house continuous integration system

MongoDB is not a typical use case for CI

Most users have a good idea of how their product will be used.

Tests would take 20+ hours serially on one machine

Supported on multiple platforms.

MongoDB is tested on 50 different variants.

600 hours of compute time to run tests on all variants.

Previously used Buildbot, which couldn't scale enough.

Evergreen autoscales testing hardware to meet commit traffic.

Multi-platform support.

Powerful navigation. Open source licensing.

Components

  • Repo tracker: uses a polling strategy for recent commits.
  • Scheduler
  • Host initializer
  • Agent
  • Task runner

Goals

  • Minimize time in task queue
  • Minimize idle host time
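
The goals above are often framed as minimizing makespan, the time at which the last host finishes. A minimal sketch of one generic heuristic (longest task first, always onto the least busy host) follows. This is not Evergreen's scheduler, which the notes do not describe, and identical hosts are a simplification of the full job shop problem. Task durations are made-up numbers.

```python
import heapq


def schedule_lpt(task_minutes, host_count):
    """Greedy longest-processing-time-first assignment.

    Returns (makespan, per-host task lists).
    """
    hosts = [(0, index) for index in range(host_count)]  # (busy minutes, host)
    heapq.heapify(hosts)
    assignments = [[] for _ in range(host_count)]
    for minutes in sorted(task_minutes, reverse=True):
        busy, index = heapq.heappop(hosts)  # least busy host so far
        assignments[index].append(minutes)
        heapq.heappush(hosts, (busy + minutes, index))
    makespan = max(busy for busy, _ in hosts)
    return makespan, assignments


makespan, plan = schedule_lpt([90, 60, 45, 30, 30, 15], host_count=2)
print(makespan, plan)
```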

"Job Shop Scheduling" Minimize makespan

Evening Keynote

Oron Gill Haus, Managing Vice President, Consumer Bank Engineering, Retail and Direct Bank, Capital One

Hygieia: Capital One DevOps dashboard https://github.com/capitalone/Hygieia

Wednesday, June 29th

Morning Keynote

Dr. Eric Brewer, Vice President of Infrastructure, Google

https://twitter.com/eric_brewer

Kubernetes 1.3

  • Rolling upgrades with pod labels
  • Names are persistent and resolvable
  • Init hook / recover or initialize state
  • Staggered start

Dr. Hannah Fry, Lecturer in the Mathematics of Cities at the Centre for Advanced Spatial Analysis, University College London (UCL)

https://twitter.com/FryRsquared

Bike share usage patterns, tweet language distribution, and cows in heat!


The Life of a Write in a Sharded Cluster & Config Servers as Replica Sets

Randolph Tan, Software Engineer at MongoDB

Config servers with replica sets. Election process.

Single source of truth. Single server maintains lock.

readConcern / readConcernMajority / readAfterOpTime

Topics on MongoDB docs:

  • Read concern
  • Replica set
  • Rollback
  • Sharding concepts

Managing Petabytes of Data at Baidu

Beibei Xiao, DevOps Engineer at Baidu

2D geospatial indexes

MongoDB service API

Single point of entry

Quota control / Authorization / Flow control

Split large database into smaller databases

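Create indexes in turn across the replica set: secondaries first, primary last. Just take care of the oplog time (the member must rejoin before it falls outside the oplog window).

Each member is brought up as a standalone node without replication. Build the index in the foreground. Add it back to the replica set. [make primary?]. Oplog is synced to others.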

Balancer and Migration

Problems:

  • Balancing degrades system performance.
  • Disk space not released to system after migration.
  • Balancing is not fast enough to keep up as the number of shards increases.

Solutions:

  • Use hash instead of range shard key to avoid balancing.
  • Pre-split and move chunks
  • Limit balancer running time window
  • Baidu custom balancer script
  • Migrate databases between nodes.
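
A small sketch of why a hashed shard key avoids balancing work when keys grow monotonically. It is a simplification: real hashed sharding assigns hash ranges to chunks rather than taking a modulo, and MD5, the shard count, and the key values are only illustrative.

```python
import hashlib
from collections import Counter

SHARDS = 4


def range_shard(key, boundaries):
    """Range sharding: a key goes to the first chunk whose upper bound exceeds it."""
    for shard, upper in enumerate(boundaries):
        if key < upper:
            return shard
    return len(boundaries)  # the last chunk is open-ended


def hashed_shard(key):
    """Hashed placement: spread keys by a hash of the value."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


# New, ever-increasing keys (timestamps, counters) arriving after the
# chunk boundaries were chosen.
new_keys = range(3000, 4000)
boundaries = [1000, 2000, 3000]  # four chunks: <1000, <2000, <3000, the rest

by_range = Counter(range_shard(key, boundaries) for key in new_keys)
by_hash = Counter(hashed_shard(key) for key in new_keys)
print("range:", dict(by_range))
print("hash: ", dict(sorted(by_hash.items())))
```

With range placement every new write lands in the last chunk, which the balancer then has to split and move. With hashed placement the writes are already spread out.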

In the Future:

  • Spinning disk / better write performance when not using SSDs
  • Balancer: lighter and faster
  • Data compression.
  • WT engine, document validation.

Coming in MongoDB 3.4:

  • Enable parallel chunk migration
  • Remove migration throttling by default for WiredTiger

MongoRocks: MongoDB storage integration layer for the Rocks storage engine

https://github.com/mongodb-partners/mongo-rocks

Advanced MongoDB Aggregation Pipelines

Joe Drumgoole, Director of Developer Advocacy, EMEA at MongoDB

Aggregation grew up in 3.0

Cursors are returned from aggregate queries. $out to new collection.

Processing pipeline

  • Designed to process large groups of documents in parallel
  • Is shard aware
  • Can create new data from old

Match -> Project -> Group -> Sort

Group: group by, execute accumulators, rename fields.

Geo queries ($geoNear) must be the first stage. $out must be the last stage.
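
A sketch of a Match -> Project -> Group -> Sort pipeline written as the list of stage documents a driver would send. The field names (`make`, `model`, `year`) and the output collection name are hypothetical and not taken from the DVLA data set. The helper only checks the two ordering rules from these notes.

```python
pipeline = [
    {"$match": {"make": "FORD"}},                      # filter early
    {"$project": {"_id": 0, "model": 1, "year": 1}},   # keep only needed fields
    {"$group": {                                       # group by + accumulators
        "_id": "$model",
        "vehicles": {"$sum": 1},
        "oldest": {"$min": "$year"},
    }},
    {"$sort": {"vehicles": -1}},
    {"$out": "ford_models"},                           # write to a new collection
]


def stage_names(stages):
    """Return the stage operators, enforcing the ordering rules noted above."""
    names = [next(iter(stage)) for stage in stages]
    if "$geoNear" in names and names[0] != "$geoNear":
        raise ValueError("$geoNear must be the first stage")
    if "$out" in names and names[-1] != "$out":
        raise ValueError("$out must be the last stage")
    return names


print(stage_names(pipeline))
```

With PyMongo the same list would be passed to `collection.aggregate(pipeline)`.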

Demoed with U.K. Driver and Vehicle Licensing Agency (DVLA) data set

Building WiredTiger

Keith Bostic, Senior Staff Engineer at MongoDB

Skiplist: top-to-bottom level searching. Higher levels skip more values. Singly linked, forward-only list.
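
A minimal single-threaded skiplist sketch showing the top-to-bottom search. It is not WiredTiger's implementation, which is concurrent C code. The maximum height and the 50% promotion chance are illustrative assumptions.

```python
import random


class Node:
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height  # one forward pointer per level


class SkipList:
    MAX_HEIGHT = 8

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_HEIGHT)
        self.rng = random.Random(seed)

    def _random_height(self):
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """Search from the top level down; higher levels skip more values."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in reversed(range(self.MAX_HEIGHT)):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            preds[level] = node
        return preds

    def insert(self, key):
        preds = self._predecessors(key)
        node = Node(key, self._random_height())
        for level in range(len(node.forward)):
            node.forward[level] = preds[level].forward[level]
            preds[level].forward[level] = node

    def __contains__(self, key):
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def __iter__(self):
        node = self.head.forward[0]  # level 0 links every node in order
        while node is not None:
            yield node.key
            node = node.forward[0]


skiplist = SkipList(seed=1)
for value in [5, 1, 9, 3]:
    skiplist.insert(value)
print(list(skiplist), 3 in skiplist, 4 in skiplist)
```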

Atomic increment/decrement

  • Hazard pointers
  • Skiplist
  • Ticket locks

Open source implementations in WiredTiger

~200 lines of code in a btree

~20 lines of code in a skiplist

7x-10x performance bump

https://github.com/wiredtiger

From Story to Document: Modeling Common Business Problems with MongoDB

Nuri Halperin, Principal at Plus N Consulting

Same rules, completely different scale

"Mongorama"

Example: "Maker Space" says the boss / space, creativity, tools, community

So many tables / relationships. Done. Mic drop.

First rule of Nuri's thumb: Key interactions should drive document design

Why is my data scattered across tables? Must all objects be flat? Can we iterate quickly?

Add makership info directly on a person document.

Data that works together lives together.

Instead of application logic to require certifications, put that boolean within the document?

Maybe the application should define and enforce the rules?

Embed immutable data.

You don't get much value from referential schema design.

Should I embed?: Ownership? / Work Together / Bound Growth / Lifetime

Example: Library book checkout system. Card stays with book.

Will I need more and more library cards?

Does the rate of change of the embedded data match the parent document?

Have a separate ledger collection. Add some additional information for maker / tool so we don't have to do a referential lookup to answer questions.

Key interactions:

  • Maker gets verified
  • Borrow and return tool

Aggregation framework. Who are the biggest users of tools? Audit report. Aging report.

Let the engine work for you.

The aggregation framework should be used for reports instead of calculating in application memory.
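
A sketch of the "biggest users of tools" report as a pipeline over the ledger collection. The document shape and field names are hypothetical. The local function exists only to show what the pipeline is expected to return; in production the database would do this work.

```python
from collections import Counter

# Hypothetical ledger entries. Maker and tool names are copied into each
# entry so the report needs no lookup against other collections.
tool_log = [
    {"maker": {"id": 1, "name": "Ada"}, "tool": {"id": 7, "name": "lathe"}, "action": "borrow"},
    {"maker": {"id": 1, "name": "Ada"}, "tool": {"id": 7, "name": "lathe"}, "action": "return"},
    {"maker": {"id": 1, "name": "Ada"}, "tool": {"id": 9, "name": "drill"}, "action": "borrow"},
    {"maker": {"id": 2, "name": "Grace"}, "tool": {"id": 9, "name": "drill"}, "action": "borrow"},
]

# The report, expressed as stages for the server to run.
biggest_users_pipeline = [
    {"$match": {"action": "borrow"}},
    {"$group": {"_id": "$maker.name", "checkouts": {"$sum": 1}}},
    {"$sort": {"checkouts": -1}},
]


def biggest_users_locally(entries):
    """Same result computed in Python, for illustration only."""
    counts = Counter(
        entry["maker"]["name"] for entry in entries if entry["action"] == "borrow"
    )
    return counts.most_common()


print(biggest_users_locally(tool_log))
```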

You can keep a fixed length array of recent checkouts on the tool itself. Do it either on write or have a background job.

$slice: -3
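
A sketch of the capped array on write. `$push` with `$each` and a negative `$slice` keeps only the last N elements. The `recent` field name and the checkout values are hypothetical, and the second function emulates the update locally so the effect is visible without a server.

```python
def recent_checkout_update(checkout, keep=3):
    """Build the update document that appends and trims in one operation."""
    return {"$push": {"recent": {"$each": [checkout], "$slice": -keep}}}


def apply_locally(tool, checkout, keep=3):
    """Emulate the update above on a plain dict."""
    tool.setdefault("recent", []).append(checkout)
    tool["recent"] = tool["recent"][-keep:]  # a negative slice keeps the tail


tool = {"_id": 7, "name": "lathe"}
for maker in ["Ada", "Grace", "Linus", "Barbara", "Ken"]:
    apply_locally(tool, {"maker": maker})

print(recent_checkout_update({"maker": "Ken"}))
print(tool["recent"])
```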

Collections: makers, tools, toolLog

Easier to determine function with these 3 collections than many more flat tables with mostly referential data.

Takeaways:

  • Key interactions drive schema design
  • Data that works together lives together.
  • Embed immutable data.
  • Let the engine work for you

The roles have changed. DBAs no longer set the schema for developers to conform to. Responsibility on the developer side has increased.
