dblandin/mongo-world-2016-notes.md

# MongoDB 🌎 2016 Notes

## Tuesday, June 28th

### Morning Keynote Session

- Eliot Horowitz, CTO, MongoDB
  - Exciting features in MongoDB 3.4:
    - Split-storage replica sets
    - Recursive graph lookup
    - Faceted search
    - Compass data exploration tool
    - $lookup (like joins)
    - Read-only views
- Ross Mauri, General Manager, z Systems and LinuxONE at IBM
  - LinuxONE + Mongo
  - UK Schools
  - ibm.com/linuxone/mongodb
### Containerizing MongoDB with Kubernetes

Dan Worth, Director of Engineering, fuboTV

Brian McNamara, Founder, CloudyOps, LLC

- Tagsets
- Stateful services
  - Mongo data should persist
- Label selectors
- Persistent volumes in pod definitions / Replication Controllers
- MongoDB service endpoint
- fuboTV blog post
### Scaling MongoDB w/ Docker and cgroups

[Marco Bonezzi](https://twitter.com/marcobonezzi), Technical Services Engineer at MongoDB

- Deployment -> Orchestration
  - Using predefined cluster patterns. Replicating environments.
- Resource Control -> Resource Management
  - Setting limits on key resources
- MongoDB & Docker -> Automate for Scaling
  - Create once, deploy everywhere
  - Deploy patterns, not processes
- Docker usage survey: Why are companies interested in Docker?
- Docker Machine / Swarm / Compose
- Blog post: Evaluating container platforms at scale
- Using affinity filters in Docker Swarm to prevent replica set members from landing on the same instance.
- Resource control with cgroups
  - Memory is split in two: WiredTiger cache + MongoDB memory (connections, aggregations, map-reduce, etc.)
  - The mongod process does not see cgroup limits, so the WiredTiger cache size should be set explicitly.
- `docker top rs1` / `docker stats rs1`
- Combine Docker metrics with MongoDB metrics
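
Because mongod sizes its cache from host RAM rather than the cgroup limit, the cache size has to be derived from the container's limit and passed in. A minimal sketch, assuming a rule of thumb of half the memory left after reserving 1 GB, with a 256 MB floor (close to how MongoDB 3.4 documents its default for host RAM, applied here to the container limit instead). The helper names are hypothetical; `--storageEngine` and `--wiredTigerCacheSizeGB` are real mongod options, though fractional cache sizes are only accepted from 3.4 on. On cgroup v1 hosts the limit can be read from `/sys/fs/cgroup/memory/memory.limit_in_bytes`; the sketch takes it as a parameter instead.

```python
GIB = 1024 ** 3

def wiredtiger_cache_gb(container_limit_bytes, reserve_gb=1.0, fraction=0.5, floor_gb=0.25):
    """Hypothetical helper: choose an explicit WiredTiger cache size (in GB).

    Assumed rule of thumb: fraction * (limit - reserve), never below floor_gb,
    leaving the rest of the container's memory for connections, aggregations, etc.
    """
    limit_gb = container_limit_bytes / GIB
    return max(floor_gb, fraction * (limit_gb - reserve_gb))

def mongod_args(container_limit_bytes):
    """Hypothetical helper: build a mongod command line with the cache set explicitly."""
    cache = wiredtiger_cache_gb(container_limit_bytes)
    return ["mongod", "--storageEngine", "wiredTiger",
            "--wiredTigerCacheSizeGB", f"{cache:g}"]

print(mongod_args(4 * GIB))
```

With a 4 GiB container limit this yields a 1.5 GB cache; leaving the flag off would size the cache from the whole host's RAM.
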
#### Creating a Swarm cluster on AWS to deploy MongoDB

1. Configure docker-machine with the EC2 driver (AWS)
2. Deploy a discovery service for the Swarm master
3. Deploy AWS instances for the Swarm master and Swarm worker nodes
4. Define the Compose file for the deployment
5. Define Swarm filters and constraints and cgroup limits
6. Connect to the Swarm master
7. Deploy the environment with a single command using the Compose file
8. Configure the MongoDB sharded cluster using the Cloud Manager API

Demo!

### Smart Strategies for Resilient Applications

[A. Jesse Jiryu Davis](https://twitter.com/jessejiryudavis), Staff Engineer at MongoDB

- Links: screencast, resources
- mongos
- Smart retry strategies
- SDAM: Server Discovery and Monitoring
- Bad retry strategies as they apply to:
  - Network blips
  - Primary failover
  - Network down
  - Command error
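
A sketch of a retry policy that behaves sensibly across the four cases above: retry exactly once on a network error (enough for a blip or a primary failover, and fails fast when the network is really down) and never retry a command error, which would just fail again. The exception classes and `run_with_retry` are hypothetical stand-ins, not a driver API.

```python
class NetworkError(Exception):
    """Hypothetical stand-in for a driver's connection or timeout error."""

class CommandError(Exception):
    """Hypothetical stand-in for a server-side command failure."""

def run_with_retry(operation):
    """Run operation(); retry once on NetworkError, never on CommandError.

    CommandError is deliberately not caught: a deterministic server error
    would fail the same way on every retry.
    """
    try:
        return operation()
    except NetworkError:
        # One retry covers blips and failovers (the driver can select the new
        # primary) without hammering a network that is actually down.
        return operation()
```

Retrying only once is what makes the idempotency concerns below manageable: at most one duplicate attempt per operation.
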

- Make your operations idempotent.
- Non-idempotent update ($inc):
  1. Add a unique token ($addToSet).
  2. Remove the token and increment the counter ($pull, $inc).
- Eventual consistency after a network outage.
- Appropriate for high-value, infrequent updates where the tradeoff of additional load/latency is warranted.
- Otherwise, just use $inc and accept possible missed counts.
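
The two-step token technique can be modeled on a plain dict to show why retries become safe. This is a sketch, not a driver call: the field names `pending` and `counter` are hypothetical, and the comments show the MongoDB update each step stands for (for example, sent with PyMongo's `update_one`).

```python
import uuid

def add_token(doc, token):
    """Step 1, like update_one({"_id": ...}, {"$addToSet": {"pending": token}}).
    Adding the same token twice is a no-op, so retrying is safe."""
    if token not in doc["pending"]:
        doc["pending"].append(token)

def apply_token(doc, token):
    """Step 2, like update_one({"_id": ..., "pending": token},
    {"$pull": {"pending": token}, "$inc": {"counter": 1}}).
    The filter only matches while the token is pending, so a retry cannot
    increment twice. Returns the number of matched documents (0 or 1)."""
    if token in doc["pending"]:
        doc["pending"].remove(token)
        doc["counter"] += 1
        return 1
    return 0

doc = {"_id": 1, "counter": 0, "pending": []}
token = str(uuid.uuid4())
# Simulate a retry after a network blip: each step is sent twice.
add_token(doc, token)
add_token(doc, token)
apply_token(doc, token)
apply_token(doc, token)
print(doc)
```

The counter ends at 1 despite the duplicated steps, at the cost of two round trips per increment, which is the load/latency tradeoff noted above.
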
- Black Pipe Testing / MockupDB
- bit.ly/resilient-applications
### Ops Manager API, Puppet & OpenStack - Fully Automated Orchestration from Scratch!

Naama Bamberger, Software Developer at Cisco

- MongoDB
  - Easier to perform updates
  - Forwards and backwards compatible
- Automated deployment of MongoDB
  - Ops Manager?
  - OpenStack, Heat, Puppet, and Python for tooling
- OpenStack is a virtualization (cloud) platform, comparable in scope to AWS; many clouds are built on top of it. It began in 2010 at NASA and Rackspace.
- OpenStack services: Compute / Networking / Storage
- Orchestration service (Heat): a YAML config describes the deployment requirements.
- Create VMs for each component.
- OpenStack UI
- Ops Manager UI
- Automation agent
- Goal state = ready to use.
- Multiple data centers
  - On secondaries:
    - Deploy agents
    - Create deployer
    - From the deployer, access Ops Manager in the primary to extend the cluster
  - Third data center with one arbiter.
  - When a failover occurs, create a new machine with an arbiter so a new primary can be elected.
- Ceph / ScaleIO storage layer
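
The third-data-center arbiter matters because a replica set can only elect a primary while a strict majority of its voting members is reachable. A small sketch of that arithmetic, assuming a hypothetical layout of two voting members in each of two data centers plus one arbiter in a third (the talk only specified the arbiter):

```python
def can_elect_primary(votes_by_dc, lost_dcs):
    """Return True if the voting members outside the lost data centers form a
    strict majority of all voting members, which an election requires."""
    total = sum(votes_by_dc.values())
    reachable = sum(v for dc, v in votes_by_dc.items() if dc not in lost_dcs)
    return reachable > total // 2

with_arbiter = {"dc1": 2, "dc2": 2, "dc3": 1}
without_arbiter = {"dc1": 2, "dc2": 2}
print(can_elect_primary(with_arbiter, {"dc1"}))     # 3 of 5 votes remain
print(can_elect_primary(without_arbiter, {"dc1"}))  # 2 of 4 votes remain
```

Losing either main data center leaves 3 of 5 votes with the arbiter, but only 2 of 4 without it, so no primary can be elected.
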
### Evergreen: The Life of a MongoDB GitHub Commit

[Kyle Erf](https://twitter.com/KyleErf), Software Engineer, MongoDB

Shraya Ramani, Software Engineer, MongoDB

- https://github.com/evergreen-ci/evergreen
- Evergreen: in-house continuous integration system
- MongoDB is not a typical use case for CI
  - Most users have a good idea of how their product will be used.
  - Tests would take 20+ hours serially on one machine.
  - Supported on multiple platforms.
  - MongoDB is tested on 50 different variants.
  - 600 hours of compute time to run tests on all variants.
- Used Buildbot before, which couldn't scale enough.
- Evergreen autoscales testing hardware to meet commit traffic.
- Multi-platform support.
- Powerful navigation. Open source licensing.
- Components
  - Repo tracker: uses a polling strategy for recent commits.
  - Scheduler
  - Host initializer
  - Agent
  - Task runner
- Goals
  - Minimize time in the task queue
  - Minimize idle host time
- "Job Shop Scheduling"
  - Minimize makespan
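
"Makespan" is the time until the last task finishes. As a swapped-in illustration (not Evergreen's actual scheduler), here is the classic longest-processing-time-first (LPT) greedy heuristic for a simpler relative of job shop scheduling: independent tasks on identical hosts. Each task, longest first, goes to the currently least-loaded host.

```python
import heapq

def lpt_schedule(durations, hosts):
    """LPT heuristic: assign tasks, longest first, to the least-loaded host.

    Returns (makespan, assignment), where assignment[i] lists the task
    durations given to host i.
    """
    heap = [(0, i) for i in range(hosts)]  # (current load, host index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(hosts)]
    for d in sorted(durations, reverse=True):
        load, i = heapq.heappop(heap)      # least-loaded host
        assignment[i].append(d)
        heapq.heappush(heap, (load + d, i))
    makespan = max(sum(tasks) for tasks in assignment)
    return makespan, assignment

print(lpt_schedule([5, 4, 3, 3, 3], 2))
```

On `[5, 4, 3, 3, 3]` with two hosts LPT returns a makespan of 10, while the optimal split ({5, 4} and {3, 3, 3}) gives 9, so it is a fast heuristic rather than an exact solver.
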
### Evening Keynote

Oron Gill Haus, Managing Vice President, Consumer Bank Engineering, Retail and Direct Bank, Capital One

- Hygieia: Capital One DevOps dashboard
- https://github.com/capitalone/Hygieia
## Wednesday, June 29th

### Morning Keynote

- [Dr. Eric Brewer](https://twitter.com/eric_brewer), Vice President of Infrastructure, Google
  - Kubernetes 1.3
  - Rolling upgrades with pod labels
  - Names are persistent and resolvable
  - Init hook to recover or initialize state
  - Staggered start
- Dr. Hannah Fry, Lecturer in the Mathematics of Cities at the Centre for Advanced Spatial Analysis, University College London (UCL)
  - https://twitter.com/FryRsquared
  - Bike share usage patterns, tweet language distribution, and cows in heat!

### The Life of a Write in a Sharded Cluster & Config Servers as Replica Sets

Randolph Tan, Software Engineer at MongoDB

- Config servers as replica sets. Election process.
- Single source of truth. A single server maintains the lock.
- readConcern
  - readConcern majority
  - readAfterOpTime
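
readConcern majority only returns data that a majority of the replica set has acknowledged. A simplified sketch of the idea: given each data-bearing member's last applied optime, the newest write that is majority-committed is the one a strict majority has reached. Modeling optimes as plain integers is a hypothetical simplification; real optimes also carry an election term, and the server's bookkeeping is more involved.

```python
def majority_commit_point(member_optimes):
    """Return the newest optime applied by a strict majority of members."""
    ordered = sorted(member_optimes, reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th newest optime has been applied by at least `majority` members.
    return ordered[majority - 1]

print(majority_commit_point([10, 8, 5]))  # primary at 10, secondaries at 8 and 5
```

Here a majority read would see writes up to optime 8, even though the primary has already applied 10; writes 9 and 10 could still be rolled back.
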
- Topics in the MongoDB docs:
  - Read concern
  - Replica set
  - Rollback
  - Sharding concepts

### Managing Petabytes of Data at Baidu

Beibei Xiao, DevOps Engineer at Baidu

- 2D geospatial indexes
- MongoDB service API
  - Single point of entry
  - Quota control
  - Authorization
  - Flow control
- Split large databases into smaller databases
- Create indexes in turn
  - Replication creates the index
  - Secondaries first and primary last
  - Just need to care about the oplog time
  - Each member is brought up as a standalone node without replication, the index is built in the foreground, and the member is added back to the replica set [make primary?]. The oplog is synced to the others.
- Balancer and migration
  - Problems:
    - Balancing degrades system performance.
    - Disk space is not released to the system after migration.
    - Balancing is not fast enough when the number of shards increases.
  - Solutions:
    - Use a hashed instead of a range shard key to avoid balancing.
    - Pre-split and move chunks.
    - Limit the balancer's running time window.
    - Baidu custom balancer script
    - Migrate databases between nodes.
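
Why a hashed shard key sidesteps most balancing for monotonically increasing keys: with range sharding every new key lands in the top chunk, which then has to be split and migrated, while hashing spreads inserts across pre-split chunks. This sketch uses an MD5-based 64-bit value as a stand-in; MongoDB's hashed index also derives a 64-bit value from MD5, but this function is not claimed to match it. The split points and key range are hypothetical.

```python
import bisect
import hashlib
from collections import Counter

def hashed_key(value):
    """Signed 64-bit integer from an MD5 digest (stand-in for a hashed shard key)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "little", signed=True)

def hashed_chunk(h, chunks):
    """Chunk index when the signed 64-bit hash space is pre-split into equal ranges."""
    span = 2**64 // chunks
    return min((h + 2**63) // span, chunks - 1)

def range_chunk(key, split_points):
    """Chunk index for a range shard key, given sorted split points."""
    return bisect.bisect_right(split_points, key)

CHUNKS = 4
split_points = [250, 500, 750]   # hypothetical splits chosen when keys were 0..999
new_keys = range(1000, 2000)     # monotonically increasing keys, e.g. counters

range_counts = Counter(range_chunk(k, split_points) for k in new_keys)
hashed_counts = Counter(hashed_chunk(hashed_key(k), CHUNKS) for k in new_keys)

print("range :", dict(range_counts))   # every new insert hits the last chunk
print("hashed:", dict(sorted(hashed_counts.items())))
```

The tradeoff is that range queries on a hashed key must be broadcast to all shards.
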

- In the future:
  - Spinning disks: better write performance when not using SSDs
  - Balancer: lighter and faster
  - Data compression
  - WiredTiger engine, document validation
- Coming in MongoDB 3.4:
  - Enable parallel chunk migration
  - Remove migration throttling by default for WiredTiger
- MongoDB Rocks: MongoDB storage integration layer for the RocksDB storage engine
  - https://github.com/mongodb-partners/mongo-rocks
### Advanced MongoDB Aggregation Pipelines

Joe Drumgoole, Director of Developer Advocacy, EMEA at MongoDB

- Aggregation grew up in 3.0
  - Cursors are returned from aggregate queries. $out writes to a new collection.
- Processing pipeline
  - Designed to process large groups of documents in parallel
  - Is shard-aware
  - Can create new data from old
- Match -> Project -> Group -> Sort
- Group: group by, execute accumulators, rename fields.
- Geo queries ($geoNear) must be the first stage. $out must be the last stage.
- Demoed with the U.K. Driver and Vehicle Licensing Agency (DVLA) data set
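
To make the Match -> Project -> Group -> Sort shape concrete, here is a plain-Python sketch that mimics each stage over a few in-memory documents. The records and field names (`make`, `result`, `mileage`) are hypothetical, not the DVLA data set; the `pipeline` list shows the MongoDB stages being imitated, which PyMongo would run with `collection.aggregate(pipeline)`.

```python
from collections import defaultdict

# The MongoDB pipeline this sketch mimics.
pipeline = [
    {"$match": {"result": "PASS"}},
    {"$project": {"make": 1, "mileage": 1}},
    {"$group": {"_id": "$make", "count": {"$sum": 1}, "totalMiles": {"$sum": "$mileage"}}},
    {"$sort": {"count": -1}},
]

# Hypothetical sample documents.
records = [
    {"make": "FORD", "result": "PASS", "mileage": 50000},
    {"make": "FORD", "result": "FAIL", "mileage": 90000},
    {"make": "VAUXHALL", "result": "PASS", "mileage": 30000},
    {"make": "FORD", "result": "PASS", "mileage": 70000},
    {"make": "VAUXHALL", "result": "PASS", "mileage": 60000},
    {"make": "VAUXHALL", "result": "PASS", "mileage": 10000},
]

def run_pipeline(docs):
    # $match: keep only passing tests (filter early so later stages see less data)
    matched = [d for d in docs if d["result"] == "PASS"]
    # $project: keep only the fields later stages need
    projected = [{"make": d["make"], "mileage": d["mileage"]} for d in matched]
    # $group: one output document per make, with $sum accumulators
    groups = defaultdict(lambda: {"count": 0, "totalMiles": 0})
    for d in projected:
        g = groups[d["make"]]
        g["count"] += 1
        g["totalMiles"] += d["mileage"]
    grouped = [{"_id": make, **acc} for make, acc in groups.items()]
    # $sort: by count, descending
    return sorted(grouped, key=lambda g: g["count"], reverse=True)

print(run_pipeline(records))
```

Putting $match first mirrors the usual advice: the earlier documents are filtered out, the less work every later stage does.
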
### Building WiredTiger

Keith Bostic, Senior Staff Engineer at MongoDB

- Top-to-bottom level searching
- Higher levels skip more values
- Singly forward-linked lists
- Atomic increment/decrement
- Hazard pointers
- Skiplist
- Ticket locks
- Open source implementations in WiredTiger
  - ~200 lines of code in a btree
  - ~20 lines of code in a skiplist
  - 7x-10x performance bump
- https://github.com/wiredtiger
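
A minimal, single-threaded skiplist showing the structure described above: each node has forward-only links, higher levels skip more values, and a search starts at the top level and drops down. This sketch does not reproduce WiredTiger's lock-free C implementation or its atomic operations; the class and parameter names are illustrative.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        # forward[i] is the next node at level i (singly linked, forward only)
        self.forward = [None] * level

class SkipList:
    MAX_LEVEL = 16
    P = 0.5  # probability of promoting a node one more level

    def __init__(self, seed=None):
        self.head = Node(None, self.MAX_LEVEL)  # sentinel; its key is never compared
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        lvl = 1
        while lvl < self.MAX_LEVEL and self.rng.random() < self.P:
            lvl += 1
        return lvl

    def _descend(self, key, update=None):
        """Search from the top level down; return the last node with key < key at level 0."""
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            if update is not None:
                update[i] = node  # predecessor at level i
        return node

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        self._descend(key, update)
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl):
            # Splice the new node in after its predecessor at each of its levels.
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self._descend(key).forward[0]
        return node is not None and node.key == key

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]

sl = SkipList(seed=42)
for k in [5, 1, 9, 3, 7]:
    sl.insert(k)
print(list(sl))
```

Because every level is only ever linked forward, an insert touches one predecessor pointer per level, which is what makes the lock-free version in WiredTiger feasible.
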
### From Story to Document: Modeling Common Business Problems with MongoDB

Nuri Halperin, Principal at Plus N Consulting

- Same rules, completely different scale
- "Mongorama"
- Example: "Maker Space," says the boss: space, creativity, tools, community
  - So many tables / relationships. Done. Mic drop.
- First rule of Nuri's thumb: key interactions should drive document design
  - Why is my data scattered across tables? Must all objects be flat? Can we iterate quickly?
- Add makership info directly on a person document.
- Data that works together lives together.
- Instead of application logic to require certifications, put that boolean within the document?
- Maybe the application should define and enforce the rules?
- Embed immutable data.
- You don't get much value from referential schema design.
- Should I embed? Ownership / Work together / Bounded growth / Lifetime
- Example: library book checkout system. The card stays with the book.
  - Will I need more and more library cards?
  - Does the rate of change of the embedded data match the parent document?
- Have a separate ledger collection. Add some additional information about the maker / tool so we don't have to do a referential lookup to answer questions.
- Key interactions:
  - Maker gets verified
  - Borrow and return tool
- Aggregation framework: Who are the biggest users of tools? Audit report. Aging report.
- Let the engine work for you.
  - The aggregation framework should be used for reports instead of calculating in memory.
- You can keep a fixed-length array of recent checkouts on the tool itself. Do it either on write or with a background job.
  - `$slice: -3`
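
The fixed-length array works with a `$push` that combines `$each` and `$slice: -3`, for example `{"$push": {"recentCheckouts": {"$each": [checkout], "$slice": -3}}}`, which appends the new entry and keeps only the last three. A plain-Python model of that update; the field names `recentCheckouts` and `maker` are hypothetical:

```python
def push_recent_checkout(tool, checkout, keep=3):
    """Model of {"$push": {"recentCheckouts": {"$each": [checkout], "$slice": -keep}}}:
    append the new entry, then keep only the last `keep` elements."""
    recent = tool.setdefault("recentCheckouts", [])
    recent.append(checkout)
    if len(recent) > keep:
        del recent[: len(recent) - keep]
    return tool

tool = {"_id": "drill-1"}
for maker in ["ann", "bob", "cy", "dee"]:
    push_recent_checkout(tool, {"maker": maker})
print([c["maker"] for c in tool["recentCheckouts"]])
```

Doing this in the same update as the checkout keeps the array bounded, so it satisfies the "bounded growth" test for embedding.
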
- Collections: makers, tools, toolLog
  - Easier to determine function with these 3 collections than with many flat tables of mostly referential data.
- Takeaways:
  - Key interactions drive schema design.
  - Data that works together lives together.
  - Embed immutable data.
  - Let the engine work for you.

The roles have changed. DBAs no longer set the schema for developers to
conform to. Responsibility on the developer side has increased.