@feanil
Last active January 9, 2016 00:33
MongoDB World

We need a way to set up a cluster for the first time, and a way to add/remove nodes from the cluster.

  • Jenkins jobs to cause the primary to stepdown.
  • Job for getting the status of all nodes.
  • Periodic index compaction would be good for performance.
  • This way, if devs add indices as part of the code (ensureIndex), we can still get the perf improvements of a compact index.
  • Need a good pattern for managing indices.
  • Ideally they would live in the app repos but it would be nice to automate their creation.
  • Maybe part of the AMI build would be to create indexes.
  • Maybe get an AMI for Mongo.
  • Connect to the existing cluster when it comes up?
  • Periodic "DBA for a day" to catch up
  • Look at slow queries and indices
    • See if the query pattern has changed
    • See if indices are being used
  • No easy way to know when you can remove an index.
    • Would have to re-run the last N months of queries with explain and see which indices are being used.
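The explain-based index audit above can be sketched in Python. The nested winningPlan shape mirrors real explain() output, but the plans and index names here are made-up samples:

```python
def used_indexes(plan):
    """Recursively collect index names from IXSCAN stages in a winning plan."""
    names = set()
    if plan.get("stage") == "IXSCAN":
        names.add(plan["indexName"])
    # Child stages appear under "inputStage" (single) or "inputStages" (list).
    for child in [plan.get("inputStage")] + plan.get("inputStages", []):
        if child:
            names |= used_indexes(child)
    return names

def unused_indexes(all_indexes, winning_plans):
    """Indexes never referenced by any recorded query plan are removal candidates."""
    used = set()
    for plan in winning_plans:
        used |= used_indexes(plan)
    return set(all_indexes) - used

# Hypothetical winning plans for two recorded queries.
plans = [
    {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "user_id_1"}},
    {"stage": "COLLSCAN"},
]
print(unused_indexes({"user_id_1", "email_1"}, plans))  # {'email_1'}
```

The catch the notes point at remains: this only proves an index unused for the queries you actually replayed, which is why the full N months of traffic is needed.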

We should have write concern of majority or we could lose data when the primary falls over.

  • Need to loadtest it if we need to turn this on
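A toy model (not real replication code) of why w=1 risks losing acknowledged writes when the primary falls over, while "majority" does not:

```python
def outcome(members, w, replicated_to):
    """Return (acknowledged, survives_failover) for a write when the primary
    dies right after acking. `replicated_to` counts members (including the
    primary) that held the write at that moment."""
    needed = members // 2 + 1 if w == "majority" else w
    acked = replicated_to >= needed
    survives = replicated_to >= 2  # a member besides the dead primary has it
    return acked, survives

# 3-member set; the write only reached the primary before it crashed:
print(outcome(3, 1, 1))           # (True, False): acked, then silently lost
print(outcome(3, "majority", 1))  # (False, False): lost, but the client knew
print(outcome(3, "majority", 2))  # (True, True): acked writes survive
```

The (True, False) case is exactly the silent data loss the note warns about: the client was told the write succeeded, but no surviving member has it.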

If the journal file is on the same volume as the data, the sync time for the journal is 100 ms; otherwise it's 30 ms.

Scaling reads over secondaries is not recommended because of the potential for cascade failures.

  • We should figure out limits so that we have enough head room in a cluster for N of them to go down without bringing down the rest of the cluster.
  • We should also figure out what N should be for us
  • Their recommendation is to use shards
    • That seems like a pain in the butt.
    • Lots of danger of screwing up your data.
    • Network issues are more likely to cause problems.
    • They suggest paying them for MMS to manage it.
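The headroom question above can be sketched with simple arithmetic, assuming reads are balanced evenly and each member saturates at 100% utilization:

```python
def survives_failures(members, utilization, failures):
    """True if the remaining members can absorb the failed members' read load.

    Assumes reads are spread evenly and each member saturates at 1.0.
    """
    remaining = members - failures
    if remaining <= 0:
        return False
    # Surviving members each pick up a share of the total load.
    return utilization * members / remaining <= 1.0

def max_tolerable_failures(members, utilization):
    """Largest N such that the cluster still holds after N members drop."""
    n = 0
    while survives_failures(members, utilization, n + 1):
        n += 1
    return n

print(max_tolerable_failures(5, 0.5))  # 2: load 2.5 spread over 3 ~= 0.83 each
print(max_tolerable_failures(5, 0.9))  # 0: no headroom to lose anyone
```

The second case is the cascade-failure scenario: at 90% utilization, losing any one member overloads the rest.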

Personal Notes

  • Need to dive into the docker community and try it out.
    • Seems like it may be useful for edX but need to figure out what the cost and complexity is for all the nice promises it is making.
  • Need to read through at least the mongo and cassandra related posts on Jepsen
  • Need to learn more about YCSB (Yahoo! Cloud Serving Benchmark)

Day 1

Keynote

  • Starting off with the Chief Marketing Officer

  • Introduces the CEO: Dev Ittycheria

    • Joined Recently

    • He wants to "disrupt" the database market...

    • Mongodb 3.0

      • Excited about storage engines

      • WiredTiger is new in 3.0

      • RocksDB - built by facebook as another storage engine for 3.0

      • Claims better performance than both Couch and Cassandra.

        • Independent 3rd-party testing
        • I can imagine getting very high write throughput with sharding, but need to understand shard management/recovery better. From the class yesterday it seems like there are many cases where sharding can result in data loss during failure.
      • Stories of Companies that use MongoDB

        • x.ai: personal assistant who schedules meetings for you.
  • Ben Golub: CEO of Docker

    • He likes to use the word disrupt too...
    • Sales pitch for docker, making a case for why it's useful in a micro service world.
    • Stories of companies that went from big monoliths to hundreds of microservices in months w/docker

Concurrency Control in Mongo 3.0

  • Kaloian Manassiev: Kernel Engineer at MongoDB
  • Metadata: everything except the documents.
  • Data: the documents.
  • Metadata talks to data via the storage engine API.
  • Mongodb Concurrency Control by Version
    • MongoDB 2.0: The top lock (global lock) locks the whole instance for either reads or writes - only one writer at a time.

      • Protects both the metadata and the data.
    • MongoDB 2.2-2.6: Lock per database instead of per instance. Better concurrency.

      • The top lock still exists; intents are also introduced.
      • Top lock + intents allow for database-level calls: create db, drop db, etc.
      • Intents introduced: things related to locks.
        • Intents are an extra attribute of the top lock for determining whether or not the whole instance should be locked.
        • Intention to access one or more children of an item (top lock/instance).
          • Need to acquire a lock further down in the hierarchy if you take an intent.
      • Lock
        • Locks an item for a particular access.
    • MongoDB 3.0: Moved to a hierarchy of locks (multi-granularity locking)

      • Can lock a db, a collection, or just a document.
      • Implemented using Lock Manager Library
        • Added Improved Fairness
          • Previously writes could overload the system, preventing reads. (Sounds like a problem we could have while doing imports.)
      • Here they introduced the Storage Engine API
        • MongoDB is responsible for concurrency for the dbs and collections
        • Storage Engine is responsible for locking/concurrency of the actual documents.
          • MMAPv1 - Single lock per collection; no doc-level locking
          • WiredTiger - Lock per document - uses MVCC
  • Fsync lock is only for MMAPv1 not in WiredTiger
    • What is the correct pattern for backups if we are using WiredTiger?
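WiredTiger's internals weren't shown in the talk, but "collision detection instead of locks" is classic optimistic concurrency, which a version-per-document sketch can illustrate (this is an analogy, not WiredTiger code):

```python
class WriteConflict(Exception):
    pass

class Store:
    """Versioned documents; writes detect collisions instead of locking."""
    def __init__(self):
        self.docs = {}  # _id -> (version, value)

    def read(self, _id):
        return self.docs.get(_id, (0, None))

    def write(self, _id, expected_version, value):
        current, _ = self.read(_id)
        if current != expected_version:
            raise WriteConflict(_id)  # another writer got there first
        self.docs[_id] = (current + 1, value)

store = Store()
v, _ = store.read("a")
store.write("a", v, {"n": 1})        # first writer wins
try:
    store.write("a", v, {"n": 2})    # stale version -> conflict
except WriteConflict:
    v2, doc = store.read("a")        # caller re-reads and retries
    store.write("a", v2, {**doc, "n": 2})
print(store.read("a"))  # (2, {'n': 2})
```

This also shows why contention on a single document raises latency under WiredTiger: every collision costs a re-read and a retry.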

Advanced Administration, Monitoring and Backup

  • Jeffrey Berger: Lead DB Engineer at Sailthru

  • Sailthru -> consumes a bunch of data.

  • 4 Clusters and 9 Stand-Alone RS

  • Largest is a 32-shard, 5.5 TB cluster with ~1.5 Bn profiles

  • Micro Sharded

    • Multiple MongoD instances on a single piece of hardware
      • Meant to improve concurrency
        • More cores to do your operations
      • Not easy to monitor
        • Especially with MMAPv1
  • They Use Zabbix

    • github.com/sailthru/mongodb-zabbix
    • does a lot of what MMS does
    • Run custom scripts that use the same access pattern as their app
      • Do 100 findOnes
  • Do Query Logging

  • DB Map by Sailthru: a soon-to-be-open-sourced Python tool for knowing your network topology.

    • Exportable inventory for Ansible
  • Backups

    • Cloud Backups
      • Hidden Secondary in a different AZ
      • Take Disk Snapshots
    • They moved to MMS Backups
      • The fact that we can do mongodump and restore means we probably don't have that much data.
        • Now that we're reading from the secondary, we should make sure the backups don't impact reads from the secondary.
      • MMS has some secret sauce for doing backups for a RS with Shards
        • This is otherwise quite hard.
  • Data Migration from One Datacenter to Another

    • They found that none of the options were fast enough
      • Mongodump meant going to disk which was very slow.
    • Wrote a thing to do this:
      • Mongopipe : Not yet opensourced
      • Custom multiprocessing python process to insert without hitting disk.
    • Also used mongoconnector to mirror mongodb operations to other nodes.
      • Listens to the oplog -> Doc Manager (backend for the datastore you want to migrate to) -> whatever datastore you want, including MongoDB, Elasticsearch, etc.
  • Mongoexup : tool for doing mongo export to s3

    • To be open sourced soon.
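The oplog -> Doc Manager mirroring pipeline can be sketched generically; the entry format below is a simplified stand-in, not the real oplog schema:

```python
def apply_oplog(entries, target):
    """Replay simplified oplog entries ('i' insert, 'u' update, 'd' delete)
    into a dict-backed target store, mirroring the source's writes."""
    for entry in entries:
        op, _id = entry["op"], entry["_id"]
        if op == "i":
            target[_id] = entry["doc"]
        elif op == "u":
            target[_id].update(entry["doc"])
        elif op == "d":
            target.pop(_id, None)
    return target

mirror = apply_oplog([
    {"op": "i", "_id": 1, "doc": {"name": "a"}},
    {"op": "u", "_id": 1, "doc": {"name": "b"}},
    {"op": "i", "_id": 2, "doc": {"name": "c"}},
    {"op": "d", "_id": 2},
], {})
print(mirror)  # {1: {'name': 'b'}}
```

The "target" is the pluggable part: swap the dict for an Elasticsearch or MongoDB writer and you have the mongoconnector shape described above.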

MongoDB Drivers and High Availability: Deep Dive

  • A. Jesse Jiryu Davis: On the driver team at MongoDB
  • Service Discovery and Monitoring Spec now written down
  • All drivers are now converging to this spec
  • Changes to the driver with pymongo 3.0
    • Everything uses the mongo client.
    • MongoClient will discover nodes in your cluster.
    • MongoClient call is non-blocking
    • It will block the first time you try to use it if it doesn't know about the primary.
  • How the Driver Works
    • Pass it some hosts
    • This is the initial topology.
    • It contacts all hosts it knows about to get their status
    • The topology is updated from the responses.
    • All calls that try to use the connection block until the driver knows about who the primary is.
    • There is a thread per replica-set member that checks its heartbeat
    • In Steady State
      • The threads check their assigned hosts every 10 seconds.
      • Latency is checked to adjust where read requests go to.
    • Crisis
      • Primary Goes Down
      • Driver doesn't find out immediately.
      • ConnectionFailure exception gets thrown from driver to your client.
      • On Next Insert
        • All threads check their hosts to find a primary
      • bit.ly/server-discovery
  • Elections are even faster in MongoDB 3.0
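Given the crisis sequence above (primary dies, ConnectionFailure surfaces on the next operation, the driver then rediscovers the topology), application code usually wraps operations in a bounded retry. A sketch with a stand-in exception class:

```python
import time

class ConnectionFailure(Exception):
    """Stand-in for pymongo.errors.ConnectionFailure."""

def with_retry(operation, attempts=3, delay=0.0):
    """Retry an operation that can fail during a primary election; the
    real driver rediscovers the topology between attempts."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionFailure:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give the election a moment to finish

calls = {"n": 0}
def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionFailure("no primary available")
    return "inserted"

result = with_retry(flaky_insert)
print(result)  # 'inserted' -- succeeded on the third attempt
```

Caveat: blindly retrying non-idempotent operations (e.g. inserts) can double-write if the first attempt actually reached the server, so real code should make the operation idempotent before adopting this pattern.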

Benchmarking, Load Testing, and Preventing Terrible Disasters

  • Michael Kania: Parse Production Engineer

  • Generate Indexes on the fly in MongoDB

  • Delete indexes that aren't being used.

  • Cowboy Upgrade

    • Run Tests Against the new version
    • Spin up hidden secondary and Watch
    • Un-hide secondary(now people are reading from it) and Watch
    • Promote
  • New Approach

    • Do it with production workload in a test environment
    • Ideally shadow traffic
    • Things you may want to know
      • Be able to find regressions.
    • Flashback: Opensource tool they wrote
      • Capture production workload
      • replay it on your test environment.
        • At a configurable speed.
      • How to Record:
        • Oplog Server: save the writes
        • Profiler Server: save the read operations
          • Have to turn on profiling for all queries so that we can record all the queries
        • Duration of recording
      • Snapshot a starting point:
        • Do an EBS Snapshot
        • FYI: EBS Snapshot lazy loads from S3 so there is a warming period.
      • Flashback gives you response times from p50 to p99
    • They couldn't use WiredTiger because each index and collection is represented as a file and they had lots of collections(10s of millions) and indexes.
      • This means lots of open file descriptors.
      • This is apparently not a problem with RocksDB
  • Other Tools

    • mongologtools
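Flashback's code isn't shown here, but "replay at a configurable speed" amounts to scaling the recorded inter-operation gaps; a hypothetical sketch with an injectable sleep so the schedule can be checked without waiting:

```python
import time

def replay(ops, execute, speedup=1.0, sleep=None):
    """Replay (timestamp, op) pairs, scaling the recorded gaps by 1/speedup."""
    sleep = sleep or time.sleep
    prev_ts = None
    for ts, op in ops:
        if prev_ts is not None:
            sleep((ts - prev_ts) / speedup)  # wait out the scaled gap
        execute(op)
        prev_ts = ts

delays, executed = [], []
replay(
    [(0.0, "find"), (1.0, "insert"), (1.5, "find")],
    executed.append,
    speedup=2.0,
    sleep=delays.append,  # record the schedule instead of sleeping
)
print(executed)  # ['find', 'insert', 'find']
print(delays)    # [0.5, 0.25]
```

At speedup=2.0 the recorded 1.0 s and 0.5 s gaps shrink to 0.5 s and 0.25 s, which is how a test environment can be stressed beyond the captured production rate.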

High Performance with MongoDB 3.0

  • Asya Kamsky
  • Latency: Time to execute one of something, how long "it" takes
  • Throughput: The amount of something per unit of time, how many "per unit of time"
  • If one of these goes down, it will affect the other(in both directions)
  • However, increasing throughput will not necessarily improve latency
    • Latency is affected by the network and travel time between the app and the user
    • We can horizontally scale to increase throughput but this does not help with latency.
  • Physical Resources need to be planned
    • ram
    • network
    • cpu
    • disk
  • Tuning the digital resources
    • App
    • Driver
    • Schema Index
    • Storage Engine
    • Filesystem
    • OS
  • This talk concentrates on the schema/indexes and storage engine
    • Schema/Indexes
      • Structure our data in the way the application needs to use them.
      • Patterns
        • Data Locality: Put data that goes together, in one document if you can.
          • Opposite is also good, don't put data together that doesn't need to go together.
      • Anti-Pattern
        • Over-Normalization: Looks like a RDBMS schema, lots of collections that are related to each other
          • Signs of over normalizing
            • You're doing lots of joins in your application
        • Over-Embedding: One document to rule them all.
          • Signs of over embedding
            • Unbounded Growth
            • Deeply Nested Arrays
        • Other Signs of Trouble
          • reads vs writes
            • Figure out which thing you are building for. This is a trade-off
          • Polymorphic collection
            • The collection has too many different types
            • One collection of everything
            • No schema that defines all of them.
          • Polymorphic fields
            • Fields that could be lots of different types.
          • Can't use indexes
            • If your collection can't use indexes
          • Too many indexes
            • Probably points to it being a polymorphic collection
          • No Indexes
            • Means you haven't thought about how your data is going to be used.
    • Storage Engines
      • Pro/con of MMAPv1 and WT(Wired Tiger)
      • Compression
        • WT: On Disk Compression(10x of MMAPv1)
          • Needs to un-compress on read and compress on write
            • Trades disk iops for CPU
              • Less Disk IOPS but need more CPU
          • Indexes also take up less space in memory.
        • MMAP: No Compression
      • Concurrency
        • WT: Document level concurrency(not document level locking)
          • Not actual locks but collision detection
          • This is why it's slower; the writes have to keep track of more things.
          • The more contention there is for a doc, the higher the latency.
          • Fastest When:
            • There are multiple threads
            • Multiple writes to the same collection
            • But not to the same document
            • Better CPU
            • Not too many workers(# Workers will need to match 2x number of cores)
        • MMAP: Only Collection level concurrency
          • Latency of a write is much faster here because it doesn't need to do the collision detection.
      • Write Pattern
        • WT: Copy on Write
          • Threads that are reading the old version, continue and the doc is marked for deletion once they are done.
          • More work for bigger documents.
        • MMAP: In place update if possible.
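The copy-on-write point can be illustrated loosely with Python references: installing a new object doesn't disturb a reader still holding the old one, which is roughly how readers keep their version until they finish (WiredTiger's actual reclamation is internal):

```python
store = {"doc": {"status": "draft"}}

reader_view = store["doc"]  # a reader picks up the current version

# A writer installs a *new* document rather than mutating in place
# (copy-on-write); the old version lingers until its readers drop it.
store["doc"] = {**store["doc"], "status": "published"}

print(reader_view["status"])   # 'draft'     -> reader keeps its snapshot
print(store["doc"]["status"])  # 'published' -> new readers see the update
```

Under MMAP-style in-place update the reader would instead observe the mutation mid-read, and bigger documents make the copy step proportionally more expensive, as the notes say.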

Storage Wars

  • LSM has a read penalty
  • LSM writes are really fast at the cost of reads
  • BTree takes up more space but is always sorted, making reads fast.
  • WiredTiger - has both BTree and LSM Storage
    • Not available in 3.0 but will be available in 3.2
  • LSM DBs: Tombstone Trap
    • In LSM you don't delete you just mark for deletion(tombstone record).
    • Cassandra is an LSM database.
    • You can get a lot of tombstone records in an LSM database and you need a strategy for monitoring and deleting them.
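A minimal sketch of the tombstone trap: deletes append markers rather than removing data, and only compaction reclaims the space:

```python
def lsm_delete(segments, key):
    """In an LSM store a delete appends a tombstone; the key's old data
    still sits in earlier segments until compaction runs."""
    segments.append({key: None})  # None marks a tombstone

def compact(segments):
    """Merge segments oldest-first (newer entries win), then drop
    tombstoned keys for real."""
    merged = {}
    for segment in segments:
        merged.update(segment)
    return [{k: v for k, v in merged.items() if v is not None}]

segments = [{"a": 1, "b": 2}, {"b": 3}]
lsm_delete(segments, "a")
print(segments)          # [{'a': 1, 'b': 2}, {'b': 3}, {'a': None}]
print(compact(segments)) # [{'b': 3}]
```

Before compaction, every tombstone costs space and makes reads check extra segments, which is why the notes call for monitoring them.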

Day 2

Keynote

  • New Features in 3.2
    • MongoDB Scout
      • A tool to let you discover the schema of a database.
      • A web app that gives you info about what's in a collection.
    • Partial Indexes
      • The ability to index only some of the documents based on a filter.
    • Document Validation
      • collMod (collection modify): add a validator for the values of fields.
      • You can add validators that will affect new inserts but not the existing documents.
    • Mongodb Querying
      • New aggregation operators to make reports easier.
      • New operator: $lookup
        • A way to do aggregation across collections (i.e., joins across collections)
    • BI Connector
      • Partnering with Tableau
      • Tableau will integrate with MongoDB and do aggregations and queries on the fly to visualize Mongo data.
    • Encryption at Rest
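A partial index only covers documents matching a partialFilterExpression; this toy matcher handles just equality and $gt on top-level fields (real MongoDB supports a larger operator subset), to show which documents would enter the index:

```python
def matches_partial_filter(doc, filter_expr):
    """Tiny subset of partialFilterExpression: equality and $gt on
    top-level fields only."""
    for field, cond in filter_expr.items():
        if isinstance(cond, dict) and "$gt" in cond:
            if field not in doc or not doc[field] > cond["$gt"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True

# Only in-stock items with positive quantity enter this hypothetical index:
filter_expr = {"in_stock": True, "qty": {"$gt": 0}}
docs = [
    {"_id": 1, "in_stock": True, "qty": 5},
    {"_id": 2, "in_stock": False, "qty": 5},
    {"_id": 3, "in_stock": True, "qty": 0},
]
indexed = [d["_id"] for d in docs if matches_partial_filter(d, filter_expr)]
print(indexed)  # [1]
```

The payoff is a smaller index: documents 2 and 3 never pay index-maintenance cost, but queries must include the filter to use the index.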

Go Pitfalls and Positives

  • Go functions can have multiple return values, which are usually returned inline.
  • Variable Shadowing
    • ':=' implicitly declares a variable within the current scope.
    • '=' assignment operator
  • gofmt
    • The enforced style guide of Go.

Ops Manager Advanced Administration

  • LDAP and User Roles
    • Useful for multiple deployments.
    • Setup the dns for the search.
    • Then you have to map the LDAP groups to mongo roles.
  • Alerts
    • MMS Can monitor backups as well
    • Monitor the backup daemons
    • Webhook Alerts are available
      • POST to the URL we specify
      • Has a header that tells you whether the alert is open or closed
      • Also has a signature header that is generated from a shared secret.
        • Allows you to validate the authenticity of the request.
    • Can get SMS messages with Twilio
  • Ops Manager has a concept of Data Centers(DC)
    • Setup a Backup Agent per DC
    • Introduces the idea of Group Pinning
      • Multiple Ops Manager Instances but using the same app DB
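Validating a signature header derived from a shared secret is normally a constant-time HMAC check. The header contents and alert body below are hypothetical (check the Ops Manager docs for the real header name and algorithm):

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the request body under the shared secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison to block timing attacks on the signature."""
    return hmac.compare_digest(sign(secret, body), header_value)

secret = b"shared-secret"
body = b'{"alert": "HOST_DOWN", "status": "open"}'
signature = sign(secret, body)                 # sender computes, puts in header
print(verify(secret, body, signature))         # True  -> authentic request
print(verify(secret, b"tampered", signature))  # False -> reject
```

The constant-time compare (rather than ==) matters because byte-by-byte string comparison leaks how many leading characters matched.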

Orchestration engine for Mongo

  • A tool for testing
  • Lets you create cluster configurations via a REST API
  • It's a Python server with a REST API that you can use to affect the clusters you bring up

X.ai

  • Mongoose
    • A node app to put an API on top of Mongo
    • All their data modifications go through the Mongoose api
    • Mongoose also provides validation

Tools For Debugging Mongo

  • mongostat
  • mongodump
  • db.currentOp()
  • db.serverStatus()
  • MMS
  • Mongo Log File
  • MTools: scripts to manage, manipulate and analyze mongo log files
    • mloginfo: tells you about the log file type; identifies the logfile (version, datastore, host, etc.)
      • Analyze patterns of queries in files
    • mplotqueries: visualize the log files

Mongo will occasionally yield locks during long-running queries. Seems dangerous w/MMAP as the documents could change underneath.
