@feanil
Last active January 9, 2016 00:33
MongoDB World

We need a way to set up a cluster for the first time, and a way to add/remove nodes from the cluster.

  • Jenkins jobs to cause the primary to stepdown.
  • Job for getting the status of all nodes.
  • Periodic index compaction would be good for performance.
  • This way, if devs add indices as part of the code (ensureIndex), we can still get the perf improvements of a compact index.
  • Need a good pattern for managing indices.
  • Ideally they would live in the app repos but it would be nice to automate their creation.
  • Maybe part of the AMI build would be to create indexes.
  • Maybe get an AMI for Mongo.
  • Connect to the existing cluster when it comes up?
  • Periodic "DBA for a day" to catch up
  • Look at slow queries and indices
    • See if the query pattern has changed
    • See if indices are being used
  • No easy way to know when you can remove an index.
    • Would have to re-run the last N months of queries with explain and see which indices are being used.
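The explain-based index audit above can be sketched in Python. The nested winningPlan shape mirrors real explain() output, but the plans and index names here are made-up samples:

```python
def used_indexes(plan):
    """Recursively collect index names from IXSCAN stages in a winning plan."""
    names = set()
    if plan.get("stage") == "IXSCAN":
        names.add(plan["indexName"])
    # Child stages appear under "inputStage" (single) or "inputStages" (list).
    for child in [plan.get("inputStage")] + plan.get("inputStages", []):
        if child:
            names |= used_indexes(child)
    return names

def unused_indexes(all_indexes, winning_plans):
    """Indexes never referenced by any recorded query plan are removal candidates."""
    used = set()
    for plan in winning_plans:
        used |= used_indexes(plan)
    return set(all_indexes) - used

# Hypothetical winning plans for two recorded queries.
plans = [
    {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "user_id_1"}},
    {"stage": "COLLSCAN"},
]
print(unused_indexes({"user_id_1", "email_1"}, plans))  # {'email_1'}
```

The catch the notes point at remains: this only proves an index unused for the queries you actually replayed, which is why the full N months of traffic is needed.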

We should have write concern of majority or we could lose data when the primary falls over.

  • Need to loadtest it if we need to turn this on
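A toy model (not real replication code) of why w=1 risks losing acknowledged writes when the primary falls over, while "majority" does not:

```python
def outcome(members, w, replicated_to):
    """Return (acknowledged, survives_failover) for a write when the primary
    dies right after acking. `replicated_to` counts members (including the
    primary) that held the write at that moment."""
    needed = members // 2 + 1 if w == "majority" else w
    acked = replicated_to >= needed
    survives = replicated_to >= 2  # a member besides the dead primary has it
    return acked, survives

# 3-member set; the write only reached the primary before it crashed:
print(outcome(3, 1, 1))           # (True, False): acked, then silently lost
print(outcome(3, "majority", 1))  # (False, False): lost, but the client knew
print(outcome(3, "majority", 2))  # (True, True): acked writes survive
```

The (True, False) case is exactly the silent data loss the note warns about: the client was told the write succeeded, but no surviving member has it.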

If the journal file is on the same volume as the data, the sync time for the journal is 100 ms; otherwise it's 30 ms.

Scaling reads over secondaries is not recommended because of the potential for cascade failures.

  • We should figure out limits so that we have enough head room in a cluster for N of them to go down without bringing down the rest of the cluster.
  • We should also figure out what N should be for us
  • Their recommendation is to use shards
    • That seems like a pain in the butt.
    • Lots of danger of screwing up your data.
    • Network issues are more likely to cause problems.
    • They suggest paying them for MMS to manage it.
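The headroom question above can be sketched with simple arithmetic, assuming reads are balanced evenly and each member saturates at 100% utilization:

```python
def survives_failures(members, utilization, failures):
    """True if the remaining members can absorb the failed members' read load.

    Assumes reads are spread evenly and each member saturates at 1.0.
    """
    remaining = members - failures
    if remaining <= 0:
        return False
    # Surviving members each pick up a share of the total load.
    return utilization * members / remaining <= 1.0

def max_tolerable_failures(members, utilization):
    """Largest N such that the cluster still holds after N members drop."""
    n = 0
    while survives_failures(members, utilization, n + 1):
        n += 1
    return n

print(max_tolerable_failures(5, 0.5))  # 2: load 2.5 spread over 3 ~= 0.83 each
print(max_tolerable_failures(5, 0.9))  # 0: no headroom to lose anyone
```

The second case is the cascade-failure scenario: at 90% utilization, losing any one member overloads the rest.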

Personal Notes

  • Need to dive into the docker community and try it out.
    • Seems like it may be useful for edX but need to figure out what the cost and complexity is for all the nice promises it is making.
  • Need to read through at least the mongo and cassandra related posts on Jepsen
  • Need to learn more about YCSB (Yahoo! Cloud Serving Benchmark)

Day 1

Keynote

  • Starting off with the Chief Marketing Officer

  • Introduces the CEO: Dev Ittycheria

    • Joined Recently

    • He wants to "disrupt" the database market...

    • Mongodb 3.0

      • Excited about storage engines

      • WiredTiger is new in 3.0

      • RocksDB - built by facebook as another storage engine for 3.0

      • Claims better performance than both Couch and Cassandra.

        • Independent 3rd-party testing
        • I can imagine getting very high write throughput with sharding, but need to understand shard management/recovery better. From the class yesterday it seems like there are many cases where sharding can result in data loss during failure.
      • Stories of Companies that use MongoDB

        • x.ai: personal assistant who schedules meetings for you.
  • Ben Golub: CEO of Docker

    • He likes to use the word disrupt too...
    • Sales pitch for docker, making a case for why it's useful in a micro service world.
    • Stories of companies that went from big monoliths to hundreds of microservices in months w/docker

Concurrency Control in Mongo 3.0

  • Kaloian Manassiev: Kernel Engineer at MongoDB
  • Metadata: everything except the documents.
  • Data: the documents.
  • Metadata talks to data via the storage engine API.
  • Mongodb Concurrency Control by Version
    • MongoDB 2.0: The top lock (global lock) locks the whole instance for either reads or writes - only one writer at a time.

      • Protects both the metadata and the data.
    • MongoDB 2.2-2.6: Lock per database instead of per instance. Better concurrency.

      • The top lock still exists; intents are also introduced.
      • Top lock + intents allow for database-level calls: create db, drop db, etc.
      • Intents introduced: things related to locks.
        • Intents are an extra attribute of the top lock for determining whether or not the whole instance should be locked.
        • Intention to access one or more children of an item (top lock/instance).
          • Need to acquire a lock further down in the hierarchy if you take an intent.
      • Lock
        • Locks an item for a particular access.
    • MongoDB 3.0: Moved to a hierarchy of locks (multi-granularity locking)

      • Can lock a db, a collection, or just a document.
      • Implemented using Lock Manager Library
        • Added Improved Fairness
          • Previously writes could overload the system, preventing reads. (Sounds like a problem we could have while doing imports.)
      • Here they introduced the Storage Engine API
        • MongoDB is responsible for concurrency for the dbs and collections
        • Storage Engine is responsible for locking/concurrency of the actual documents.
          • MMAPv1 - Single lock per collection; no doc-level locking
          • WiredTiger - Lock per document - uses MVCC
  • Fsync lock is only for MMAPv1 not in WiredTiger
    • What is the correct pattern for backups if we are using WiredTiger?
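WiredTiger's internals weren't shown in the talk, but "collision detection instead of locks" is classic optimistic concurrency, which a version-per-document sketch can illustrate (this is an analogy, not WiredTiger code):

```python
class WriteConflict(Exception):
    pass

class Store:
    """Versioned documents; writes detect collisions instead of locking."""
    def __init__(self):
        self.docs = {}  # _id -> (version, value)

    def read(self, _id):
        return self.docs.get(_id, (0, None))

    def write(self, _id, expected_version, value):
        current, _ = self.read(_id)
        if current != expected_version:
            raise WriteConflict(_id)  # another writer got there first
        self.docs[_id] = (current + 1, value)

store = Store()
v, _ = store.read("a")
store.write("a", v, {"n": 1})        # first writer wins
try:
    store.write("a", v, {"n": 2})    # stale version -> conflict
except WriteConflict:
    v2, doc = store.read("a")        # caller re-reads and retries
    store.write("a", v2, {**doc, "n": 2})
print(store.read("a"))  # (2, {'n': 2})
```

This also shows why contention on a single document raises latency under WiredTiger: every collision costs a re-read and a retry.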

Advanced Administration, Monitoring and Backup

  • Jeffrey Berger: Lead DB Engineer at Sailthru

  • Sailthru -> consumes a bunch of data.

  • 4 Clusters and 9 Stand-Alone RS

  • Largest is a 32-shard, 5.5 TB cluster with ~1.5 Bn profiles

  • Micro Sharded

    • Multiple MongoD instances on a single piece of hardware
      • Meant to improve concurrency
        • More cores to do your operations
      • Not easy to monitor
        • Especially with MMAPv1
  • They Use Zabbix

    • github.com/sailthru/mongodb-zabbix
    • does a lot of what MMS does
    • Run custom scripts that use the same access pattern as their app
      • Do 100 findOnes
  • Do Query Logging

  • DB Map by Sailthru: a soon-to-be-open-sourced Python tool for knowing your network topology.

    • Exportable inventory for Ansible
  • Backups

    • Cloud Backups
      • Hidden Secondary in a different AZ
      • Take Disk Snapshots
    • They moved to MMS Backups
      • The fact that we can do mongodump and restore means we probably don't have that much data.
        • Now that we're reading from the secondary, we should make sure the backups don't impact reads from the secondary.
      • MMS has some secret sauce for doing backups for a RS with Shards
        • This is otherwise quite hard.
  • Data Migration from One Datacenter to Another

    • They found that none of the options were fast enough
      • Mongodump meant going to disk which was very slow.
    • Wrote a thing to do this:
      • Mongopipe : Not yet opensourced
      • Custom multiprocessing python process to insert without hitting disk.
    • Also used mongoconnector to mirror mongodb operations to other nodes.
      • Listens to the oplog -> Doc Manager (backend for the datastore you want to migrate to) -> whatever datastore you want, including MongoDB, Elasticsearch, etc.
  • Mongoexup : tool for doing mongo export to s3

    • To be open sourced soon.
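The oplog -> Doc Manager mirroring pipeline can be sketched generically; the entry format below is a simplified stand-in, not the real oplog schema:

```python
def apply_oplog(entries, target):
    """Replay simplified oplog entries ('i' insert, 'u' update, 'd' delete)
    into a dict-backed target store, mirroring the source's writes."""
    for entry in entries:
        op, _id = entry["op"], entry["_id"]
        if op == "i":
            target[_id] = entry["doc"]
        elif op == "u":
            target[_id].update(entry["doc"])
        elif op == "d":
            target.pop(_id, None)
    return target

mirror = apply_oplog([
    {"op": "i", "_id": 1, "doc": {"name": "a"}},
    {"op": "u", "_id": 1, "doc": {"name": "b"}},
    {"op": "i", "_id": 2, "doc": {"name": "c"}},
    {"op": "d", "_id": 2},
], {})
print(mirror)  # {1: {'name': 'b'}}
```

The "target" is the pluggable part: swap the dict for an Elasticsearch or MongoDB writer and you have the mongoconnector shape described above.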

MongoDB Drivers and High Availability: Deep Dive

  • A. Jesse Jiryu Davis: On the driver team at MongoDB
  • Service Discovery and Monitoring Spec now written down
  • All drivers are now converging to this spec
  • Changes to the driver with pymongo 3.0
    • Everything uses the mongo client.
    • MongoClient will discover nodes in your cluster.
    • MongoClient call is non-blocking
    • It will block the first time you try to use it if it doesn't know about the primary.
  • How the Driver Works
    • Pass it some hosts
    • This is the initial topology.
    • It contacts all hosts it knows about to get their status
    • The topology is updated from the responses.
    • All calls that try to use the connection block until the driver knows about who the primary is.
    • There is a thread per replica-set member that checks its heartbeat
    • In Steady State
      • The threads check their assigned hosts every 10 seconds.
      • Latency is checked to adjust where read requests go to.
    • Crisis
      • Primary Goes Down
      • Driver doesn't find out immediately.
      • ConnectionFailure exception gets thrown from driver to your client.
      • On Next Insert
        • All threads check their hosts to find a primary
      • bit.ly/server-discovery
  • Elections are even faster in MongoDB 3.0
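Given the crisis sequence above (primary dies, ConnectionFailure surfaces on the next operation, the driver then rediscovers the topology), application code usually wraps operations in a bounded retry. A sketch with a stand-in exception class:

```python
import time

class ConnectionFailure(Exception):
    """Stand-in for pymongo.errors.ConnectionFailure."""

def with_retry(operation, attempts=3, delay=0.0):
    """Retry an operation that can fail during a primary election; the
    real driver rediscovers the topology between attempts."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionFailure:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)  # give the election a moment to finish

calls = {"n": 0}
def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionFailure("no primary available")
    return "inserted"

result = with_retry(flaky_insert)
print(result)  # 'inserted' -- succeeded on the third attempt
```

Caveat: blindly retrying non-idempotent operations (e.g. inserts) can double-write if the first attempt actually reached the server, so real code should make the operation idempotent before adopting this pattern.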

Benchmarking, Load Testing, and Preventing Terrible Disasters

  • Michael Kania: Parse Production Engineer

  • Generate Indexes on the fly in MongoDB

  • Delete indexes that aren't being used.

  • Cowboy Upgrade

    • Run Tests Against the new version
    • Spin up hidden secondary and Watch
    • Un-hide secondary(now people are reading from it) and Watch
    • Promote
  • New Approach

    • Do it with production workload in a test environment
    • Ideally shadow traffic
    • Things you may want to know
      • Be able to find regressions.
    • Flashback: Opensource tool they wrote
      • Capture production workload
      • replay it on your test environment.
        • At a configurable speed.
      • How to Record:
        • Oplog Server: save the writes
        • Profiler Server: save the read operations
          • Have to turn on profiling for all queries so that we can record all the queries
        • Duration of recording
      • Snapshot a starting point:
        • Do an EBS Snapshot
        • FYI: EBS Snapshot lazy loads from S3 so there is a warming period.
      • Flashback gives you response times from p50 to p99
    • They couldn't use WiredTiger because each index and collection is represented as a file and they had lots of collections(10s of millions) and indexes.
      • This means lots of open file descriptors.
      • This is apparently not a problem with RocksDB
  • Other Tools

    • mongologtools
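Flashback's code isn't shown here, but "replay at a configurable speed" amounts to scaling the recorded inter-operation gaps; a hypothetical sketch with an injectable sleep so the schedule can be checked without waiting:

```python
import time

def replay(ops, execute, speedup=1.0, sleep=None):
    """Replay (timestamp, op) pairs, scaling the recorded gaps by 1/speedup."""
    sleep = sleep or time.sleep
    prev_ts = None
    for ts, op in ops:
        if prev_ts is not None:
            sleep((ts - prev_ts) / speedup)  # wait out the scaled gap
        execute(op)
        prev_ts = ts

delays, executed = [], []
replay(
    [(0.0, "find"), (1.0, "insert"), (1.5, "find")],
    executed.append,
    speedup=2.0,
    sleep=delays.append,  # record the schedule instead of sleeping
)
print(executed)  # ['find', 'insert', 'find']
print(delays)    # [0.5, 0.25]
```

At speedup=2.0 the recorded 1.0 s and 0.5 s gaps shrink to 0.5 s and 0.25 s, which is how a test environment can be stressed beyond the captured production rate.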

High Performance with MongoDB 3.0

  • Asya Kamsky
  • Latency: Time to execute one of something, how long "it" takes
  • Throughput: The amount of something per unit of time, how many "per unit of time"
  • If one of these goes down, it will affect the other(in both directions)
  • However, increasing throughput will not necessarily improve latency
    • Latency is affected by the network and travel time between the app and the user
    • We can horizontally scale to increase throughput but this does not help with latency.
  • Physical Resources need to be planned
    • ram
    • network
    • cpu
    • disk
  • Tuning the digital resources
    • App
    • Driver
    • Schema Index
    • Storage Engine
    • Filesystem
    • OS
  • This talk concentrates on the schema/indexes and storage engine
    • Schema/Indexes
      • Structure our data in the way the application needs to use them.
      • Patterns
        • Data Locality: Put data that goes together, in one document if you can.
          • Opposite is also good, don't put data together that doesn't need to go together.
      • Anti-Pattern
        • Over-Normalization: Looks like a RDBMS schema, lots of collections that are related to each other
          • Signs of over normalizing
            • You're doing lots of joins in your application
        • Over-Embedding: One document to rule them all.
          • Signs of over embedding
            • Unbounded Growth
            • Deeply Nested Arrays
        • Other Signs of Trouble
          • reads vs writes
            • Figure out which thing you are building for. This is a trade-off
          • Polymorphic collection
            • The collection has too many different types
            • One collection of everything
            • No schema that defines all of them.
          • Polymorphic fields
            • Fields that could be lots of different types.
          • Can't use indexes
            • If your collection can't use indexes
          • Too many indexes
            • Probably points to it being a polymorphic collection
          • No Indexes
            • Means you haven't thought about how your data is going to be used.
    • Storage Engines
      • Pro/con of MMAPv1 and WT(Wired Tiger)
      • Compression
        • WT: On Disk Compression(10x of MMAPv1)
          • Needs to un-compress on read and compress on write
            • Trades disk iops for CPU
              • Less Disk IOPS but need more CPU
          • Indexes also take up less space in memory.
        • MMAP: No Compression
      • Concurrency
        • WT: Document level concurrency(not document level locking)
          • Not actual locks but collision detection
          • This is why it's slower; the writes have to keep track of more things.
          • The more contention there is for a doc, the higher the latency.
          • Fastest When:
            • There are multiple threads
            • Multiple writes to the same collection
            • But not to the same document
            • Better CPU
            • Not too many workers(# Workers will need to match 2x number of cores)
        • MMAP: Only Collection level concurrency
          • Latency of a write is much faster here because it doesn't need to do the collision detection.
      • Write Pattern
        • WT: Copy on Write
          • Threads that are reading the old version, continue and the doc is marked for deletion once they are done.
          • More work for bigger documents.
        • MMAP: In place update if possible.
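The copy-on-write point can be illustrated loosely with Python references: installing a new object doesn't disturb a reader still holding the old one, which is roughly how readers keep their version until they finish (WiredTiger's actual reclamation is internal):

```python
store = {"doc": {"status": "draft"}}

reader_view = store["doc"]  # a reader picks up the current version

# A writer installs a *new* document rather than mutating in place
# (copy-on-write); the old version lingers until its readers drop it.
store["doc"] = {**store["doc"], "status": "published"}

print(reader_view["status"])   # 'draft'     -> reader keeps its snapshot
print(store["doc"]["status"])  # 'published' -> new readers see the update
```

Under MMAP-style in-place update the reader would instead observe the mutation mid-read, and bigger documents make the copy step proportionally more expensive, as the notes say.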

Storage Wars

  • LSM has a read penalty
  • LSM writes are really fast at the cost of reads
  • BTree takes up more space but is always sorted, making reads fast.
  • WiredTiger - has both BTree and LSM Storage
    • Not available in 3.0 but will be available in 3.2
  • LSM DBs: Tombstone Trap
    • In LSM you don't delete you just mark for deletion(tombstone record).
    • Cassandra is an LSM database.
    • You can get a lot of tombstone records in an LSM database and you need a strategy for monitoring and deleting them.
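A minimal sketch of the tombstone trap: deletes append markers rather than removing data, and only compaction reclaims the space:

```python
def lsm_delete(segments, key):
    """In an LSM store a delete appends a tombstone; the key's old data
    still sits in earlier segments until compaction runs."""
    segments.append({key: None})  # None marks a tombstone

def compact(segments):
    """Merge segments oldest-first (newer entries win), then drop
    tombstoned keys for real."""
    merged = {}
    for segment in segments:
        merged.update(segment)
    return [{k: v for k, v in merged.items() if v is not None}]

segments = [{"a": 1, "b": 2}, {"b": 3}]
lsm_delete(segments, "a")
print(segments)          # [{'a': 1, 'b': 2}, {'b': 3}, {'a': None}]
print(compact(segments)) # [{'b': 3}]
```

Before compaction, every tombstone costs space and makes reads check extra segments, which is why the notes call for monitoring them.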

Day 2

Keynote

  • New Features in 3.2
    • MongoDB Scout
      • A tool to let you discover the schema of a database.
      • A web app that gives you info about what's in a collection.
    • Partial Indexes
      • The ability to index only some of the documents based on a filter.
    • Document Validation
      • collMod (collection modify): add a validator for the values of fields.
      • You can add validators that will affect new inserts but not the existing documents.
    • Mongodb Querying
      • New aggregation operators to make reports easier.
      • New operator: $lookup
        • A way to do aggregation across collections (i.e., joins across collections)
    • BI Connector
      • Partnering with Tableau
      • Tableau will integrate with MongoDB and do aggregations and queries on the fly to visualize Mongo data.
    • Encryption at Rest
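A partial index only covers documents matching a partialFilterExpression; this toy matcher handles just equality and $gt on top-level fields (real MongoDB supports a larger operator subset), to show which documents would enter the index:

```python
def matches_partial_filter(doc, filter_expr):
    """Tiny subset of partialFilterExpression: equality and $gt on
    top-level fields only."""
    for field, cond in filter_expr.items():
        if isinstance(cond, dict) and "$gt" in cond:
            if field not in doc or not doc[field] > cond["$gt"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True

# Only in-stock items with positive quantity enter this hypothetical index:
filter_expr = {"in_stock": True, "qty": {"$gt": 0}}
docs = [
    {"_id": 1, "in_stock": True, "qty": 5},
    {"_id": 2, "in_stock": False, "qty": 5},
    {"_id": 3, "in_stock": True, "qty": 0},
]
indexed = [d["_id"] for d in docs if matches_partial_filter(d, filter_expr)]
print(indexed)  # [1]
```

The payoff is a smaller index: documents 2 and 3 never pay index-maintenance cost, but queries must include the filter to use the index.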

Go Pitfalls and Positives

  • Go functions can have multiple return values, which are usually returned inline.
  • Variable Shadowing
    • ':=' implicitly declares a variable within the current scope.
    • '=' assignment operator
  • gofmt
    • The enforced style guide of Go.

Ops Manager Advanced Administration

  • LDAP and User Roles
    • Useful for multiple deployments.
    • Setup the dns for the search.
    • Then you have to map the LDAP groups to mongo roles.
  • Alerts
    • MMS Can monitor backups as well
    • Monitor the backup daemons
    • Webhook Alerts are available
      • POST to the URL we specify
      • Has a header that tells you whether the alert is open or closed
      • Also has a signature header that is generated from a shared secret.
        • Allows you to validate the authenticity of the request.
    • Can get SMS messages with Twilio
  • Ops Manager has a concept of Data Centers(DC)
    • Setup a Backup Agent per DC
    • Introduces the idea of Group Pinning
      • Multiple Ops Manager Instances but using the same app DB
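Validating a signature header derived from a shared secret is normally a constant-time HMAC check. The header contents and alert body below are hypothetical (check the Ops Manager docs for the real header name and algorithm):

```python
import hashlib
import hmac

def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the request body under the shared secret."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison to block timing attacks on the signature."""
    return hmac.compare_digest(sign(secret, body), header_value)

secret = b"shared-secret"
body = b'{"alert": "HOST_DOWN", "status": "open"}'
signature = sign(secret, body)                 # sender computes, puts in header
print(verify(secret, body, signature))         # True  -> authentic request
print(verify(secret, b"tampered", signature))  # False -> reject
```

The constant-time compare (rather than ==) matters because byte-by-byte string comparison leaks how many leading characters matched.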

Orchestration engine for Mongo

  • A tool for testing
  • Lets you create cluster configurations via a REST API
  • It's a Python server with a REST API that you can use to affect the clusters you bring up

X.ai

  • Mongoose
    • A node app to put an API on top of Mongo
    • All their data modifications go through the Mongoose api
    • Mongoose also provides validation

Tools For Debugging Mongo

  • mongostat
  • mongodump
  • db.currentOp()
  • db.serverStatus()
  • MMS
  • Mongo Log File
  • MTools: scripts to manage, manipulate and analyze mongo log files
    • mloginfo: tells you about the log file type; identifies the logfile (version, datastore, host, etc.)
      • Analyze patterns of queries in files
    • mplotqueries: visualize the log files

Mongo will occasionally yield locks during long-running queries. Seems dangerous w/MMAP as the documents could change underneath.
