@danishkhan
Created May 25, 2011 00:14
Mongo Notes from MongoSF 2011

MongoDB Notes

Building Web Apps with MongoDB

MongoDB Schema Design

  • How to start thinking in terms of rich document modeling
  • mongo makes you feel like you are denormalizing your data; it makes your data feel more object-like
  • being object-like is a huge gain of mongo
  • a collection is a set of documents, roughly equivalent to a table
  • NO joins in mongo, but there is embedding
  • sophisticated query system, not as good as SQL, but pretty decent
  • all updates to a single document are atomic and isolated
  • Considerations
    • no joins
    • documents are atomic
  • the mongo _id is a BSON-specific ObjectId that is generated for you
  • you get an automatic timestamp as well (the ObjectId encodes its creation time)
  • You can examine the query plan by using .explain()
  • cool update operators such as $push, $pull, $pop, etc.
  • The 'dot' operator
  • Modify atomically
    • findAndModify allows you to find and modify atomically
  • have the db conform to the application you are trying to build
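
The points above about embedding, the update operators, the 'dot' operator and findAndModify can be sketched with plain JavaScript objects, which is also the form they take in the mongo shell. This is a minimal illustration and not something shown in the talk: the `posts` collection and every field name are hypothetical, and the `db...` calls appear only in comments because they need a running server.

```javascript
// A blog post with its comments embedded instead of joined from another table.
// All field names here are hypothetical.
const post = {
  title: "MongoSF notes",
  tags: ["mongodb", "schema"],
  comments: [
    { author: "alice", text: "Nice summary" },
  ],
};

// Update document that appends one comment with $push.
// In the shell: db.posts.update({ title: post.title }, pushComment)
const pushComment = {
  $push: { comments: { author: "bob", text: "Thanks" } },
};

// The 'dot' operator reaches into embedded documents and arrays.
// In the shell: db.posts.find(byCommentAuthor)
const byCommentAuthor = { "comments.author": "bob" };

// findAndModify arguments: match, change, and hand back the updated
// document in one atomic step.
// In the shell: db.posts.findAndModify(claimNextPost)
const claimNextPost = {
  query: { state: "pending" },
  update: { $set: { state: "processing" } },
  new: true,
};

console.log(JSON.stringify(pushComment));
```

Keeping the update as a `$push` rather than rewriting the whole document is what lets the server apply it atomically to that one document.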

MongoDB Performance Tuning by Shutterfly

  • Shutterfly doesn't use any cloud-based services; they run on their own private servers
  • traditional to RDBMS environments
  • Data modeling matters; it is where you start tuning
  • General tuning order
  • Statement tuning
    • enable the profiler and leave it on; it has low overhead
    • What to look for?
      • full scans
        • nreturned vs nscanned
      • updates
        • fastmod (fastest)
        • moved (exceeds reserved space in document)
        • key updates (indexes need update)
    • explain()
      • use during development
      • use when you find bad operations in profiler
      • db.foo.find().explain()
        • index usage; nscanned vs nreturned
        • nYields = number of times the operation yielded while waiting for another operation to complete
        • covered indexes: the query can be answered by reading the index alone, with no need to go to the payload (the documents)
        • run twice to measure in-memory speed
  • High performance writes
    • Tuning
      • read before write
      • profiler
        • tune for fastmod
      • architectural changes
        • split by collection
        • shard
  • High performance reads
    • cache to disk ratio
      • try to have enough memory in system for your indexes
      • mongostat faults column
    • data locality
      • organize data for optimized I/O path. Minimize I/O per query
  • Tools
    • mongostat
      • aggregate instance level information
        • faults: cache misses
        • lock%: tune updates
    • mtop
      • good picture of current session level information
    • iostat
      • how much physical I/O you are doing?
  • is it faster to use a single thread for writes?
    • yes
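
The "nreturned vs nscanned" check from the statement tuning notes above can be sketched as a small filter over profiler-style records. This is only an illustration: the field names follow the notes, the sample entries are made up, and the threshold of 10 is an arbitrary assumption rather than a recommendation from the talk.

```javascript
// Flag operations that scan far more entries than they return,
// which usually points at a missing or unused index.
function findFullScanSuspects(profileEntries, maxRatio = 10) {
  return profileEntries.filter((entry) => {
    const returned = Math.max(entry.nreturned, 1); // avoid dividing by zero
    return entry.nscanned / returned > maxRatio;
  });
}

// Made-up sample entries shaped like profiler records.
const sampleEntries = [
  { op: "query", ns: "app.users", nscanned: 50000, nreturned: 3 },
  { op: "query", ns: "app.users", nscanned: 12, nreturned: 12 },
];

const suspects = findFullScanSuspects(sampleEntries);
console.log(suspects.map((e) => e.ns + " scanned " + e.nscanned));
```

A ratio close to 1 means the index did the filtering; a large ratio is the candidate to run through explain().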

MongoDB Shell hacks

  • shell is spidermonkey
  • what is it good for?
    • debugging
    • administration
    • scripting glue
    • NOT for building apps

Migrating from MySQL to MongoDB by Craigslist

Performance indicators of MongoDB

  • mongostat
    • like iostat
    • gives you your virtual size
    • provided by a database command called serverStatus
      • db.serverStatus();
    • profiler
      • db.setProfilingLevel(2)
        • 1 = only operations (insert, read, write) that take longer than a certain number of milliseconds (the default is 100); 2 = all operations
    • principles for indexing
      • same as RDBMS
  • Monitoring service
    • Nagios and Munin as well as MMS (Mongo Monitoring service)
  • Write block percentage
    • Concurrency
      • one writer OR many readers
  • web-console
    • an HTTP page with console info is always available at port 28017
  • background flushing
    • 10gen tells people to RAID their EBS volumes
  • connection leaks are sometimes an issue
  • Network bytes in and out
    • important for read heavy applications
  • Fragmentation
    • padding factor
      • you cannot manually set padding factor right now
      • dynamically calculated; it is the amount of extra space to leave around a document so it can grow when updated
  • Journaling
    • recommend having a second spindle just for the journal because syncing to the journal is a little expensive
  • you can create a secondary index in the background
    • can take a secondary index offline and then sync it back up

MongoDB @ foursquare

  • nginx, Haproxy
  • mongodb and migrating off of postgres
  • what we love about mongodb
  • lessons learned
    • keep working set in memory
      • keep indexes in memory
    • avoid long-running queries
    • monitor everything (per collection stats)
      • application level metrics is always good to monitor
    • use small field names for large collections
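
The advice about small field names follows from the fact that every document stores its own field names. A rough sketch, assuming Node.js and using JSON length only as a stand-in for BSON size (the two encodings differ, but both repeat the names in each document); the field names are hypothetical.

```javascript
// Same check-in record with verbose and with abbreviated field names.
const verbose = { userIdentifier: 42, venueIdentifier: 7, checkinTimestamp: 1306282752 };
const compact = { u: 42, v: 7, t: 1306282752 };

// Size of the serialized document in bytes (JSON used as a proxy for BSON).
function jsonBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

const savedPerDocument = jsonBytes(verbose) - jsonBytes(compact);
console.log("bytes saved per document:", savedPerDocument);
```

The saving is per document, so it only becomes significant for collections with a very large number of small documents.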

MongoDB in Ruby

  • mongo gem and bson gem, because BSON is the native format
  • bson_ext gem makes it a bit faster
  • all ruby types map to bson types
  • object ids are NOT strings
  • MongoMapper recommended over Mongoid. There is also Mongomatic.

MongoDB in the Cloud

  • You need to size each secondary in your replica set as if it were the primary
  • Typical MongoD should be on a large or extra large standard on demand instance on EC2
  • Big MongoD should be on extra large, double extra large, quadruple extra large high-memory on-demand instance on EC2
  • Small instance on EC2 is 32-bit so DO NOT use it
  • ConfigD/Arbiter can run on a micro instance on EC2
  • High-CPU Medium is 32-bit so DO NOT use it on EC2. High-CPU in general is just not necessary. More RAM is more important than having more CPU
  • Operating Systems (Debian, Ubuntu, Fedora, Red Hat, FreeBSD)
    • Turn off atime
    • Raise file descriptor limits
      • cat >> /etc/security/limits.conf << EOF
        * hard nofile 65536
        * soft nofile 65536
        EOF
    • Use ext4, xfs
    • DO NOT use large VM pages
    • Use RAID
      • RAID10 on MongoD
      • RAID1 on ConfigDB
  • MongoD on EC2
    • LVM or MDADM
    • 64-Bit EC2 instance
    • striping = partitions of mirrors
  • MongoS on EC2
    • Runs on Application server
    • doesn't need disk, ebs volume, raid
    • 32 or 64 bit instance
  • Arbiter on EC2
    • Meant to vote on elections
    • Normally need once a week
    • Do not run it on the same node as MongoD
    • 64 bit EC2 instance, micro or small is fine
  • ConfigDB on EC2
    • LVM or MDADM
    • 64 bit EC2 instance micro or small is fine
  • Deployment scenarios
    • 3 - Node replica set
      • 2 large MongoD in US-East one is primary and one is secondary with RAID 10
      • 1 secondary MongoD with priority = 0 (cannot become a primary) in US-West also with RAID 10
  • way to find out which is the master
    • db.isMaster() (mongo shell)
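
The three-node deployment scenario above could be written down as a replica set configuration along these lines. This is a sketch under assumptions: the set name `rs0` and the hostnames are hypothetical, and the `rs.initiate()` call is shown only in a comment because it needs running servers.

```javascript
// Replica set configuration for two US-East members plus one US-West
// member that can never become primary.
const replicaSetConfig = {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-east-1.example.com:27017" },
    { _id: 1, host: "mongo-east-2.example.com:27017" },
    // priority 0: replicates data but cannot be elected primary
    { _id: 2, host: "mongo-west-1.example.com:27017", priority: 0 },
  ],
};

// In the shell: rs.initiate(replicaSetConfig)
const electable = replicaSetConfig.members.filter((m) => m.priority !== 0);
console.log(electable.map((m) => m.host));
```

Only the two US-East members remain electable, so the primary stays close to the application servers while US-West holds an off-region copy.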