Skip to content

Instantly share code, notes, and snippets.

@mikewadhera
Created May 1, 2010 19:49
Show Gist options
  • Save mikewadhera/386607 to your computer and use it in GitHub Desktop.
Save mikewadhera/386607 to your computer and use it in GitHub Desktop.
my notes from mongosf
I. Map/Reduce, GeoSpatial Indexing & other cool features
* Client features
Safe Inserts - Batches a subsequent write and read together to ensure order preservation when using connection pooling in a multi-threaded client
Replication ACK - Can configure minimum acknowledgment level of slave writes. Client will raise exception if replication level not met during a write
GetMore operation - receive data from server in chunks
http://api.mongodb.org/js - reference for JS REPL
db.runCommand( (listCommands : 1) ) - lists all possible commands
* Lightweight query language, $ prefixed operators
$slice -- similar to LIMIT
- supports (offset, limit) & negative indices
$gt, $gte
$elemMatch
* Other features
MapReduce - allows for arbitrary, ad-hoc queries. Write 2 functions in JS - mapper function emits key/values for each doc in collection, reducer linearly processes emitted key/values.
GridFS - splits up binary files into chunked documents -- nothing "magical", implemented on top of vanilla client operations
Indexes - create with ensureIndex(field, asc/desc) Dates, timestamps and other increasing values are good candidates for indexes as older values are swapped to disk.
Wow: Fields that have arrays are exploded and each member is indexed separately to allow for efficient searching
GeoSpatial indexes - create index with ensureIndex(<field> : '2d')
Search with $near operator
II. Schema Design
Document-oriented storage
Not relational
Not OODB
JSON data
Row -> Document
Table -> Collection
Indexes -> Index
Join -> Embedding / Linking
* Embedding vs Linking
- "Contains relationship" : embed
- Embed = "pre joined"
- Links: transparent client/server turnarounds
- When in doubt embed - Note 4MB object size limit
Partial updates are supported
- $set order.total = x
- $push order.line_items(y)
- Also O(1) partial updates with references are supported - i.e. update all line_items[0].price to line_items[1].price
* Data Structures
Trees
3 possible designs:
- Embed whole tree in 1 document
- Link to parent (index by parent id)
- Store array of ancestors along with immediate parent (index by ancestors & parent id)
OO inheritance
STI works well
Dynamic schema doesn't incur "sparse schema" penalty when a collection is highly heterogenous
Indexing also works great as null's are not indexed
* Atomicity
2 ways to get document-level atomicity:
- $ operators -- atomic in-place changes, though not turing complete
- CAS "compare and swap" (optimistic concurrency)
usually implemented in your driver's save() function
essentially: while (savedId != savedDoc.id) change(doc); savedId = save(doc); savedDoc = fetch(doc)
* Capped collections
- "special" collections with *very* fast read/write
- at a high-level: preallocated append-only circular buffer
- useful for activity streams
- no delete (use delete flag and filter at app level)
- can only perform update if the object doesn't change size
* Indexing caveats
- ensureIndex() blocks reads/writes
- Background indexing feature - runs on low pri thread in background, reads are still blocked
III. MySQL to MongoDB
Wordnik - project to track languages (gps for english)
- language is not static
- capture information about all words (meaning/relations)
- lots of data in corpus
* MongoDB migration
Started seeing lockups on MyISAM tables when inserting at 4B rows (wondered if he tried InnoDB edit: someone asked this. answer: InnoDB performance at 4B rows is resistant to optimization)
Researched noSQL solutions, chose MongoDB
* Migration:
> 5 B documents
> 1.5 TB
0 downtime
* Findings:
- MongoDB loves system resources
- run on dedicated physical hardware
- 8 core, 32gb ram, FC SAN
- Very bad performance with virtualization, switched to physical fibre channel enabled I/O, YMMV
- many X the disk space of MySQL (easy pill to swallow)
Two distinct use cases for storage:
- key/value storage - easy to migrate
- hierarchical - required more work
* Migration path:
- Using JDBC
- Implemented all CRUD methods in DAO
- Allowed swapping between Mongo and mysql at runtime
- Used auto ID generation in Mongo as backwards compatible with mysql
- Objects serialized to JSON using JacksonMapper
- De-normalized 12+ mysql tables to 2 collections
- Used embedded model to persist relations - no joins over 20x faster
- Mysql had 50 key/value objects per second read throughput
- MongoDB had 250k key/value objects per second read throughput
(hmm.. makes me wonder how much of the performance boost was due to de-normalizing data model and how much was switching data stores)
Question: how do you deal with de-normalization and data changes?
Answer: in practice not a big deal as dictionary data doesn't change that often
* Bulk insert migration:
- MongoDB has fire-and-forget inserts
- Wrote migration script to do bulk inserts into MongoDB
- Shut off online writes for 24 hours
- Sustained 100k inserts/sec during migration
* Caching:
- Do not skimp on your disk or RAM
- MongoDB uses LRU cache with mmap() -- no longer needed to run memcached/out-of-process cache
IV. Event logging and funnel analysis
Justin.tv
Questions:
Who does what (funnels)
How valuable are groups of users (virality)
Are our changes working? (retention, funnel conversion)
(Chris thinks this is a rather sophomoric implementation and says our stuff is much further along)
V. Administration
* Backups
- mongodump - similar to mysqldump, dumps BSON documents to a file
- fsync + lock - faster, puts DB into read-only then takes FS snapshot, usually takes a few seconds
* Replication
- use replication!
- delayed option - good safe guard against accidental operations run on master
- oplog size is important - transaction log is stored in capped collection - default size is 5% of free space
- shell command on a slave shows you how far behind it is from master
- see more in replication talk
* Monitoring
mongostat - similar to iostat <--- awesome
web console
- "top" for collections
- shows total locked time
- shows all running operations
3rd party plugins
- munin (written by 10gen)
- ganglia
- nagios
- cacti
Disk layout
- MongoDB pre-generates large files (extents) to avoid OS block buffering idiosyncrasies
Connections
- Default max: 20,000
Log message levels:
From critical/bad to ok:
- Asserts
- Warnings
- Message Asserts
- User Asserts (your fault)
VI. Replication
Similar to mysql - async M/S
In practice, near realtime
Generally works fine over WAN (cross datacenter capable)
1 master can handle about 10 slaves
start master with --master --oplog <sizeinmb>
start slave with --slave --source <masterhost:masterport>
Oplog
- Similar to transaction log in mysql
- Circular buffer (actually a capped collection)
- Slave receives + replays oplog from master
- Slave will catch up with master on startup then go into op replay
- Op replay is idempotent
Replication Info
- db.printReplicationInfo() shows replication state on connected master
- db.printSlaveReplicationInfo() shows replication state on connected slave
Replica Pairs
- Stable in next major version, 1.6 - now called Replica Sets
- awesome
- Single Master, N Slave with automatic failover & recovery
- Consensus election of new master when master becomes partitioned
- Lots of control over consensus - vote weighting, member types e.g. ability to have dummy arbiter for tie-breaking
- Rack and datacenter aware
- Driver will be notified of master change on next read/write against new topology (implemented as error + retry)
- Write is "durable" when committed to majority of servers in the set
- Caveat: read on master is immediately visible, true 100% read consistency is only on master. In practice this is usually not an issue if using durable writes
VII. Sharding
What? Vertical scaling
Why? Can scale wider than higher
* MongoDB supports automatic sharding
- Range based
- Zero downtime deployment/rollout
- Full Consistency
- Actually a framework
- Completely orthogonal to replication
* Anatomy
Config servers:
- at least 3 instances needed (changes use 2 phase commit)
- service will turn read only if any instances are down
- system is online so as long as 1/3 of the set is up
Shards:
- can be master, master/slave or replica sets (!)
- sharding + replica sets = automatic vertical partitioning with automatic failover (!!)
- plain old mongod processes
Sharding Router (mongos):
- Transparently handles all sharding routing logic
- Stick between client and mongod -- looks just like a mongod to clients
- Can have 1 or as many as you like
- Can run directly on application server to avoid network overhead
* Writes
Inserts: require shard key (object must have that field, null is a valid shard key)
Removes: routed and/or scattered
Updates: routed or scattered
* Queries
By shard key: routed
Sorted by shard key: ?? didn't get this down
* Router Operations
- split: breaking a chunk into 2
- migrate: move a chunk from 1 shard to another
* Router Balancing
- distributes chunks *automatically*
- constantly running in the background of mongo routers
- factors system resources of shard instances: disk ops, cpu, data size
* Multi-DataCenter aware
- Intelligent geo honing
- Auto failover
* Limitations
- Unique indexes not based on shard key
- Doesn't scale past 20 Petabytes (sigh)
* Monitoring
- Use db.chunks.find() to introspect shards
* Demo
Wow. Demoing a live running 25-node cluster with replication + sharding. Been running since 8am. Sustaining 5M ops/sec. Currently doing 8.1M ops/sec.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment