by Jon Eisen
Indexes are awesome because:
- take a linear search and make it logarithmic (i.e. search through 8 items in 3 steps instead of 8).
- eliminate in-memory sorts.
- reduce hard drive lookups and page faulting.
db.collection.ensureIndex(indexed_fields, [options]);
Create an index.db.collection.getIndexes();
See indexes.db.collection.dropIndex(indexed_fields);
Remove an index.db.collection.find(find_fields).explain();
See what indexes a query uses.db.collection.find(fields).hint(index_or_name);
Force a query to use a certain index
ensureIndex({_id: 1}, {unique: true})
is always (mostly) true- A query can only use one index (exception: $or queries)
- Order of keys matters for range and sort queries.
db.subunits.find({ id: 'ZMB' }).explain();
{
"cursor" : "BasicCursor", // no index!
"n" : 1,
"nscanned" : 170, // Searched 170 objects to find 1
}
db.subunits.ensureIndex({ id: 1 }, { unique: true });
db.subunits.find({ id: 'ZMB' }).explain();
{
"cursor" : "BtreeCursor id_1", // yay index worked!
"n" : 1,
"nscanned" : 1,
}
db.subunits.find({ type: 'Polygon', rkey: 40 });
db.subunits.ensureIndex({ type: 1, rkey: 1 }, { name: 'my_custom_name' });
// Works with a prefix subset
db.subunits.find({ type:'Polygon' }) // indexed
db.subunits.find({ rkey: 40 }) // not-indexed!
db.subunits.find({ rkey: 40 }).hint('my_custom_name'); // indexed!
- Prefix indexes are optimal secondary indexes.
- Subset (non-prefix) are non-optimalsecondary indexes that must be hinted.
db.subunits.find({ type: 'Polygon', rkey: { $gt: 40 }}).sort({ 'properties.name': -1 }).explain();
{
"cursor" : "BtreeCursor type_1_rkey_1", // used the previous index
"scanAndOrder" : true, // OH NO! in-memory sort!
}
db.subunits.ensureIndex({ type: 1, rkey: 1, 'properties.name': -1 }); // conjectured index...
db.subunits.find({ type: 'Polygon', rkey: { $gt: 40 }}).sort({ 'properties.name': -1 }).explain();
{
"cursor" : "BtreeCursor type_1_rkey_1", // wait what? why didn't it use our index?
"scanAndOrder" : true, // still in-memory sorting
}
Index Equality, then Sorts (in order with correct directions), then Range queries.
db.subunits.ensureIndex({ type: 1, 'properties.name': -1, rkey: 1}); // the Correct index
db.subunits.find({ type: 'Polygon', rkey: { $gt: 40 }}).sort({ 'properties.name': -1 }).explain();
{
"cursor" : "BtreeCursor type_1_rkey_1", // what happened?
"scanAndOrder" : true, // still in-memory sorting
}
- Every time we use
.explain()
, the query optimizer finds the best (fastest) index for that query. - There is a trade-off of index lookup speed versus in-memory sort speed.
- In this case lookup speed won, but not always. Use
.hint(index_keys).explain()
to compare.
- Like regular secondary indexes, but they don't include references to documents without that field.
- Use with
{ unique: true }
to force uniqueness on only non-null values.
db.places.ensureIndex({ rkey: 1 }, { sparse: true });
db.places.find({ 'coordinates.0': { $lt: 500 } }).count() // == 2
db.places.find({ 'coordinates.0': { $lt: 500 } }).sort({rkey: 1}).count() // == 1
Beware of sorting with a sparse index as it can filter your returning dataset.
Mostly useful as a shard key. Somewhat useful when querying for object equality:
db.collection.find({ myobject: { data: "data.data.data.....", other: [ 1, 2, 3, 4, 5, 6, 7 ] } });
db.collection.ensureIndex({ myobject: 'hashed' });
mongo will now hash the object instead of comparing based on the object. Note does not support range or sort queries.
- TTL a collection: Must be used on a date field.
db.collection.ensureIndex({ timestamp: 1 }, { expireAfterSeconds: 500 });
- Drop duplicate documents:
db.collection.ensureIndex({ type: 1, user: 1 }, { unique: true, dropDups: true });
- Don't kill your database:
db.collection.ensureIndex({ sessId: 1 }, { background: true });
- Indexes are second to the working set to be kept in memory
- "Working set" is basically the amount of data AND indexes that will be active/in use by your system.
- It is much much much much better to keep your indexes in memory.
- It is not always necessary to keep all of your index in memory, only the most used.
- Profiling:
db.runCommand({ profile: 1 });
- Log files - mongoDB will log all long queries (set by
slowms
parameter in profiler)- Remember slow queries may not be badly indexed, but instead held up by locks
- dex - The mongoDB indexing tool:
dex -f my/mongodb.log -n "mydb.mycoll" mongodb://myHost:12345/mydb
Fri Aug 17 16:21:43 [conn2472] update qa.products query: { UK.META.PRODUCT_GROUP_LABEL: { $exists: true } } update: { $unset: { UK.META.PRODUCT_GROUP_LABEL: 1.0 } } 133ms
Fri Aug 17 16:21:43 [initandlisten] connection accepted from 192.168.8.101:51638 #2481
Fri Aug 17 16:21:43 [conn2481] end connection 192.168.8.101:51638
Fri Aug 17 16:21:43 [conn2472] update qa.products query: { UK.META.PRODUCT_GROUP: { $exists: true } } update: { $unset: { UK.META.PRODUCT_GROUP: 1.0 } } 116ms
The mongoDB log records long operations, such as > 100ms. There's no index on UK.META.PRODUCT_GROUP_LABEL
, but we can add it.
db.products.ensureIndex({ 'UK.META.PRODUCT_GROUP_LABEL': 1 });
I'm Jon Eisen.