MongoDB

Introduction

MongoDB is a powerful, flexible, and scalable general-purpose database. MongoDB is a document-oriented database, not a relational one. The primary reason for moving away from the relational model is to make scaling out easier, but there are some other advantages as well.

A document-oriented database replaces the concept of a "row" with a more flexible model, the "document". There are also no predefined schemas: a document's key and values are not of fixed types or sizes.

Scaling a database comes down to the choice between scaling up (getting a bigger machine) or scaling out (partitioning data across more machines).

MongoDB was designed to scale out. Its document-oriented data model makes it easier for it to split up data across multiple servers. MongoDB automatically takes care of balancing data and load across a cluster, redistributing documents automatically and routing user requests to the correct machines. This allows developers to focus on programming the application, not scaling it. When a cluster needs more capacity, new machines can be added and MongoDB will figure out how the existing data should be spread to them.

MongoDB adds dynamic padding to documents and preallocates data files to trade extra space usage for consistent performance. It uses as much RAM as it can as its cache and attempts to automatically choose the correct indexes for queries. In short, almost every aspect of MongoDB was designed to maintain high performance.

Although MongoDB is powerful and attempts to keep many features from relational systems, it is not intended to do everything that a relational database does.

Why MongoDB?

MongoDB is powerful but easy to get started with. In this chapter we’ll introduce some of the basic concepts of MongoDB:

  • A document is the basic unit of data for MongoDB and is roughly equivalent to a row in a relational database management system (but much more expressive).
  • Similarly, a collection can be thought of as a table with a dynamic schema.
  • A single instance of MongoDB can host multiple independent databases, each of which can have its own collections.
  • Every document has a special key, "_id", that is unique within a collection.
  • MongoDB comes with a simple but powerful JavaScript shell, which is useful for the administration of MongoDB instances and data manipulation.

Any UTF-8 character is allowed in a key, with a few notable exceptions:

  • Keys must not contain the character \0 (the null character). This character is used to signify the end of a key.
  • The . and $ characters have some special properties and should be used only in certain circumstances, as described in later chapters. In general, they should be considered reserved, and drivers will complain if they are used inappropriately.

MongoDB is type-sensitive and case-sensitive.

A final important thing to note is that documents in MongoDB cannot contain duplicate keys. For example, the following is not a legal document:

{"greeting" : "Hello, world!", "greeting" : "Hello, MongoDB!"}

Key/value pairs in documents are ordered: {"x" : 1, "y" : 2} is not the same as {"y" : 2, "x" : 1}. Field order does not usually matter and you should not design your schema to depend on a certain ordering of fields.

Collections

A collection is a group of documents. Collections have dynamic schemas. For example, both of the following documents could be stored in a single collection:

{"greeting" : "Hello, world!"}
{"foo" : 5}

Note that the previous documents not only have different types for their values but also have entirely different keys. This raises a question: “Why do we need separate collections at all?”

There are several good reasons:

  • Keeping different kinds of documents in the same collection can be a nightmare for developers and admins. Developers need to make sure that each query is only returning documents of a certain type or that the application code performing a query can handle documents of different shapes. If we’re querying for blog posts, it’s a hassle to weed out documents containing author data.
  • It is much faster to get a list of collections than to extract a list of the types in a collection.
  • Grouping documents of the same kind together in the same collection allows for data locality.
  • We begin to impose some structure on our documents when we create indexes.

Naming

A collection is identified by its name. Collection names can be any UTF-8 string, with a few restrictions:

  • The empty string ("") is not a valid collection name.
  • Collection names may not contain the character \0 (the null character) because this delineates the end of a collection name.
  • You should not create any collections that start with system., a prefix reserved for internal collections. For example, the system.users collection contains the database’s users, and the system.namespaces collection contains information about all of the database’s collections.
  • User-created collections should not contain the reserved character $ in the name. The various drivers available for the database do support using $ in collection names because some system-generated collections contain it. You should not use $ in a name unless you are accessing one of these collections.

Subcollections

One convention for organizing collections is to use namespaced subcollections separated by the . character. For example, an application containing a blog might have a collection named blog.posts and a separate collection named blog.authors. This is for organizational purposes only—there is no relationship between the blog collection (it doesn’t even have to exist) and its “children.”

Subcollections are a great way to organize data in MongoDB, and their use is highly recommended.

Databases

In addition to grouping documents by collection, MongoDB groups collections into databases. A single instance of MongoDB can host several databases, each grouping together zero or more collections. A good rule of thumb is to store all data for a single application in the same database.

Like collections, databases are identified by name. Database names can be any UTF-8 string, with the following restrictions:

  • The empty string ("") is not a valid database name.
  • A database name cannot contain any of these characters: /, \, ., ", *, <, >, :, |, ?, $, (a single space), or \0 (the null character). Basically, stick with alphanumeric ASCII.
  • Database names are case-sensitive, even on non-case-sensitive filesystems. To keep things simple, try to just use lowercase characters.
  • Database names are limited to a maximum of 64 bytes.

One thing to remember about database names is that they will actually end up as files on your filesystem.

admin This is the “root” database, in terms of authentication. If a user is added to the admin database, the user automatically inherits permissions for all databases. There are also certain server-wide commands that can be run only from the admin database, such as listing all of the databases or shutting down the server.

local This database will never be replicated and can be used to store any collections that should be local to a single server.

config When MongoDB is being used in a sharded setup, it uses the config database to store information about the shards.

By concatenating a database name with a collection in that database you can get a fully qualified collection name called a namespace.
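
For example, if you are using the blog.posts collection in the cms database, its namespace would be cms.blog.posts.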

Getting and Starting MongoDB

MongoDB is almost always run as a network server that clients can connect to and perform operations on.

$ mongod

When run with no arguments, mongod will use the default data directory, /data/db/ (or \data\db\ on the current volume on Windows). If the data directory does not already exist or is not writable, the server will fail to start. It is important to create the data directory (e.g., mkdir -p /data/db/) and to make sure your user has permission to write to the directory before starting MongoDB.

mongod also sets up a very basic HTTP server that listens on a port 1,000 higher than the main port, in this case 28017. This means that you can get some administrative information about your database by opening a web browser and going to http://localhost:28017.

MongoDB Shell

MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.

$ mongo

The shell automatically attempts to connect to a MongoDB server on startup, so make sure you start mongod before starting the shell.

The shell is a full-featured JavaScript interpreter.

We can also leverage all of the standard JavaScript libraries:

Math.sin(Math.PI / 2);

We can even define and call JavaScript functions:

> function factorial (n) {
... if (n <= 1) return 1;
... return n * factorial(n - 1);
... }
> factorial(5);

Pressing Enter three times in a row will cancel the half-formed command and get you back to the > prompt.

Use

use dbname

Selects (switches the shell to) the database dbname.

Data Types

To be continued


Creating, Updating and Deleting Documents

db.foo.insert({"bar": "baz"})

Batch Insert

If you have a situation where you are inserting multiple documents into a collection, you can make the insert faster by using batch inserts. Batch inserts allow you to pass an array of documents to the database. In the shell, you can try this out using the batchInsert function, which is similar to insert except that it takes an array of documents to insert:

db.foo.batchInsert([{"_id": 0}, {"_id": 1}, {"_id": 2}])
db.foo.find()

Sending dozens, hundreds, or even thousands of documents at a time can make inserts significantly faster. Batch inserts are only useful if you are inserting multiple documents into a single collection: you cannot use batch inserts to insert into multiple collections with a single request.

Current versions of MongoDB do not accept messages longer than 48 MB, so there is a limit to how much can be inserted in a single batch insert. If you attempt to insert more than 48 MB, many drivers will split up the batch insert into multiple 48 MB batch inserts. Check your driver documentation for details. If you are importing a batch and a document halfway through the batch fails to be inserted, the documents up to that document will be inserted and everything after that document will not:

db.foo.batchInsert([{"_id" : 0}, {"_id" : 1}, {"_id" : 1}, {"_id" : 2}])

If you want to ignore errors and make batchInsert attempt to insert the rest of the batch, you can use the continueOnError option to continue after an insert failure. This would insert the first, second, and fourth documents above. The shell does not support this option, but all the drivers do.
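
In newer shells, a similar effect is available through insertMany with the ordered option set to false; this is a sketch of that alternative, not the batchInsert API described above:

db.foo.insertMany([{"_id" : 0}, {"_id" : 1}, {"_id" : 1}, {"_id" : 2}], {"ordered" : false})

With ordered set to false, the server attempts every document in the batch and reports the duplicate-key failure without stopping at it.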

Insert Validation

One of the basic structure checks is size: all documents must be smaller than 16 MB. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance. To see the BSON size (in bytes) of the document doc, run Object.bsonsize(doc) from the shell.
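
For example, to check the size of an existing document from the shell:

Object.bsonsize(db.foo.findOne())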

These minimal checks also mean that it is fairly easy to insert invalid data (if you are trying to). Thus, you should only allow trusted sources, such as your application servers, to connect to the database. All of the drivers for major languages (and most of the minor ones, too) do check for a variety of invalid data (documents that are too large, contain non-UTF-8 strings, or use unrecognized types) before sending anything to the database.

Removing Documents

db.foo.remove()

This will remove all of the documents in the foo collection. This doesn’t actually remove the collection, and any meta information about it will still exist. The remove function optionally takes a query document as a parameter. When it’s given, only documents that match the criteria will be removed. Suppose, for instance, that we want to remove everyone from the mailing.list collection where the value for "optout" is true:

db.mailing.list.remove({"opt-out" : true})

Once data has been removed, it is gone forever. There is no way to undo the remove or recover deleted documents.

Remove Speed

Removing documents is usually a fairly quick operation, but if you want to clear an entire collection, it is faster to drop it. Drop the collection

db.tester.drop()

This is obviously a vast improvement, but it comes at the expense of granularity: we cannot specify any criteria. The whole collection is dropped, and all of its metadata is deleted.

Update Documents

Once a document is stored in the database, it can be changed using the update method. update takes two parameters: a query document, which locates documents to update, and a modifier document, which describes the changes to make to the documents found. Updating a document is atomic.

Document Replacement

The simplest type of update fully replaces a matching document with a new one.

var joe = db.users.findOne({"name" : "joe"});
joe.relationships = {"friends" : joe.friends, "enemies" : joe.enemies};
delete joe.friends;
db.users.update({"name": "joe"}, joe);

A common mistake is matching more than one document with the criteria and then creating a duplicate "_id" value with the second parameter. The database will throw an error for this, and no documents will be updated.

For example, suppose we create several documents with the same value for "name", but we don’t realize it:

> db.people.find()
{"_id" : ObjectId("4b2b9f67a1f631733d917a7b"), "name" : "joe", "age" : 65},
{"_id" : ObjectId("4b2b9f67a1f631733d917a7c"), "name" : "joe", "age" : 20},
{"_id" : ObjectId("4b2b9f67a1f631733d917a7d"), "name" : "joe", "age" : 49},

Now, if it’s Joe #2’s birthday, we want to increment the value of his "age" key, so we might say this:

> joe = db.people.findOne({"name" : "joe", "age" : 20});
{
 "_id" : ObjectId("4b2b9f67a1f631733d917a7c"),
 "name" : "joe",
 "age" : 20
}
> joe.age++;
> db.people.update({"name" : "joe"}, joe);

E11001 duplicate key on update

What happened? When you call update, the database will look for a document matching {"name" : "joe"}. The first one it finds will be the 65-year-old Joe. It will attempt to replace that document with the one in the joe variable, but there’s already a document in this collection with the same "_id". Thus, the update will fail, because "_id" values must be unique. The best way to avoid this situation is to make sure that your update always specifies a unique document, perhaps by matching on a key like "_id". For the example above, this would be the correct update to use:

> db.people.update({"_id" : ObjectId("4b2b9f67a1f631733d917a7c")}, joe)

Using "_id" for the criteria will also be faster than querying on random fields, as "_id" is indexed.

Using Modifiers

Usually only certain portions of a document need to be updated. You can update specific fields in a document using atomic update modifiers. Update modifiers are special keys that can be used to specify complex update operations, such as altering, adding, or removing keys, and even manipulating arrays and embedded documents.

We can use update modifiers to do this increment atomically.

Consider,

{
 "_id" : ObjectId("4b253b067525f35f94b60a31"),
 "url" : "www.example.com",
 "pageviews" : 52
}
db.analytics.update({"url" : "www.example.com"}, {"$inc" : {"pageviews" : 1}});
{
 "_id" : ObjectId("4b253b067525f35f94b60a31"),
 "url" : "www.example.com",
 "pageviews" : 53
}

If the criteria match multiple documents, only the first matching document will be updated (see "Updating Multiple Documents" later for updating all of them).

$set modifier

"$set" sets the value of a field. If the field does not yet exist, it will be created. This can be handy for updating schema or adding user-defined keys. For example, suppose you have a simple user profile stored as a document that looks something like the following:

db.users.findOne()
{
 "_id" : ObjectId("4b253b067525f35f94b60a31"),
 "name" : "joe",
 "age" : 30,
 "sex" : "male",
 "location" : "Wisconsin"
}
> db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")}, {"$set" : {"favorite book" : "War and Peace"}});

"$unset" removes the key entirely:

> db.users.update({"name" : "joe"}, {"$unset" : {"favorite book" : 1}})

Beware that if the second argument to update is a plain document with no $ modifiers (for example {"foo" : "bar"}), it does a full-document replacement, replacing the matched document with {"foo" : "bar"}. Always use $ operators for modifying individual key/value pairs.

Incrementing and decrementing

"$inc" is very useful for updating analytics, karma, votes, or anything else that has a changeable, numeric value.

"$inc" can be used only on values of type integer, long, or double. If it is used on any other type of value, it will fail.

Cannot apply $inc modifier to non-number

Also, the value of the "$inc" key must be a number; using a string, array, or other non-numeric value will give a “Modifier "$inc" allowed for numbers only” error message.
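
To decrement, pass a negative number. A small sketch against the analytics document shown above:

db.analytics.update({"url" : "www.example.com"}, {"$inc" : {"pageviews" : -1}})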

Array Modifiers

Adding Modifiers

"$push" adds elements to the end of an array if the array exists and creates a new array if it does not.

db.blog.posts.update({"title": "A blog post"}, {
    "$push": {"comments": {"name": "joe", "email": "joe@example.com", "content": "nice post."}}
});

This is the “simple” form of push, but you can use it for more complex array operations as well. You can push multiple values in one operation using the "$each" suboperator:

> db.stock.ticker.update({"_id" : "GOOG"}, {"$push" : {"hourly" : {"$each" : [562.776, 562.790, 559.123]}}});

This would push three new elements onto the array. Specify a single-element array to get equivalent behavior to the non-$each form of "$push".

As another example, consider:

db.blog.insert({"names": []});

Push multiple values onto the names array using "$each":

db.blog.update({"_id": ObjectId('65a4edf50ba15114c3f25036')}, {"$push": {"names": {"$each": ["Deepen", "Neha"]}}})

Output

[
  {
    _id: ObjectId('65a4edf50ba15114c3f25036'),
    names: [ 'Deepen', 'Neha' ]
  }
]

If you only want the array to grow to a certain length, you can also use the "$slice" operator in conjunction with "$push" to prevent an array from growing beyond a certain size.

> db.movies.update({"genre" : "horror"},
    {"$push" : {"top10" : {
    "$each" : ["Nightmare on Elm Street", "Saw"],
    "$slice" : -10}}}
)

This example would limit the array to the last 10 elements pushed. Slices must always be negative numbers. If the array was smaller than 10 elements (after the push), all elements would be kept. If the array was larger than 10 elements, only the last 10 elements would be kept. Thus, "$slice" can be used to create a queue in a document.

Finally, you can "$sort" before trimming, so long as you are pushing subobjects onto the array:

> db.movies.update({"genre" : "horror"},
 {"$push" : {"top10" : {
 "$each" : [{"name" : "Nightmare on Elm Street", "rating" : 6.6},
 {"name" : "Saw", "rating" : 4.3}],
 "$slice" : -10,
 "$sort" : {"rating" : -1}}}})

This will sort all of the objects in the array by their "rating" field and then keep the first 10. Note that you must include "$each"; you cannot just "$slice" or "$sort" an array with "$push".

Not Equal To

db.blog.insert({"title": "My First Blog", content: "My First Content", "date": new Date(), "pageviews": 0});
db.blog.insert({"title": "My Second Blog", content: "My Second Content", "date": new Date(), "pageviews": 0});
db.blog.insert({"title": "My Third Blog", content: "My Third Content", "date": new Date(), "pageviews": 0});
db.blog.find({"title": {"$ne": "My Second Blog"}})

Using arrays as sets

You might want to treat an array as a set, only adding values if they are not present. This can be done using a "$ne" in the query document.
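
For example, to add an author to a list of citations only if they are not already present (a sketch using a hypothetical papers collection):

db.papers.update({"authors cited" : {"$ne" : "Richie"}},
    {"$push" : {"authors cited" : "Richie"}})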

Shitty Syntax

db.students.update({"_id": {"$ne": ObjectId('65a63a12f88aa6b5c6746aa7')}}, {"$push": {"names": {"Jigar": "Chirag"}}})

This can also be done using $addToSet:

 db.students.update({"_id": {"$ne": ObjectId('65a63a12f88aa6b5c6746aa7')}}, {"$addToSet": {"names": "Ravi"}})

OUTPUT

[
  {
    _id: ObjectId('65a63a12f88aa6b5c6746aa7'),
    names: [ 'Deepen', 'Neha' ]
  },
  {
    _id: ObjectId('65a63a17f88aa6b5c6746aa8'),
    names: [ 'Jaimin', 'Dharmin', { Jigar: 'Chirag' }, 'Ravi' ],
    Jigar: [ 'Chirag' ],
    cousins: [ [ 'Jigar', 'Chirag' ] ]
  }
]

"$each" with "$addToSet"

db.students.update({"_id": {"$ne": ObjectId('65a63a12f88aa6b5c6746aa7')}}, {"$addToSet": {"names": {"$each": ["Dilisha", "Sharvil"]}}})

OUTPUT

[
  {
    _id: ObjectId('65a63a12f88aa6b5c6746aa7'),
    names: [ 'Deepen', 'Neha' ]
  },
  {
    _id: ObjectId('65a63a17f88aa6b5c6746aa8'),
    names: [
      'Jaimin',
      'Dharmin',
      { Jigar: 'Chirag' },
      'Ravi',
      'Dilisha',
      'Sharvil'
    ],
    Jigar: [ 'Chirag' ],
    cousins: [ [ 'Jigar', 'Chirag' ] ]
  }
]

Removing elements

There are a few ways to remove elements from an array. If you want to treat the array like a queue or a stack, you can use "$pop", which can remove elements from either end.

{"$pop": {"key": 1}} removes an element from the end of the array. {"$pop": {"key": -1}} removes it from the beginning.

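For example, treating the todo array from the lists collection defined just below as a stack, this sketch removes its last element:

db.lists.update({}, {"$pop" : {"todo" : 1}})
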
"$pull" is used to remove elements of an array that match the given criteria.

db.lists.insert({"todo": ["dishes", "laundry", "dry cleaning"]});
db.lists.update({}, {"$pull": {"todo": "laundry"}});

Pulling removes all matching elements, not just a single match. If you have an array that looks like [1,1,2,1] and pull 1, you'll end up with a single-element array, [2].

Array operators can be used only on keys with array values. For example, you cannot push onto an integer or pop off of a string. Use "$set" or "$inc" to modify scalar values.

Positional array modifications

Let's add an empty comments array to each document in the blog collection.

db.blog.updateMany({}, {"$set": {"comments": []}})

Now the blog collection looks like this:

[
  {
    _id: ObjectId('65a638baf88aa6b5c6746aa4'),
    title: 'My First Blog',
    content: 'My First Content',
    date: ISODate('2024-01-16T08:05:14.148Z'),
    pageviews: 0,
    comments: []
  },
  {
    _id: ObjectId('65a638bef88aa6b5c6746aa5'),
    title: 'My Second Blog',
    content: 'My Second Content',
    date: ISODate('2024-01-16T08:05:18.552Z'),
    pageviews: 0,
    comments: []
  },
  {
    _id: ObjectId('65a638d9f88aa6b5c6746aa6'),
    title: 'My Third Blog',
    content: 'My Third Content',
    date: ISODate('2024-01-16T08:05:45.482Z'),
    pageviews: 0,
    comments: []
  }
]

Let's push data into the comments array:

db.blog.update({"title": "My First Blog"}, {"$push": {"comments": {"name": "Deepen", "email": "code.deepen@gmail.com", "text": "Nice Post."}}})
db.blog.update({"title": "My First Blog"}, {"$push": {"comments": {"name": "Neha", "email": "neha@gmail.com", "text": "Good Read."}}})
db.blog.update({"title": "My First Blog"}, {"$push": {"comments": {"name": "Ravi", "email": "ravi@gmail.com", "text": "Appreciated."}}})

Now, suppose you want to change the name on the comment matching a given email:

 db.blog.update({"comments.email": "code.deepen@gmail.com"}, {"$set": {"comments.$.name": "Deepen Dhamecha"}})

"$" is a positional operator that stands for the index of the array element matched by the query. The positional operator updates only the first match: if code.deepen@gmail.com appeared in more than one comment, only the first matching comment would be updated.
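
If you already know the position of the element, you can also address it by index with dot notation; a small sketch that renames the author of the first comment (index 0 is hypothetical here):

db.blog.update({"title": "My First Blog"}, {"$set": {"comments.0.name": "Deepen D."}})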

Modifier Speed

Some modifiers are faster than others. $inc modifies a document in place: it does not have to change the size of a document, only a couple of bytes, so it is very efficient. On the other hand, array modifiers might change the size of a document and can be slow. $set can modify documents in place if the size isn't changing but otherwise is subject to the same performance limitations as array operators.

Important note: When you start inserting documents into MongoDB, it puts each document right next to the previous one on disk. Thus, if a document gets bigger, it will no longer fit in the space it was originally written to and will be moved to another part of the collection. You can see this in action by creating a new collection with just a few documents and then making a document that is sandwiched between two other documents larger. It will be bumped to the end of the collection:

db.coll.insert({"x" :"a"})
db.coll.insert({"x" :"b"})
db.coll.insert({"x" :"c"})
db.coll.find()

{ "_id" : ObjectId("507c3581d87d6a342e1c81d3"), "x" : "a" }
{ "_id" : ObjectId("507c3583d87d6a342e1c81d4"), "x" : "b" }
{ "_id" : ObjectId("507c3585d87d6a342e1c81d5"), "x" : "c" }

db.coll.update({"x" : "b"}, {$set: {"x" : "bbb"}})
db.coll.find()

{ "_id" : ObjectId("507c3581d87d6a342e1c81d3"), "x" : "a" }
{ "_id" : ObjectId("507c3585d87d6a342e1c81d5"), "x" : "c" }
{ "_id" : ObjectId("507c3583d87d6a342e1c81d4"), "x" : "bbb" }

When MongoDB has to move a document, it bumps the collection's padding factor, which is the amount of extra space MongoDB leaves around new documents to give them room to grow. You can see the padding factor by running db.coll.stats(). Each new document will be given half of its size in free space to grow. If subsequent updates cause more moves, the padding factor will continue to grow. If there aren't more moves, the padding factor will slowly go down.

Moving documents is slow. MongoDB has to free the space the document was in and write the document somewhere else. Thus, you should try to keep the padding factor as close to 1 as possible. You cannot manually set the padding factor (unless you're compacting the collection), but you can design a schema that does not depend on documents growing arbitrarily large.

If you have a lot of empty space, you'll start seeing messages that look like this in the logs:

extent a:81daf914 was empty, skipping ahead

That means that, while querying, MongoDB looked through an entire extent without finding any documents: it was just empty space. The message itself is harmless, but it indicates that you have fragmentation and may wish to perform a compact.

If your schema requires lots of moves or lots of churn through inserts and deletes, you can improve disk reuse by using the usePowerOf2Sizes option. You can set this with the collMod command:

db.runCommand({"collMod": collectionName, "usePowerOf2Sizes": true})

All subsequent allocations made by the collection will be in power-of-two-sized blocks. Only use this option on high-churn collections, though, as this makes initial space allocation less efficient. Setting this on an insert- or in-place-update-only collection will make writes slower. Running this command with "usePowerOf2Sizes" : false turns off the special allocation. The option only affects newly allocated records, so there is no harm in running it on an existing collection or toggling the value.

Upserts

An upsert is a special type of update. If no document is found that matches the update criteria, a new document will be created by combining the criteria and update documents. If a matching document is found, it will be updated normally. Upserts can be handy because they can eliminate the need to “seed” your collection: you can often have the same code create and update documents.

db.users.update({"rep" : 25}, {"$inc" : {"rep" : 3}}, true)

Sometimes a field needs to be seeded when a document is created, but not changed on subsequent updates. This is what "$setOnInsert" is for. "$setOnInsert" is a modifier that only sets the value of a field when the document is being inserted. Thus, we could do something like this:

db.users.update({}, {"$setOnInsert" : {"createdAt" : new Date()}}, true)

Note that you generally do not need to keep a "createdAt" field, as ObjectIds contain a timestamp of when the document was created. However, "$setOnInsert" can be useful for creating padding, initializing counters, and for collections that do not use ObjectIds.

save shell helper

save is a shell function that lets you insert a document if it doesn’t exist and update it if it does.

var x = db.foo.findOne()
x.num = 42;
db.foo.save(x);

Updating Multiple Documents

Updates, by default, update only the first document found that matches the criteria. If there are more matching documents, they will remain unchanged. To modify all of the documents matching the criteria, you can pass true as the fourth parameter to update.
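
For example, to set a field on every matching document (a sketch using a hypothetical birthday field on users):

db.users.update({"birthday" : "10/13/1978"},
    {"$set" : {"gift" : "Happy Birthday!"}}, false, true)

In newer shells, db.users.updateMany(criteria, update) does the same thing without the extra flags.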

Returning Updated Documents

You can get some limited information about what was updated by calling getLastError, but it does not actually return the updated document. For that, you’ll need the findAndModify command. It is handy for manipulating queues and performing other operations that need get-and-set style atomicity.
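
A sketch of findAndModify used to atomically claim a job from a queue (the processes collection and its fields are hypothetical):

ps = db.runCommand({"findAndModify" : "processes",
    "query" : {"status" : "READY"},
    "sort" : {"priority" : -1},
    "update" : {"$set" : {"status" : "RUNNING"}}})

By default the command returns the document's pre-update state; passing "new" : true returns the updated version.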

Setting a Write Concern

There are a number of options available to tune exactly what you want the application to wait for. The two basic write concerns are acknowledged or unacknowledged writes. Acknowledged writes are the default: you get a response that tells you whether or not the database successfully processed your write. Unacknowledged writes do not return any response, so you do not know if the write succeeded or not.
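
In newer shells, a write concern can be passed per operation; a minimal sketch (assuming the current insertOne API):

db.users.insertOne({"name" : "joe"}, {"writeConcern" : {"w" : 1}})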

Querying

Remember, query operators appear in two positions:

  1. Outer: the operator is the outermost key and the field criteria are written inside it. For example, "$or".
  2. Inner: the operator is nested inside the field's own document. For example, "$eq".

Update modifiers are always outer-document keys, while query conditionals are inner-document keys (see the sketch below).
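
A small sketch contrasting the two positions, using queries that appear later in these notes:

db.blog.find({"$or": [{"pageviews": 0}, {"title": "My Second Blog"}]})  // "$or" is the outer key
db.blog.find({"title": {"$eq": "My Second Blog"}})                      // "$eq" sits inside the field's document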

Introduction to find

The find method is used to perform queries in MongoDB. Querying returns a subset of documents in a collection, from no documents at all to the entire collection.

An empty query document (i.e., {}) matches everything in the collection. If find isn’t given a query document, it defaults to {}. For example, the following:

db.c.find()

matches every document in the collection c.

Specifying which keys to Return

Sometimes you do not need all of the key/value pairs in a document returned. If this is the case, you can pass a second argument to find (or findOne) specifying the keys you want.

db.users.find({}, {"title": true});

The above will return "_id" as well. If you do not want "_id" returned, do the following:

db.users.find({}, {"title": 1, "_id": 0});

OR

db.users.find({}, {"title": true, "_id": false});

Query Conditionals

"$lt", "$lte", "$gt", and "$gte" are all comparison operators, corresponding to <, <=, >, and >=, respectively.

db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})

This would find all documents where the "age" field was greater than or equal to 18 AND less than or equal to 30.

start = new Date("01/01/2007")
db.users.find({"registered" : {"$lt" : start}})

Exact match on date is less useful, since dates are only stored with millisecond precision.

"$ne", which stands for "not equal".

db.blog.find({"title": {"$ne" : "My First Blog"}});

OR, AND

There are two ways to do an OR query in MongoDB. "$in" can be used to query for a variety of values for a single key. "$or" is more general; it can be used to query for any of the given values across multiple keys. If you have more than one possible value to match for a single key, use an array of criteria with "$in". "$and" also works with same syntax.

Suppose you have this data:

[
  {
    _id: ObjectId('65a638baf88aa6b5c6746aa4'),
    title: 'My First Blog',
    content: 'My First Content',
    date: ISODate('2024-01-16T08:05:14.148Z'),
    pageviews: 1,
    comments: [
      {
        name: 'Deepen Dhamecha',
        email: 'code.deepen@gmail.com',
        text: 'Nice Post.'
      },
      {
        name: 'Deepen',
        email: 'code.deepen@gmail.com',
        text: 'Good Read.'
      },
      {
        name: 'Deepen',
        email: 'code.deepen@gmail.com',
        text: 'Appreciated.'
      },
      { name: 'Neha', email: 'neha@gmail.com', text: 'Good Read.' },
      { name: 'Ravi', email: 'ravi@gmail.com', text: 'Appreciated.' }
    ]
  },
  {
    _id: ObjectId('65a638bef88aa6b5c6746aa5'),
    title: 'My Second Blog',
    content: 'My Second Content',
    date: ISODate('2024-01-16T08:05:18.552Z'),
    pageviews: 2,
    comments: []
  },
  {
    _id: ObjectId('65a638d9f88aa6b5c6746aa6'),
    title: 'My Third Blog',
    content: 'My Third Content',
    date: ISODate('2024-01-16T08:05:45.482Z'),
    pageviews: 0,
    comments: []
  }
]
db.blog.find({"pageviews": {"$in": [0, 2]}})

The opposite of "$in" is "$nin", which returns documents that don't match any of the criteria in the array.

For this type of query, we'll need to use the "$or" conditional. "$or" takes an array of possible criteria.

db.blog.find({"$or": [{"pageviews": 0}, {"title": "My Second Blog"}]})

Let's mix it with "$in":

db.blog.find({"$or": [{"pageviews": {"$in": [0,2]}}, {"title": "My Second Blog"}]})

$eq (==)

Sample data

[
  {
    _id: ObjectId('65a638baf88aa6b5c6746aa4'),
    title: 'My First Blog',
    content: 'My First Content',
    date: ISODate('2024-01-16T08:05:14.148Z'),
    pageviews: 1,
    comments: [
      {
        name: 'Deepen Dhamecha',
        email: 'code.deepen@gmail.com',
        text: 'Nice Post.'
      },
      {
        name: 'Deepen',
        email: 'code.deepen@gmail.com',
        text: 'Good Read.'
      },
      {
        name: 'Deepen',
        email: 'code.deepen@gmail.com',
        text: 'Appreciated.'
      },
      { name: 'Neha', email: 'neha@gmail.com', text: 'Good Read.' },
      { name: 'Ravi', email: 'ravi@gmail.com', text: 'Appreciated.' }
    ]
  },
  {
    _id: ObjectId('65a638d9f88aa6b5c6746aa6'),
    title: 'My Third Blog',
    content: 'My Third Content',
    date: ISODate('2024-01-16T08:05:45.482Z'),
    pageviews: 0,
    comments: []
  },
  {
    _id: ObjectId('65ab731298db66454461b9f2'),
    title: 'My Fourth Blog',
    content: 'My Fourth Content',
    pageviews: 4,
    comments: []
  }
]

Query

 db.blog.find({title : {$eq: "My Second Blog"}})

$not

Query

db.blog.find({title : {$not: {$eq: "My Second Blog"}}})

$mod
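
"$mod" divides the field's value by the first number given and matches documents where the remainder equals the second number. A small sketch against the blog collection above, matching documents whose pageviews are even:

db.blog.find({"pageviews" : {"$mod" : [2, 0]}})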


Conditional Semantics

Note the difference in position: in a query, a conditional such as "$lt" is a key in the inner document, while in an update, a modifier such as "$inc" is a key in the outer document. This generally holds true: conditionals are inner-document keys, and modifiers are always keys in the outer document.

db.blog.find({"pageviews": {"$lt": 3}})
db.blog.find({"pageviews": {"$gt": 1, "$lt": 3}})

The second query matches pageviews strictly greater than 1 and strictly less than 3, so it will not match 1.

Let's combine it with the "$or" operator:

db.blog.find({"$or": [{"pageviews": {"$gt": 1}}, {"title": {"$eq": "My Third Blog"}} ]})
[
  {
    _id: ObjectId('65a638bef88aa6b5c6746aa5'),
    title: 'My Second Blog',
    content: 'My Second Content',
    date: ISODate('2024-01-16T08:05:18.552Z'),
    pageviews: 2,
    comments: []
  },
  {
    _id: ObjectId('65a638d9f88aa6b5c6746aa6'),
    title: 'My Third Blog',
    content: 'My Third Content',
    date: ISODate('2024-01-16T08:05:45.482Z'),
    pageviews: 0,
    comments: []
  },
  {
    _id: ObjectId('65ab731298db66454461b9f2'),
    title: 'My Fourth Blog',
    content: 'My Fourth Content',
    pageviews: 4,
    comments: []
  }
]

Type-Specific Queries

null

null behaves a bit strangely. It does match itself, so if we have a collection with the following documents:

> db.c.find()
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

we can query for documents whose "y" key is null in the expected way:

> db.c.find({"y" : null})
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }

However, null not only matches itself but also matches “does not exist.” Thus, querying for a key with the value null will return all documents lacking that key:

> db.c.find({"z" : null})
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

If we only want to find keys whose value is null, we can check that the key is null and exists using the "$exists" conditional:

> db.c.find({"z" : {"$in" : [null], "$exists" : true}})

Older versions of MongoDB had no "$eq" operator, which made this a little awkward; "$in" with a single element is equivalent (and newer versions do support "$eq", as shown earlier).

Regular Expressions

If we want to find all users with the name Joe or joe, we can use a regular expression to do case-insensitive matching:

db.users.find({"name": /joe/i});

Querying Arrays

db.food.find({"fruit": "banana"})

$all

If you need to match arrays by more than one element, you can use "$all". This allows you to match a list of elements.

db.food.insert({"_id" : 1, "fruit" : ["apple", "banana", "peach"]})
db.food.insert({"_id" : 2, "fruit" : ["apple", "kumquat", "orange"]})
db.food.insert({"_id" : 3, "fruit" : ["cherry", "banana", "apple"]})

Then we can find all documents with both "apple" and "banana" elements by querying with "$all":

db.food.find({fruit : {$all : ["apple", "banana"]}});
{"_id" : 1, "fruit" : ["apple", "banana", "peach"]}
{"_id" : 3, "fruit" : ["cherry", "banana", "apple"]}

Using a one-element array with "$all" is equivalent to not using "$all". For instance, {fruit : {$all : ['apple']}} will match the same documents as {fruit : 'apple'}.

You can also match an entire array exactly. The following will match the first document:

db.food.find({"fruit" : ["apple", "banana", "peach"]})

But this will not:

db.food.find({"fruit" : ["apple", "banana"]})

and neither will this:

db.food.find({"fruit" : ["banana", "apple", "peach"]})

If you want to query for a specific element of an array, you can specify an index using the syntax key.index:

db.food.find({"fruit.2" : "peach"})

Arrays are always 0-indexed, so this would match the third array element against the string "peach".

$size
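
"$size" matches arrays with exactly the given number of elements. For example, against the food collection above:

db.food.find({"fruit" : {"$size" : 3}})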

One common query is to get a range of sizes. "$size" cannot be combined with another $ conditional (such as "$gt"), but such a query can be accomplished by adding a separate "size" key to the document and updating it whenever you push to the array, as shown below.
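
Every time you push an element, you also increment the size; a sketch of that pattern (criteria stands for whatever query selects the document):

db.food.update(criteria, {"$push" : {"fruit" : "strawberry"}, "$inc" : {"size" : 1}})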

Incrementing is extremely fast, so any performance penalty is negligible.

db.food.find({"size" : {"$gt" : 3}})

Unfortunately, this technique doesn’t work as well with the "$addToSet" operator.

$slice

For example, suppose we had a blog post document and we wanted to return the first 10 comments:

db.blog.posts.findOne(criteria, {"comments" : {"$slice" : 10}})

Alternatively, if we wanted the last 10 comments, we could use −10:

db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -10}})

"$slice" can also return pages in the middle of the results by taking an offset and the number of elements to return:

db.blog.posts.findOne(criteria, {"comments" : {"$slice" : [23, 10]}})

Returning a matching array element

"$slice" is helpful when you know the index of the element, but sometimes you want whichever array element matched your criteria.

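One way to do this is with the positional "$" operator in the projection; a sketch assuming comments carry a "name" field:

db.blog.posts.find({"comments.name" : "bob"}, {"comments.$" : 1})

Note that this returns only the first matching element of the array for each document.
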
Array and range query interactions

Scalars (non-array elements) in documents must match each clause of a query’s criteria. For example, if you queried for {"x" : {"$gt" : 10, "$lt" : 20}}, "x" would have to be both greater than 10 and less than 20. However, if a document’s "x" field is an array, the document matches if there is an element of "x" that matches each part of the criteria but each query clause can match a different array element.

The best way to understand this behavior is to see an example. Suppose we have the following documents:

{"x" : 5}
{"x" : 15}
{"x" : 25}
{"x" : [5, 25]}

If we wanted to find all documents where "x" is between 10 and 20, one might naively structure a query as db.test.find({"x" : {"$gt" : 10, "$lt" : 20}}) and expect to get back one document: {"x" : 15}. However, running this, we get two:

> db.test.find({"x" : {"$gt" : 10, "$lt" : 20}})
{"x" : 15}
{"x" : [5, 25]}

Neither 5 nor 25 is between 10 and 20, but the document is returned because 25 matches the first clause (it is greater than 10) and 5 matches the second clause (it is less than 20). This makes range queries against arrays essentially useless: a range will match any multi-element array. There are a couple of ways to get the expected behavior. First, you can use "$elemMatch" to force MongoDB to compare both clauses with a single array element. However, the catch is that "$elemMatch" won’t match non-array elements:

db.test.find({"x" : {"$elemMatch" : {"$gt" : 10, "$lt" : 20}}})
// no results

The document {"x" : 15} no longer matches the query, because the "x" field is not an array. If you have an index over the field that you’re querying on (see Chapter 5), you can use min() and max() to limit the index range traversed by the query to your "$gt" and "$lt" values:

db.test.find({"x" : {"$gt" : 10, "$lt" : 20}}).min({"x" : 10}).max({"x" : 20})
{"x" : 15}

Now this will only traverse the index from 10 to 20, missing the 5 and 25 entries. You can only use min() and max() when you have an index on the field you are querying for, though, and you must pass all fields of the index to min() and max().

Querying on Embedded Documents

There are two ways of querying for an embedded document: querying for the whole document or querying for its individual key/value pairs.

To reach into embedded documents you use dot notation, for example {"name.first" : "Joe"}. Dot notation is also the reason that keys in documents to be inserted cannot contain the . character.

To correctly group criteria without needing to specify every key, use "$elemMatch". This vaguely-named conditional allows you to partially specify criteria to match a single embedded document in an array. The correct query looks like this:

db.blog.find({"comments" : {"$elemMatch" : {"author" : "joe", "score" : {"$gte" : 5}}}})

"$elemMatch" allows us to “group” our criteria. As such, it’s only needed when you have more than one key you want to match on in an embedded document.

$where Queries

For security, use of "$where" clauses should be highly restricted or eliminated. End users should never be allowed to execute arbitrary "$where" clauses.

If the function returns true, the document will be part of the result set; if it returns false, it won’t be.

"$where" queries should not be used unless strictly necessary: they are much slower than regular queries. Each document has to be converted from BSON to a JavaScript object and then run through the "$where" expression. Indexes cannot be used to satisfy a "$where", either. Hence, you should use "$where" only when there is no other way of doing the query.

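For reference, a sketch of the function form: this classic example returns documents in which any two fields have equal values.

db.foo.find({"$where" : function () {
    for (var current in this) {
        for (var other in this) {
            if (current != other && this[current] == this[other]) {
                return true;
            }
        }
    }
    return false;
}});
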
Server-Side Scripting

You must be very careful with security when executing JavaScript on the server. If done incorrectly, server-side JavaScript is susceptible to injection attacks similar to those that occur in a relational database. However, by following certain rules around accepting input, you can use JavaScript safely. Alternatively, you can turn off JavaScript execution altogether by running mongod with the --noscripting option.

Cursors

The database returns results from find using a cursor. You can limit the number of results, skip over some number of results, sort results by any combination of keys in any direction, and perform a number of other powerful operations.

for(i=0; i<100; i++) {
... db.collection.insert({x : i});
... }
var cursor = db.collection.find();

If you store the results in a global variable or no variable at all, the MongoDB shell will automatically iterate through and display the first couple of documents.

To iterate through the results, you can use the next method on the cursor. You can use hasNext to check whether there is another result. A typical loop through results looks like the following:

while (cursor.hasNext()) {
... obj = cursor.next();
... // do stuff
... }

cursor.hasNext() checks that the next result exists, and cursor.next() fetches it. The cursor class also implements JavaScript’s iterator interface, so you can use it in a forEach loop:

var cursor = db.people.find();
cursor.forEach(function(x) {
... print(x.name);
... });
adam
matt
zak

Almost every method on a cursor object returns the cursor itself so that you can chain options in any order. For instance, all of the following are equivalent:

var cursor = db.foo.find().sort({"x" : 1}).limit(1).skip(10);
var cursor = db.foo.find().limit(1).sort({"x" : 1}).skip(10);
var cursor = db.foo.find().skip(10).limit(1).sort({"x" : 1});

At this point, the query has not been executed yet. All of these functions merely build the query. Now, suppose we call the following:

> cursor.hasNext()

At this point, the query will be sent to the server. The shell fetches the first 100 results or first 4 MB of results (whichever is smaller) at once so that the next calls to next or hasNext will not have to make trips to the server. After the client has run through the first set of results, the shell will again contact the database and ask for more results with a getMore request. getMore requests basically contain an identifier for the query and ask the database if there are any more results, returning the next batch if there are. This process continues until the cursor is exhausted and all results have been returned.

Limits, Skips, and Sorts

The most common query options are limiting the number of results returned, skipping a number of results, and sorting. All these options must be added before a query is sent to the database.

To set a limit, chain the limit function onto your call to find.

db.c.find().limit(3)

If your query matches more than three documents, only the first three will be returned. limit sets an upper limit, not a lower limit. skip works similarly to limit:

db.c.find().skip(3)

This will skip the first three matching documents and return the rest of the matches.

sort takes an object: a set of key/value pairs where the keys are key names and the values are the sort directions. Sort direction can be 1 (ascending) or −1 (descending). If multiple keys are given, the results will be sorted in that order.
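
For example, a sketch that sorts by username ascending and then age descending (field names are hypothetical):

db.c.find().sort({"username" : 1, "age" : -1})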

These three methods can be combined. This is often handy for pagination. For example, suppose that you are running an online store and someone searches for mp3. If you want 50 results per page sorted by price from high to low, you can do the following:

db.stock.find({"desc" : "mp3"}).limit(50).sort({"price" : -1})

If that person clicks Next Page to see more results, you can simply add a skip to the query, which will skip over the first 50 matches (which the user already saw on page 1):

db.stock.find({"desc" : "mp3"}).limit(50).skip(50).sort({"price" : -1})

However, large skips are not very performant; there are suggestions for how to avoid them in the next section.

Comparison order

MongoDB has a hierarchy as to how types compare. If you do a sort on a key with a mix of types, there is a predefined order that they will be sorted in. From least to highest value, this ordering is as follows:

  1. Minimum value
  2. null
  3. Numbers (integers, longs, doubles)
  4. Strings
  5. Object/document
  6. Array
  7. Binary data
  8. Object ID
  9. Boolean
  10. Date
  11. Timestamp
  12. Regular expression
  13. Maximum value

Avoiding Large Skips

Using skip for a small number of documents is fine. For a large number of results, skip can be slow, since it has to find and then discard all the skipped results. Most databases keep more metadata in the index to help with skips, but MongoDB does not yet support this, so large skips should be avoided.

Paginating results without skip

The easiest way to do pagination is to return the first page of results using limit and then return each subsequent page as an offset from the beginning:

var page1 = db.foo.find(criteria).limit(100)
var page2 = db.foo.find(criteria).skip(100).limit(100)
var page3 = db.foo.find(criteria).skip(200).limit(100)

However, depending on your query, you can usually find a way to paginate without skip. For example, suppose we want to display documents in descending order based on "date". We fetch the first page with sort and limit, and then use the "date" value of the last document on that page as the criteria for fetching the next page.
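
A minimal sketch of fetching that first page, assuming each document has a "date" field:

var page1 = db.foo.find().sort({"date" : -1}).limit(100)

The rest of the pagination then looks like this: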

var latest = null;
// display first page
while (page1.hasNext()) {
 latest = page1.next();
 display(latest);
}
// get next page
var page2 = db.foo.find({"date" : {"$lt" : latest.date}});
page2.sort({"date" : -1}).limit(100);

Now the query does not need to include a skip.

Finding a random document

A common trick is to count the documents and then skip a random number of them, but it is actually highly inefficient to get a random element this way: you have to do a count (which can be expensive if you are using criteria), and skipping large numbers of elements can be time-consuming.

The trick is to add an extra random key to each document when it is inserted.

db.people.insert({"name" : "jim", "random" : Math.random()})

If we want to find a random plumber in California, we can create an index on "profession", "state", and "random":

db.people.ensureIndex({"profession" : 1, "state" : 1, "random" : 1})
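
To fetch a random matching document, generate a random number and query against the random key; a sketch of this pattern (the profession/state criteria are from the example above):

var random = Math.random()
var result = db.people.findOne({"profession" : "plumber",
    "state" : "CA", "random" : {"$gt" : random}})

If that happens to return nothing (random was larger than every stored value), fall back to the same query with "$lt".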

When you use special options such as a sort, the shell does not send {"foo" : "bar"} to the database as the query on its own; the query gets wrapped in a larger document. For example, the shell converts {"foo" : "bar"} with a sort on "x" into {"$query" : {"foo" : "bar"}, "$orderby" : {"x" : 1}}.

$maxscan : integer

Specify the maximum number of documents that should be scanned for the query.
> db.foo.find(criteria)._addSpecial("$maxscan", 20)

Similar special options include $min, $max, and $showDiskLoc.

Immortal Cursors

There are two sides to a cursor: the client-facing cursor and the database cursor that the client-side one represents. We have been talking about the client-side one up until now, but we are going to take a brief look at what’s happening on the server.

On the server side, a cursor takes up memory and resources. Once a cursor runs out of results or the client sends a message telling it to die, the database can free the resources it was using. Freeing these resources lets the database use them for other things, which is good, so we want to make sure that cursors can be freed quickly (within reason).

There are a couple of conditions that can cause the death (and subsequent cleanup) of a cursor. First, when a cursor finishes iterating through the matching results, it will clean itself up. Another way is that, when a cursor goes out of scope on the client side, the drivers send the database a special message to let it know that it can kill that cursor. Finally, even if the user hasn’t iterated through all the results and the cursor is still in scope, after 10 minutes of inactivity, a database cursor will automatically “die.” This way, if a client crashes or is buggy, MongoDB will not be left with thousands of open cursors.

This “death by timeout” is usually the desired behavior: very few applications expect their users to sit around for minutes at a time waiting for results. However, sometimes you might know that you need a cursor to last for a long time. In that case, many drivers have implemented a function called immortal, or a similar mechanism, which tells the database not to time out the cursor. If you turn off a cursor’s timeout, you must iterate through all of its results or kill it to make sure it gets closed. Otherwise, it will sit around in the database hogging resources until the server is restarted.

Database Commands

There is one very special type of query called a database command. We’ve covered creating, updating, deleting, and finding documents. Database commands do “everything else,” from administrative tasks like shutting down the server and cloning databases to counting documents in a collection and performing aggregations.

You might be more familiar with the shell helper, which wraps the command and provides a simpler interface:

db.test.drop()

The shell might not have the wrappers for new database commands, but you can still run them with runCommand().

db.runCommand({getLastError : 1})

How Commands Work

A database command always returns a document containing the key "ok". If "ok" is 1, the command was successful; and if it is 0, the command failed for some reason. If "ok" is 0 then an additional key will be present, "errmsg". The value of "errmsg" is a string explaining why the command failed. As an example, let’s try running the drop command again, on the collection that was dropped in the previous section:

db.runCommand({"drop" : "test"});
{ "errmsg" : "ns not found", "ok" : false }

Some commands require administrator access and must be run on the admin database. If such a command is run on any other database, it will return an "access denied" error. If you’re working on another database and you need to run an admin command, you can use the adminCommand function, instead of runCommand:

> use temp
switched to db temp
> db.runCommand({shutdown:1})
{ "errmsg" : "access denied; use admin db", "ok" : 0 }
> db.adminCommand({"shutdown" : 1})