
#MongoDB 3.2.x

##Install MongoDB

To install MongoDB on Ubuntu from the precompiled binaries:

wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1404-3.2.5.tgz
gzip -d mongodb-linux-x86_64-ubuntu1404-3.2.5.tgz
tar -xvf mongodb-linux-x86_64-ubuntu1404-3.2.5.tar

Then check what is in the bin directory:

cd mongodb-linux-x86_64-ubuntu1404-3.2.5/bin

The main elements in the bin directory are:

mongod: this is the mongodb daemon which runs the database
mongo: this is the mongodb interactive shell

Add the bin directory to the PATH to make typing easier:

PATH=$PATH:~/mongodb-linux-x86_64-ubuntu1404-3.2.5/bin
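
To make this change permanent, you can append it to your shell profile; a minimal sketch, assuming bash and the extraction path above:

echo 'export PATH=$PATH:~/mongodb-linux-x86_64-ubuntu1404-3.2.5/bin' >> ~/.bashrc
source ~/.bashrc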

##Configure MongoDB

###Data Location

If you do not specify the data location, mongo will assume the default directory of /data/db. If you prefer, you can specify a different directory. This can be useful to control the location to optimize performance and security.

mkdir data
mongod --dbpath ./data

###File Type

Mongo should be run in an environment which uses a file system type such as EXT4, which supports large files and quick allocation. You can check the file system type using:

ubuntu@myhostname:~$ mount
/dev/xvda1 on / type ext4 (rw,discard)

###Disable Last Access Time

By default the operating system will update the last access time on a file. In a high data throughput database application this is overhead which will slow down the database. Therefore, to disable this feature, edit the fstab file using:

sudo nano /etc/fstab

Add noatime directly after defaults, so the options read defaults,noatime:

LABEL=cloudimg-rootfs   /        ext4   defaults,noatime        0 0

The system will need to be rebooted for this change to take effect. When you run mount again you should see:

ubuntu@myhostname:~$ mount
/dev/xvda1 on / type ext4 (rw,noatime)

##Install on Ubuntu from Package Manager

MongoDB only supplies packages for Ubuntu LTS versions. They may work on other releases, but otherwise you may have to install from the binaries as above. See https://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/ for more details.

You first need to install the repository key.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927

Add the repo:

echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list

Refresh the packages:

sudo apt-get update

Install the latest stable version:

sudo apt-get install -y mongodb-org

Or a specific version:

sudo apt-get install -y mongodb-org=3.2.1 mongodb-org-server=3.2.1 mongodb-org-shell=3.2.1 mongodb-org-mongos=3.2.1 mongodb-org-tools=3.2.1

If you just require the server (mongod) without the shell and the tools you can get it using:

sudo apt-get install -y mongodb-org-server

###Pin a specific version

Apt-get will update Mongo if a new version becomes available. In a production environment you do not want the database version to change unexpectedly, in case it has knock-on impacts on other parts of the system (i.e. APIs, apps etc). To stop apt-get from updating the packages, put them on hold:

echo "mongodb-org hold" | sudo dpkg --set-selections
echo "mongodb-org-server hold" | sudo dpkg --set-selections
echo "mongodb-org-shell hold" | sudo dpkg --set-selections
echo "mongodb-org-mongos hold" | sudo dpkg --set-selections
echo "mongodb-org-tools hold" | sudo dpkg --set-selections

###Package Install Outcome

The mongo binaries are installed into /usr/bin.

A configuration file /etc/mongod.conf is created and it should be readable but only writeable by the owner.

ls -l /etc/mongod.conf
-rw-r--r-- 1 root root 568 Jan 11 18:57 /etc/mongod.conf

The default folder for data is in /var/lib/mongodb.

ls -l /var/lib/mongodb
drwxr-xr-x 2 mongodb mongodb     4096 Feb 11 07:06 journal
-rw------- 1 mongodb mongodb 67108864 Feb 11 07:06 local.0
-rw------- 1 mongodb mongodb 16777216 Feb 11 07:06 local.ns
-rwxr-xr-x 1 mongodb mongodb        6 Feb 11 07:06 mongod.lock
-rw-r--r-- 1 mongodb mongodb       69 Nov  4 14:41 storage.bson

Note: the owner of the data files is a new user called mongodb which was created during the install.

The mongo logs are in /var/log/mongodb/mongod.log.
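
You can follow the log in real time, which is useful when diagnosing startup problems:

tail -f /var/log/mongodb/mongod.log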

##Configuration File

The mongod.conf (YAML) file looks like:

storage:
    dbPath: "/data/db"

systemLog:
    logRotate: rename
    destination: file
    path: "/data/log/mongod.log"

###Log Files
In the systemLog section you can specify logRotate as rename; this will rename the old file and create a new log file each time the server restarts. This is the default behavior but you can also specify it explicitly. You can also opt to keep it all in the same file by specifying:

systemLog:
    logAppend: true
    # Linux only, add the following
    logRotate: reopen
    destination: file
    path: "/data/log/mongod.log"

Some Linux users prefer to log everything to syslog so they can use log management tools. To do this use:

systemLog:
    # Linux only
    destination: syslog

###Storage Engine

It is also possible to use different storage engines. Memory mapped files (mmapv1) was the long-standing default, although as of MongoDB 3.2 new deployments default to WiredTiger. You can also decide to turn off journaling, but that means that if a crash happens there is always a chance that a write will never make it to the disk. By default journaling is on.

storage:
    dbPath: "/data/db"
    journal:
        enabled: true
    engine: mmapv1

systemLog:     
    destination: file
    path: "/data/log/mongod.log"
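
For comparison, a minimal sketch of the same configuration using the WiredTiger engine (the default for new 3.2 deployments):

storage:
    dbPath: "/data/db"
    journal:
        enabled: true
    engine: wiredTiger

systemLog:
    destination: file
    path: "/data/log/mongod.log"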

###Directory Per Database

You can also tell it to put each database in a separate directory. This makes backups and db maintenance easier. Depending on the operating system, you can mount these folders on different underlying devices, which can give a performance boost as they will be using different I/O channels.

storage:
    dbPath: "/data/db"
    journal:
        enabled: true
    engine: mmapv1

    directoryPerDB: true

systemLog:     
    destination: file
    path: "/data/log/mongod.log"

Normally the db files are named after the database. Note that Mongo also maintains its own databases, such as admin (user and role information) and local (instance-specific data such as the replication oplog).

###IP Address and Port

By default the mongod.conf says that mongo is available only on 127.0.0.1. To make it available on a different IP you have to specify it in the config. For example, you may add the server's external IP address if you want to make it visible to the outside. If you omit the bindIp entry then it will listen on all IPs.

storage:
    dbPath: "/data/db"
    journal:
        enabled: true

net:
    bindIp: address1,address2
    port: 27017

systemLog:     
    destination: file
    path: "/data/log/mongod.log"

###HTTP Monitoring

You can activate HTTP monitoring. This is useful in development but it is not advisable in production as it displays too much sensitive information. You can enable it by adding the following to the net configuration:

net:
    bindIp: address1,address2
    port: 27017
    http:
        enabled: true

Note: The HTTP status page will be visible on port 28017 (i.e. the port + 1000).
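
Assuming the interface is enabled and you are on the server itself, you can check the status page with curl:

curl http://localhost:28017/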

You can then start mongod using the config file:

mongod --config /data/config/mongod.conf

##Mongo Shell

To connect to the database you can type:

mongo

This will connect to the local mongo instance. You can also specify the host (i.e. use the hostname command to get the host name) and the port (27017 is the default) using:

mongo --host myhostname --port 27017

###Server Status

You can get the status of the server and its current configuration using:

db.serverStatus()

Or if you are just looking for a specific element (e.g. durability, dur) then you can just do:

db.serverStatus().dur
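
Other sections of the output can be inspected the same way, for example memory and connection usage:

db.serverStatus().mem
db.serverStatus().connections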

##Shutdown the server

###Ubuntu 14.04 (using upstart)

service mongod stop

###Mongo Shell

You can also shut down from within the mongo shell. You just need to switch to the admin db to issue the command.

use admin
db.shutdownServer()

###Kill the process

First get the pid of the process, then kill it.

ps -ax | grep mongod
1234

kill 1234

This will do a clean shutdown as the mongod process will detect the signal and shut down gracefully. Note: if you do a kill -9 1234 it can leave the database corrupted.
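
As a shorthand, you can look up the pid and send the signal in one step:

kill $(pgrep mongod)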

###Restarting the service

Sometimes you will need to restart the service without clients connecting. For example, you may want to check the state of the database after a kill -9 event to ensure that it is usable. In this case the best thing is to start it manually but on a different port.

sudo mongod --port 40000 --dbpath /var/lib/mongodb

This will start the process in the terminal window and you can watch the recovery process. If it recovers cleanly it will say it is waiting for connections on port 40000. You can use Ctrl-C to do a clean shutdown in the terminal window.

##Backing up

You can back up the data using mongodump. You need to specify the host and port being used and where you want to save the data. If you don't specify the host, port and location, it will assume the local mongo installation and the current folder. If you don't specify a database, it assumes all databases and will create a folder for each database. You can specify a database using the --db option. Likewise, it will dump all collections in a database unless a specific collection is specified using --collection.

mongodump --host [...] --port [...] --db [...] --collection [...] --username [...] --password [...] --out [...]

For each collection in a database, mongodump will create a .bson file with the binary data and a .metadata.json file describing the collection's options and indexes. Depending on the version, it may also create a system.indexes.bson file with the index information.

###Point in Time Backup

When you are backing up a large database it takes some time to write all the documents. During this time it is possible that changes occur to some documents while the backup is in progress. There is an option called --oplog which tells mongodump to log all changes that occur while the backup is in progress. When the data is restored, it will load the data and then replay all the changes which occurred during the backup. This restores to a point in time at the end of the backup, which is particularly useful when the database is in constant production use.

Note: This only works with databases which are part of a replica set.

This will create an oplog.bson file which contains the transaction log.
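
A minimal sketch of a point in time dump (the host and output directory are illustrative):

mongodump --host myserver --port 27017 --oplog --out /backup/dump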

###Admin Database

Mongo has a default database called admin which holds user permissions etc. It is important that this is backed up if you want to restore all your user settings.

###Backup Security

You can designate specific users (e.g. backup) which can perform backups by assigning the user the backup role. Good practice is to create a specific operating system user who has permission to access the backup folders so that they are not generally accessible.

sudo adduser mongobackup
sudo mkdir /backup
sudo chown -R mongobackup: /backup
sudo chmod -R 700 /backup
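
Inside Mongo you can then create a user with the built-in backup role; a minimal sketch (the user name and password are hypothetical):

use admin
db.createUser({
    user: "mongobackup",
    pwd: "mysecurepassword",
    roles: [{role: "backup", db: "admin"}]
})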

##Restoring Data

You can restore using the mongorestore command.

mongorestore /backup/dump

This will restore all the data in the backup folder to the local mongo instance. To restore to a specific server you can use:

mongorestore --host myserver --port 27017 --username mybackupuser --password mysecurepassword /backup/dump

When you restore a dump it will add the documents back into the database. If a document already exists then it will throw an error. This is useful where you want to restore data which has been removed (i.e. archived) from the database. If you want to do a complete restore you need to use the --drop option. This will drop the existing collections and do a complete restore.

mongorestore --host myserver --port 27017 --username mybackupuser --password mysecurepassword --drop /backup/dump
mongorestore --host myserver --port 27017 -u mybackupuser -p mysecurepassword --drop /backup/dump

You can also restore just a collection specifying the exact .bson file using:

mongorestore --host myserver --port 27017 -u mybackupuser -p mysecurepassword --db mydb --collection mycollection /backup/dump/mycollection.bson

You can also use --drop with the collection restore to drop just that collection.

###Point in Time Restore

To restore a point in time backup you just need to specify --oplogReplay when you are doing the restore. This will restore the dump files and then replay any changes captured in the oplog during the backup process.

mongorestore --oplogReplay --drop --host myserver --port 27017 -u mybackupuser -p mysecurepassword /backup/dump

###Bulk Data Import

There are times when you may need to integrate Mongo with external systems. To assist with this, the mongo tools include mongoimport and mongoexport. These tools allow you to import and export data in json (JavaScript object notation), csv (comma separated values) and tsv (tab separated values) formats. The default format is json.

mongoimport --type [json|csv|tsv]

To import a json file:

mongoimport --db mydb --collection mycollection mydata.json

####Upsert

To update an existing document you can use the --upsert option. In this case it will insert the document if it is not there already or update it if it is.

mongoimport --upsert --db mydb --collection mycollection mydata.json

By default, it will match the imported document on the _id field. It is also possible to use different fields (e.g. productCode, accountNo etc). These are expressed as a comma separated list of fields.

mongoimport --upsert --upsertFields matchField1,matchField2 --db mydb --collection mycollection mydata.json

#####CSV Import

We can tell mongo that the first line has the field names by using the --headerline option.

mongoimport --type csv --headerline --db mydb --collection mycollection mydata.csv

If the csv file does not have a headerline with the field names you can specify the fields directly with the --fields option. Note: Do not put any spaces between the fields.

mongoimport --type csv --fields field1,field2,field3 --db mydb --collection mycollection mydata.csv

If there are a lot of field names, or if you prefer, you can specify the fields in a separate file (one field per line) and specify the file using the --fieldFile option. So the field file would look like:

field1
field2
field3

You can then import using:

mongoimport --type csv --fieldFile myFieldFile.txt --db mydb --collection mycollection mydata.csv

###Bulk Data Export

You can export data using the mongoexport command. You specify the database and collection to export using the --db and --collection options; the output defaults to json.

mongoexport --db mydatabase --collection mycollection

The output is written to stdout. You can redirect this in Linux using >.

mongoexport --db mydatabase --collection mycollection > out.json

You can also use the --out option to specify a file.

mongoexport --db mydatabase --collection mycollection --out out.json

In addition, you can specify the fields using --fields or the --fieldFile if you prefer.

mongoexport --db mydatabase --collection mycollection --fields field1 --out out.json

This will export the data in json format. With json, it will always give the specified fields and the _id field. When you export into other formats (csv, tsv), the _id is not automatically exported.

mongoexport --db mydatabase --collection mycollection --fields field1 --type=csv --out out.csv

####Selective Export

Sometimes you may want to export only documents which match a criteria. You can do this using the --query option.

mongoexport --db mydatabase --collection mycollection --query '{_id: {$gt: 2}}'

You can also use the normal query options to limit the export:

mongoexport --db mydatabase --collection mycollection --skip 0 --limit 2 --sort "{_id: 1}"

##Indexes

There are 5 different index types supported in MongoDB (a sketch of creating several of these follows the list):

  1. Regular (B-Tree) - supports indexes on a single field or multiple fields
  2. Geo - supports indexing on a geographical location (useful for finding locations nearby)
  3. Text - allows for search-engine-like full text searching
  4. Hashed - primarily used for sharding as it evenly distributes the hashes rather than clustering them
  5. Time to Live (TTL) - supports expiring documents based on a date field
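
As a sketch, the geo, text and hashed types above might be created like this (the collection and field names are hypothetical):

db.places.ensureIndex({location: "2dsphere"})   // geo index
db.articles.ensureIndex({content: "text"})      // full text index
db.users.ensureIndex({userId: "hashed"})        // hashed index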

You can see the indexes for a collection using:

db.collection.getIndexes()

or, you can get all the indexes for all the collections using:

db.system.indexes.find()

###Creating an Index

You can create an index using:

db.collectionName.ensureIndex({field: direction})  // -1 for descending and 1 for ascending direction

db.myCollection.ensureIndex({myField: 1})

By default, the index is created in the foreground. This requires a lock on the collection which means that documents cannot be added or changed while the index is building. You can opt to create the index as a background procedure by using the background option.

db.myCollection.ensureIndex({myField: 1}, {background: true})

Creating an index in the background can be much slower than in the foreground.

###Removing an Index

You can remove an index using:

db.myCollection.dropIndex('myIndexName')

###Checking that a query uses an index

By adding .explain() to the end of a query, it will tell you the query execution strategy:

db.myCollection.find({country: 'Ireland'}).explain()

{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "mydb.myCollection",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"country" : {
				"$eq" : "Ireland"
			}
		},
		"winningPlan" : {
			"stage" : "FETCH",
			"inputStage" : {
				"stage" : "IXSCAN",
				"keyPattern" : {
					"country" : 1
				},
				"indexName" : "country_1",
				"isMultiKey" : false,
				"direction" : "forward",
				"indexBounds" : {
					"country" : [
						"[\"Ireland\", \"Ireland\"]"
					]
				}
			}
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "127.0.0.1",
		"port" : 27017,
		"version" : "3.0.7",
		"gitVersion" : "6ce7cbe8c5b899552dadd907604559806aa2e9bd"
	},
	"ok" : 1
}

From this you can see that the winningPlan uses an IXSCAN and the indexName is shown as country_1. If you try to run a query against a field which is not indexed such as:

db.myCollection.find({address: 'Ireland'}).explain()

{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "mydb.myCollection",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"address" : {
				"$eq" : "Ireland"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"address" : {
					"$eq" : "Ireland"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"serverInfo" : {
		"host" : "127.0.0.1",
		"port" : 27017,
		"version" : "3.0.7",
		"gitVersion" : "6ce7cbe8c6b899652dadd907604559806aa2e9bd"
	},
	"ok" : 1
}

From this you can see that the winningPlan was a COLLSCAN. This means that it had to scan through the whole collection, which is slow.
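
Creating an index on the missing field would allow the planner to use an IXSCAN instead:

db.myCollection.ensureIndex({address: 1})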

###Execution Statistics

You can pass the parameter executionStats to .explain() and it will return the details of the execution.

db.myCollection.find({address: 'Ireland'}).explain('executionStats')

{
	"queryPlanner" : {
		"plannerVersion" : 1,
		"namespace" : "mydb.myCollection",
		"indexFilterSet" : false,
		"parsedQuery" : {
			"address" : {
				"$eq" : "Ireland"
			}
		},
		"winningPlan" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"address" : {
					"$eq" : "Ireland"
				}
			},
			"direction" : "forward"
		},
		"rejectedPlans" : [ ]
	},
	"executionStats" : {
		"executionSuccess" : true,
		"nReturned" : 0,
		"executionTimeMillis" : 23161,
		"totalKeysExamined" : 0,
		"totalDocsExamined" : 302762,
		"executionStages" : {
			"stage" : "COLLSCAN",
			"filter" : {
				"address" : {
					"$eq" : "Ireland"
				}
			},
			"nReturned" : 0,
			"executionTimeMillisEstimate" : 4883,
			"works" : 307355,
			"advanced" : 0,
			"needTime" : 302763,
			"needFetch" : 4591,
			"saveState" : 5750,
			"restoreState" : 5750,
			"isEOF" : 1,
			"invalidates" : 37,
			"direction" : "forward",
			"docsExamined" : 302762
		}
	},
	"serverInfo" : {
		"host" : "127.0.0.1",
		"port" : 27017,
		"version" : "3.0.7",
		"gitVersion" : "6ce7cbe8c4b899552dadd907604559806aa2e9bd"
	},
	"ok" : 1
}

The ideal result is that totalDocsExamined is the same as nReturned. You will not always achieve that, but a large gap between the two indicates that no suitable index is being used.

##Forcing a Query to use an Index

Normally mongo will choose an index to evaluate a query automatically, but you can force mongo to use a particular index using the .hint(indexName) function.

db.myCollection.find({country: 'Ireland'}).hint('country_1')

##Sparse Index

A sparse index will only create an entry for documents that have the indexed field. This can reduce the size of the index significantly if the field is only present in some of the documents. A smaller index means a faster index.

db.myCollection.ensureIndex({"myField":1},{sparse:true})

Note: If you query documents by the indexed field, it will skip documents which do not have the field, as it will use the sparse index. Use .explain() to see what it is doing.

##Unique Indexes

You can create a unique index using:

db.users.ensureIndex({email:1},{unique:true})

If a user tries to add a new user with an email that is already in the database, this will throw an error. If you need to be able to add users initially without an email and then assign an email later, you need to make the index unique and sparse.

db.users.ensureIndex({email:1},{unique:true, sparse:true})

Note: Unique indexes work fine with arrays too. So if you try to add an item to an array which is part of a unique index, it will throw an error if the item is already used in another document.

###TTL (Time to Live) Indexes

Sometimes it is useful to remove documents after a certain period of time has passed. An example of this would be session documents. You can create a TTL index on a single date-time field; you just need to specify expireAfterSeconds so that it knows when a document has expired. You can only have one TTL index on a collection.

db.sessions.ensureIndex({createdAt: 1},{expireAfterSeconds: 60*60*4})

This means that documents added to the sessions collection will live for 60*60*4 seconds (i.e. 4 hours) after their createdAt time.

###Reindexing

You can recreate all the indexes for a collection by using the reIndex() function.

db.myCollection.reIndex()

This will drop and recreate all the indexes on a collection.

###Compacting

You can run the compact command, which defragments the documents and rebuilds the indexes in the collection:

db.runCommand({compact: 'myCollection'})

Compacting is a blocking operation so it is best to do this during a maintenance window.

##Replication

In production it is best to run multiple servers as part of a replica set. This means that if a server goes down, as a result of failure or planned maintenance, the database remains available.

The most basic replica set is a primary server, a secondary server and an arbiter.

mongod --dbpath /data --replSet r1 --oplogSize 1

Please note, normally you do not specify the oplogSize; by default it is 5% of the free disk space (capped at 50GB). You can specify a specific size if you prefer, but you should ensure that it is big enough.

mongo
rs.initiate()
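
From here you would typically add the remaining members on the primary; a minimal sketch, assuming two more mongod instances are already running (the hostnames are hypothetical):

rs.add("host2:27017")      // add a secondary
rs.addArb("host3:27017")   // add an arbiter
rs.status()                // check the replica set state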