Skip to content

Instantly share code, notes, and snippets.

@DGamer007
Last active October 2, 2023 13:34
Show Gist options
  • Save DGamer007/0864c6aeebf27e3821602d9dd5ca7375 to your computer and use it in GitHub Desktop.
Save DGamer007/0864c6aeebf27e3821602d9dd5ca7375 to your computer and use it in GitHub Desktop.
Explore MongoDB Replication and Sharding in-depth. Learn how to ensure data availability and scale your MongoDB database for optimal performance.

MongoDB Replication and Sharding

2 Major Problems with Database Servers

  • What if Server shuts down (Data Availability Problem)
  • What if the Server is Incapable of Handling Request or Data (Scaling Problem)

Data Availability Problem

In MongoDB, Servers are basically mongod Processes. Let's say we have only one Server and it shuts down. Now, If we had two servers running from the start, one as a Primary Server and another as a Backup Server then once the Primary Server fails, we can serve up the content from Backup Server. To keep both of them in-sync, We need to keep track of all CUD(Create-Update-Delete) operations performed on Primary Server and replicate them on Backup Server as well.

The concept of Replication in MongoDB provides a solution for this problem which is based on the Simple Approach explained above.

Replication in MongoDB

mongod is a means for initiating MongoDB Server on a machine.

ReplicaSet is nothing but a Group of mongod instances that maintain the same data set. In ReplicaSet, only the Primary Process/Server/Node has authority for CRUD operations. On the failure of Primary Server an eligible Secondary Server will hold an Election to elect itself as a Primary Server.

There is also an Arbiter Node in Replica Set, It is basically a node that doesn't hold data but it is part of the Primary Node Election Process.

  • We can use either Single Data Center or Multiple Data Centers to host our Replica Set Nodes. Now, if we were to use Single Data Center and it would go down then there won't be any replicaSet Node left as backup. But if we were to use Multiple Data Centers then we can avoid this little chaos.

Implementation Details of Replica Set in MongoDB (Basic)

Initialize MongoDB Servers with replSet option to form a Replica Set

# Primary Server
mongod --port=2717 --replSet="<ReplicaSetName>" --dbpath="<path>/primary"
# Secondary Server 1
mongod --port=2727 --replSet="<ReplicaSetName>" --dbpath="<path>/secondary1"
# Secondary Server 2
mongod --port=2737 --replSet="<ReplicaSetName>" --dbpath="<path>/secondary2"

Now, In our case we want PORT 2717 Server to be Primary. So, connect to that Server using mongosh (MongoDB Shell).

# Connect using mongosh
mongosh --port=2717

# Initiate ReplicaSet with Basic Configurations
rs.initiate({
    _id:"<ReplicaSetName/Whatever>",
    members:[
        {_id:0, host:"localhost:2717"},
        {_id:1, host:"localhost:2727"},
        {_id:2, host:"localhost:2737"}
    ]
});

#OR

# Initiate ReplicaSet without Configurations
rs.initiate();
# Now add members in it
rs.add({host:"localhost:2727"});
rs.add({host:"localhost:2737"});

# No need to add PORT 2717 as it has already been added while initiating Replica Set

If we were to connect to Any ReplicaSet member using GUI tool before its Initiation then It would give us an Error because GUI tools try to fetch Database lists and stuff; and Replica Set doesn't provide those details before its Initiation

The Server from which we'll Initiate the Replica Set will be the Primary Server initially but afterwards if it crashes and some other Server becomes Primary then It'll persist till next Election.

Now, Whenever we want to utilize this Replica Set we'll have to use all its server's addresses in connection URI...

In this case the connection URI would be

# With Protocol
mongodb://localhost:2717,localhost:2727,localhost:2737

# Without Protocol
localhost:2717,localhost:2727,localhost:2737

If we were to connect to a Single Instance from Replica Set then It won't be using Replication anymore; 'coz if that Instance fails then the Connection will break as well, No Backup Server will handle our requests even if it exists in the same Replica Set.

Direct Read operations are not allowed on Secondary/Backup Server Unless it has been explicitly configured to be read from. Write Operations will only be performed by Primary Server.

Each member of Replica Set can live on a different machine.

For some more Info about the Working of Replica Sets, Watch this video

Scaling Problem

We have 2 Options for Scaling...

  • Vertical Scaling

    To Vertically Scale our current Server, We add more resources to the same Server like changing the CPU, Adding more RAM, etc...

  • Horizontal Scaling

    For Horizontal Scaling we add a whole new Server and we'll distribute or Partition our Data across those servers.

    Sharding in MongoDB is a means for achieving Horizontal Scaling.

After version-4 of MongoDB we can create Shards using Replica Sets only. So With Sharding we get the best of both worlds. Whapah!

Sharded Cluster

Sharded Cluster consists of

  • Config Server
  • Query Router (mongos)
  • Shards

Config Server holds the MetaData

Query Router routes our queries among Client, Config Server and Shards

Shards hold the actual Data

Implementation Details of a Sharded Cluster in MongoDB (Basic)

Each Config Server ReplicaSet can have any number of mongod processes (upto 50) with following exceptions: No Arbiters and No Zero-Priority Members

Initialize MongoDB Config Servers with configsvr and replSet options to form a Replica Set of Config Servers

mongod --configsvr --port=1030 --replSet="<ReplicaSetName>" --dbpath="<path>/server1"

mongod --configsvr --port=1040 --replSet="<ReplicaSetName>" --dbpath="<path>/server2"

mongod --configsvr --port=1050 --replSet="<ReplicaSetName>" --dbpath="<path>/server3"

Connect to anyone of them using mongosh and Initiate Replica Set (As Explained in Replication)

# Connect using mongosh
mongosh --host="localhost:1030"

# Initiate ReplicaSet with Configurations
rs.initiate({
    _id:"<ReplicaSetName/Whatever>",
    configsvr:true,
    members:[
        {_id:0, host:"localhost:1030"},
        {_id:1, host:"localhost:1040"},
        {_id:2, host:"localhost:1050"}
    ]
})

# Config Server Replica Set - DONE

Now, we'll configure Shards which are responsible for storing actual data. We'll Initialize MongoDB Shards using shardsvr and replSet options to form a Replica Set of Shards. And then we'll Initiate that Replica Set by connecting it to mongosh.

# Initialize MongoDB Shards
mongod --shardsvr --port=1130 --dbpath="<path>/server1" --replSet="<ReplicaSetName>"

mongod --shardsvr --port=1140 --dbpath="<path>/server2" --replSet="<ReplicaSetName>"

mongod --shardsvr --port=1150 --dbpath="<path>/server3" --replSet="<ReplicaSetName>"

# Connect using mongosh
mongosh --host="localhost:1130"

# Initiate Replica Set
rs.initiate({
    _id:"<ReplicaSetName/Whatever>",
    members: [
        {_id: 0, host: "localhost:1130"},
        {_id: 0, host: "localhost:1140"},
        {_id: 0, host: "localhost:1150"}
    ]
})

# MongoDB Shards - DONE

Initialize a Query Router which is a mongos process.

# Syntax
mongos --port=<port> --configdb="<configServerReplSet>/<firstHost>,<secondHost>,..."

# For our Example
mongos --port=1210 --configdb="config-server-replica-set/localhost:1030,localhost:1040,localhost:1050"

In production we use Multiple Query Routers to avoid Single point of Failure

Now, Connect Shards and Query Router (mongos)

# Connect to 'mongos' using 'mongosh'
mongosh --host="localhost:1210"

# Add Shard Replica Set to Sharded Cluster
# Syntax
sh.addShard("<shardsReplSet>/<firstHost>,<secondHost>,...")

# For our Example
sh.addShard("shards-replica-set/localhost:1130,localhost:1140,localhost:1150")

# This will be repeated for all the Shard ReplicaSets we have configured

Enable Sharding on a Specific Database of Shards Replica Set

# Syntax
sh.enableSharding("<databaseName>")

# For our Example
sh.enableSharding("practice")

Shard a Collection on the Sharding Enabled Database

# Syntax
sh.shardCollection("<database>,<collection>",{
    "<shard_key_field>": 1/"hashed"
})

# 1 indicates Ranged Sharding
# 'hased' indicates Hashed Sharding

# For our Example
sh.shardCollection("practice,students",{
    "enroll": 1
})

Ranged Sharding: It partitions our Data based on Shard Key's Value. The more the values are closer to each other, the chances of them being in a same Chunk or Shard are more.

Hashed Sharding: It partitions our Data based on the Hash it generates from our Shard Key's Value. So the more the content is similar, The chances of them being in a same Chunk or Shard are more.

For some more Info about the Working of Sharding in MongoDB, Watch this video

For Multi-Device Replication and Sharding

Replication and Sharding can also be achieved over Network. Examples mentioned above are for Single Machine but If we want to Implement Replication or Sharding over network then MongoDB provides us with a configuration called bind_ip which will bind our mongod or mongos processes with Machine's Network so that they could be accessible by Other Devices over Network.

So, If I wanted to run a Standalone instance of mongod process and access it from another device using mongosh...

# Start mongod Server over Network
mongod --port=1000 --dbpath="<path>" --bind_ip="<device_ip>"

# Connect using mongosh
mongosh --port=1000 --host="<device_ip>"

# For Example
# Let's say My Device's IPv4 Address is 192.168.29.131

# Start mongod Server over Network
mongod --port=1000 --dbpath="<path>" --bind_ip="192.168.29.131"

# Connect using mongosh
mongosh --port=1000 --host="192.168.29.131"

Same approach can be used for Sharding and Replication mongod and mongos Instances but for them when we are using bind_ip to bind them with Device's Network; We can no longer use localhost or 127.0.0.1 (Loopback Address) in any of our Configurations.

Now that We can Access things over Network, We also have to deal with Over Network Threats to our MongoDB services. For that MongoDB Provides Authentication and Encryption Configurations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment