Some concepts and annotations about the Replica Set process in MongoDB.
MongoDB Replica Set
Replica sets are groups of mongods that share copies of the same information between them.
Replica set members can have one of two different roles.
They can either be a primary node, where all reads and all writes are served by that node,
or a secondary node, whose responsibility is to replicate all of the information and then serve as a high availability node in case of failure of the primary.
The secondaries will get the data from the primary through an asynchronous replication mechanism.
Every time an application writes some data to the replica set, that write is handled by the primary node.
And then data gets replicated to the secondary nodes.
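As a sketch of how this looks from the application side, a write can be sent with a write concern that waits for acknowledgment from a majority of members (the collection and field names below are made up for illustration):

    // In the mongo shell, connected to the replica set.
    // The insert is handled by the primary; with w: "majority",
    // the shell waits until a majority of members have the write.
    db.products.insertOne(
      { sku: "abc123", qty: 100 },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )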
Now this replication mechanism is based on a protocol that manages the way the secondaries should read data from the primary.
In MongoDB, this asynchronous replication protocol has had different versions.
PV1 and PV0
The different versions of the replication protocol vary slightly in the way durability and availability are enforced throughout the set.
Currently Protocol Version 1, or PV1, is the default version.
This protocol is based on the Raft protocol.
Read more about the Simple Raft Protocol (http://thesecretlivesofdata.com/raft/) and the Raft Consensus Algorithm (https://raft.github.io/).
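As a quick check from the mongo shell, the protocol version in use is visible in the replica set configuration:

    // Fetch the current replica set configuration
    // and inspect which replication protocol version it uses.
    var cfg = rs.conf()
    cfg.protocolVersion   // 1 means PV1, the current default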
At the heart of this replication mechanism is the operations log, or oplog for short.
The oplog is a statement-based log that keeps track of all write operations acknowledged by the replica set.
Every time a write is successfully applied to the primary node, it will get recorded in the oplog in its idempotent form.
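The oplog can be inspected directly; it is stored in the local database as the capped collection oplog.rs:

    // Switch to the local database, where the oplog lives.
    use local

    // Show the most recent oplog entry: each entry records
    // one write operation in its idempotent form.
    db.oplog.rs.find().sort({ $natural: -1 }).limit(1).pretty()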
Arbiter
Apart from a primary or secondary role, a replica set member can also be configured as an arbiter.
An arbiter is a member that holds no data.
Its only purpose is to serve as a tiebreaker between secondaries in an election.
And obviously if it has no data, it can never become primary.
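Adding an arbiter is a single call from the mongo shell on the primary (the hostname here is hypothetical):

    // Add an arbiter: it votes in elections,
    // holds no data, and can never become primary.
    rs.addArb("arbiter.example.net:27017")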
Replica sets are failure resilient
That means that they have a failover mechanism that requires a majority of nodes in a replica set to be available for a primary to be elected.
Let's assume, for example, that we lose access to our primary.
If we don't have a primary we will not be able to write, and that's not good.
So the remaining nodes of the set need to decide which one should become the new primary.
That happens through an election, whose details are embedded in the protocol version.
An important thing to note is that you should always have an odd number of nodes in your replica set.
If you do have an even number of nodes, make sure that a majority is consistently available.
For a primary to be elected, a majority of the set's members must be available; in a five-node replica set, for example, that means at least three nodes.
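To see which node is currently primary and whether a majority of members is reachable, rs.status() reports the state of every member; a minimal sketch:

    // Print each member's name, role, and health
    // (health is 1 when the member is reachable).
    rs.status().members.forEach(function (m) {
      print(m.name + " -> " + m.stateStr + " (health: " + m.health + ")")
    })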
The list of replica set members and their configuration options defines the replica set topology.
Any topology change will trigger an election.
Adding members to the set, failing members, or changing any of the replica set configuration aspects will be perceived as a topology change.
The topology of a replica set is defined in the replica set configuration.
The replica set configuration is defined in one of the nodes and then shared between all members through the replication mechanism.
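A minimal sketch of defining that configuration when the set is first created (the set name and hostnames are hypothetical), run from the mongo shell on one of the nodes:

    // Initiate a three-member replica set; the resulting
    // configuration is then shared with the other members.
    rs.initiate({
      _id: "myReplSet",
      members: [
        { _id: 0, host: "node1.example.net:27017" },
        { _id: 1, host: "node2.example.net:27017" },
        { _id: 2, host: "node3.example.net:27017" }
      ]
    })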
A maximum of seven members can be voting members.
More than seven voting members may cause election rounds to take too much time, with little to no benefit for availability and consistency purposes.
So between those seven voting nodes, one of them will become the primary and the remaining ones will be electable as primary if the topology changes or a new election gets triggered.
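If the set grows beyond seven members, the additional members must be non-voting; a sketch of adding one (hostname hypothetical):

    // Members beyond the seven voting ones get zero votes,
    // and a non-voting member must also have priority 0,
    // so it can never be elected primary.
    rs.add({ host: "node8.example.net:27017", votes: 0, priority: 0 })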
Now, if for some reason we can't or don't want to have another data-bearing node, but still want to be able to fail over between nodes, we can add a replica set member as an arbiter.
That said, arbiters do cause significant consistency issues in distributed data systems.
It is advised that you use them with care.
The usage of arbiters is a very sensitive and potentially harmful option in many deployments.
Hidden nodes and delayed nodes
The purpose of a hidden node is to serve specific read-only workloads, or to hold copies of your data which are hidden from the application.
Hidden nodes can also be set with a delay in their replication process.
We call these delayed nodes.
The purpose of having delayed nodes is to allow resilience to application level corruption, without relying on cold backup files to recover from such an event.
If we have a node delayed by, let's say, one hour, and a DBA accidentally drops a collection, we have one hour to recover all the data from the delayed node without needing to go back to a backup file and recover to whatever time that backup was created.
This effectively enables us to have hot backups.
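A sketch of reconfiguring an existing member as a hidden node delayed by one hour (the member index here is an assumption; MongoDB 5.0 renamed slaveDelay to secondaryDelaySecs):

    // Make member 2 hidden and delay its replication by one hour.
    // Hidden members need priority 0 so they can never become primary.
    cfg = rs.conf()
    cfg.members[2].priority = 0
    cfg.members[2].hidden = true
    cfg.members[2].slaveDelay = 3600   // seconds; secondaryDelaySecs in 5.0+
    rs.reconfig(cfg)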
RECAP:
Replica sets are groups of mongod processes that share the same data between all the members of the set.
They provide a high availability and failover mechanism to our application, maintaining the service in case of failure.
The failover is supported by a majority of nodes that elect among themselves which node should be the new primary at each point in time.
Replica sets are a dynamic system, meaning that members may have different roles at different times, and can be set to address a specific functional purpose, like serving read-only workloads, or set to be delayed in time to allow hot backups.