Fault tolerance concepts & analysis
In this article, we will walk through the concepts needed to understand fault tolerance in databases, and finish with an in-depth analysis of fault tolerance in clusters of different sizes.
Introduction to the problem
Databases deployed inside Kubernetes are usually supported by a cluster of pods in the same network with each pod constantly communicating with the others.
Popular databases such as Postgres and MongoDB usually have one of the pods serving the write operations while all the other pods serve the read operations.
In such a setup, the pod serving the write operations is called the Master, while the rest are called Followers.
Such a setup raises some questions about the database's reliability:
What happens if the Master goes down?
Do the write operations cease to be processed?
Are read operations affected?
What if the Master comes back online after a minute?
What if the Master remains down for a long time?
The solution to the problem: Elections
The solution implemented in many databases is this: the followers communicate among themselves and choose one of their own as the new Master.
The followers do this by holding an election. Each follower gets one vote. It can either vote for itself or vote for a fellow follower.
The follower that secures a majority of the votes wins, and is thus elected as the new master.
What’s a majority?
If there are N voting members, a majority is any number of votes greater than N/2, i.e. floor(N/2) + 1.
For N=2, the Majority is 2.
For N=3, the Majority is 2.
For N=4, the Majority is 3.
For N=5, the Majority is 3.
For N=6, the Majority is 4.
For N=7, the Majority is 4.
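The rule above can be sketched in a few lines of Python (an illustrative helper, not part of any database's API):

```python
def majority(n):
    """Strict majority: more than half of the n voting members."""
    return n // 2 + 1

for n in range(2, 8):
    print(f"N={n}: majority = {majority(n)}")
```

Running it reproduces the list above: an even N needs one more vote than half, so going from N=2 to N=3 (or N=4 to N=5) does not raise the bar.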
Failover Scenarios
Consider you have a cluster of four nodes where each node can vote. Let’s refer to these nodes as A, B, C, D where node A is the master initially.
Let's say node A goes down. Now the remaining nodes B, C, and D must elect a new master. But there is a catch.
The voting can end up with two types of scenarios.
A 3–0 score, where B, C & D all vote for the same node. OR
A 2–1 score, where two of them vote for one node and the third votes for another.
In the case of the 3–0 score, the election is conclusive since one node got a clear majority of 3 votes.
In the case of the 2–1 score, the election is inconclusive since no node got the required 3 votes.
The 2–1 ratio case is known as the Split Brain Scenario.
In such a situation, the cluster ends up having no primary nodes i.e. no node can handle write requests.
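As a sanity check, we can enumerate every way B, C, and D could cast their votes, assuming each voter may pick any surviving node, including itself. This is an illustrative sketch, not how a real database counts ballots:

```python
from itertools import product
from collections import Counter

voters = ["B", "C", "D"]  # surviving nodes after master A fails
needed = 3                # majority of the original 4 voting members

conclusive = 0
total = 0
for votes in product(voters, repeat=len(voters)):
    total += 1
    top_votes = Counter(votes).most_common(1)[0][1]
    if top_votes >= needed:  # some node reached the majority
        conclusive += 1

print(f"{conclusive} of {total} vote patterns are conclusive")
# Only the three unanimous (3-0) patterns reach the majority of 3.
```

Every 2–1 pattern falls short, which is exactly the split-brain case described above.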
Tackling the split-brain scenario
The most obvious way to tackle this is to retrigger the election, and to keep holding elections until a conclusive result arrives.
However, this introduces new challenges.
The problem with repeating elections is that any applications accessing the data from the cluster will have to pause all activity and wait until a primary is elected.
Though the chances of repeated inconclusive elections are low, this can still be problematic for critical use cases such as banking or the military.
Active-Active architecture
In active-active architecture, the same database is deployed to another data cluster as well and the two databases are kept in the same data state through continuous replication.
In case one data center's cluster goes down or becomes unresponsive, the other data center's cluster can serve the requests, minimizing downtime.
Active-Active is one way to work around repeated elections taking place.
Fault tolerance
Other issues can affect the availability & fault tolerance of a cluster such as multiple nodes going down at once.
Let’s discuss.
Handling multiple failover scenarios
In the N=4 case above, after one node went down the election could land in either of two scenarios, and only the 3–0 outcome let the cluster recover.
What happens if another node goes down?
Will the cluster be able to recover now?
What are the chances?
Let’s see.
- N=4. Majority= 3. Nodes: A,B,C,D. Initial primary: A.
- Now, node A goes down. Elections take place.
- B, C, D vote. Possible scores: 3–0, 2–1.
- Since we want to explore a second node going down, let's assume the election score was 3–0, i.e. conclusive, with node B becoming the new primary.
- Nodes: B, C, D. New primary: B. Majority required for elections is still 3.
- Now, node B goes down. Elections take place.
- C, D vote. Possible scores: 2–0, 1–1. Both are inconclusive since neither reaches the majority of 3, so the cluster remains down.
So, a cluster with N=4 can only survive one node going down.
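A quick way to see why the second failure is fatal for N=4: the number of surviving voters is already below the majority threshold, so no tally can ever reach it. A small sketch, using the same majority rule as above:

```python
def majority(voters):
    # Strict majority of the configured voting members.
    return voters // 2 + 1

voting_members = 4  # A, B, C, D: downed nodes still count toward N
survivors = 2       # C and D, after first A and then B have failed

# Even if every survivor votes for the same node, the best possible
# tally equals the number of survivors.
best_possible_tally = survivors
print(best_possible_tally >= majority(voting_members))  # False: no recovery
```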
Can we improve it?
Arbiters
Arbiters are instances that are part of the cluster but do not hold any data. They have minimal resource requirements and do not require dedicated hardware.
Arbiters are commonly used in MongoDB clusters.
Usage of arbiters
- They increase resilience against multiple node failures.
- They vote whenever a cluster faces a split-brain scenario. Thereby, they help in making elections conclusive.
Note: The arbiter only participates in elections; its participation does not guarantee that the election will be conclusive.
Arbiters in action against node failures
Let’s see what benefit arbiters can give us.
Let’s reconsider the four-node cluster — A, B, C, D and add an arbiter to it. We will denote the arbiter by X. So we have — A, B, C, D, X i.e. five voters for the cluster now. Also, assume A is the primary initially.
For five voting nodes, the majority is 3. Now a node must secure 3 votes to become a primary.
First node failure
- N=4 data nodes, 5 voting members. Majority: 3. Voting nodes: A, B, C, D, X. Initial primary: A.
- Now, node A goes down. Elections take place.
- B,C,D,X vote. Possible scenarios: 4–0, 3–1, 2–2.
- Since we want to explore a second node going down, let's assume the election score was 3–1, i.e. conclusive, with node B becoming the new primary.
- Nodes: B, C, D, X. New primary: B. Majority required for elections is still 3.
Second node failure
- Now, node B goes down. Elections take place.
- C, D, X vote. Possible scores: 3–0, 2–1. If the score comes out 3–0, the election is conclusive.
- A new primary is chosen (e.g. C) and the cluster remains up.
So, in the above case — Arbiters helped increase the fault tolerance of the cluster.
Let’s discuss some more cases.
Case of N=2
Without arbiters.
N=2. Voting nodes: 2. Majority=2. Nodes= A, B.
Initial primary: A.
After the first node failure, only one node will remain, and it cannot get the required 2 votes to become primary.
So, the cluster will remain down.
With arbiters.
N=2. Voting nodes: 3. Majority=2. Nodes: A,B,X.
Initial primary: A.
After the first node failure, two nodes will remain, and one of them can get the required 2 votes to become primary.
So, the cluster can survive.
Short note
Until now, we have concluded that an arbiter lets the cluster tolerate one additional node failure for N=2 & N=4.
The same is true for N=6 as well. Try it yourself!
Case of N=5
Without arbiters.
N=5. Voting nodes=5. Majority=3. Nodes: A,B,C,D,E.
Node A is primary initially.
First node down.
Nodes: B,C,D,E. The cluster can survive. New primary: B.
Second node down.
Nodes: C,D,E. The cluster can survive. New primary: C.
Third node down.
Nodes: D, E. The cluster cannot survive as it cannot get enough votes for a new primary node.
Without arbiters, a cluster with N=5 can survive up to 2 node failures.
With arbiters.
N=5. Voting nodes=6. Majority=4. Nodes: A,B,C,D,E,X.
First node down.
Nodes: B,C,D,E,X. The cluster can survive. New primary: B.
Second node down.
Nodes: C,D,E,X. The cluster can survive. New primary: C.
Third node down.
Nodes: D, E, X. The cluster cannot survive as it cannot get enough votes for a new primary node.
With arbiters, a cluster with N=5 can survive up to 2 node failures.
Conclusion for N=5
Even with arbiters, a cluster with N=5 can survive only up to 2 node failures.
Do the same for N=3 and N=6. Try it!
The table below summarizes the findings.
N: no. of nodes.
WA: no. of node failures the cluster can tolerate without an arbiter.
A: no. of node failures the cluster can tolerate with an arbiter.
C: conclusion.
N=2, WA=0, A=1, C: better fault tolerance with the arbiter.
N=3, WA=1, A=1, C: no improvement.
N=4, WA=1, A=2, C: better fault tolerance with the arbiter.
N=5, WA=2, A=2, C: no improvement.
N=6, WA=2, A=3, C: better fault tolerance with the arbiter.
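Under the rules used throughout this article (the majority is computed over all configured voting members, failed nodes still count toward it, and the number of voters is capped at 7), the whole table can be reproduced with a short sketch:

```python
def majority(voters):
    # Strict majority of the configured voting members.
    return voters // 2 + 1

def failures_tolerated(n, arbiter=False):
    """Max data-node failures before no node can reach a majority.

    Assumes MongoDB-style rules: downed nodes still count toward the
    majority, and the total number of voters is capped at 7.
    """
    voters = min(n + 1, 7) if arbiter else n
    return voters - majority(voters)

for n in range(2, 8):
    wa = failures_tolerated(n)
    a = failures_tolerated(n, arbiter=True)
    verdict = "better with arbiter" if a > wa else "no improvement"
    print(f"N={n}: WA={wa}, A={a} ({verdict})")
```

Note how the 7-voter cap kicks in at N=7, which is exactly the special case discussed next.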
The special case of N=7
For N=7 also, there is no improvement with the arbiter. The special thing about N=7 is that even with an arbiter, the cluster will only consider 7 voting members and not 8.
This is because MongoDB restricts the maximum no. of voting members to 7 to carry out elections faster.
In the case of N=7, one of the data nodes will have no voting rights. The arbiter will still have the voting rights.
Arbiters in split-brain scenarios
Consider N=5 with the arbiter. Majority=4.
Nodes: A,B,C,D,E,X. Initial primary: A.
When node A goes down. B, C, D, E will vote. X may or may not vote depending on the scenario.
Scenario 1: All four nodes vote for the same node. So the elections were conclusive and the arbiter need not vote.
Scenario 2: Three nodes vote for the same node, remaining node votes for some other node. So the elections were inconclusive and the arbiter will vote.
Note: The arbiter does not have to vote for the node that already has three votes. It could vote for the node with a single vote instead, making the tally 3–2, i.e. still inconclusive.
Scenario 3: Two nodes vote for one node and the remaining two vote for another. The election is inconclusive and the arbiter will vote, but whatever happens, the best possible tally is 3–2, i.e. still short of the majority of 4.
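The three scenarios can be checked exhaustively. The sketch below enumerates every vote pattern for B, C, D, and E (each voting for any surviving data node) and classifies it by whether the arbiter's extra vote can tip the result. Again, this is an illustration, not how a real database implements elections:

```python
from itertools import product
from collections import Counter

data_voters = ["B", "C", "D", "E"]  # surviving data nodes after A fails
needed = 4  # majority of 6 voting members (5 data nodes + arbiter X)

outcomes = Counter()
for votes in product(data_voters, repeat=len(data_voters)):
    top_votes = Counter(votes).most_common(1)[0][1]
    if top_votes >= needed:
        outcomes["conclusive without the arbiter"] += 1      # scenario 1
    elif top_votes + 1 >= needed:
        outcomes["arbiter can make it conclusive"] += 1      # scenario 2
    else:
        outcomes["inconclusive even with the arbiter"] += 1  # scenario 3
print(dict(outcomes))
```

Of the 256 possible patterns, only the 4 unanimous ones are conclusive on their own, 48 (the 3–1 splits) can be settled by the arbiter, and the rest cannot be settled at all.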
In this way, the arbiter can help avoid the split-brain scenarios in certain cases.
Conclusion
We have discussed concepts related to fault tolerance such as node elections for failovers, active-active architecture to tackle repeated election scenarios, and explored arbiters deeply to see where they can help us.
If you wish to verify these concepts, simply create a cluster of MongoDB, Postgres, or Redis and start experimenting on it. Keep writing down whatever you find! The work is a bit tedious but will really strengthen your fundamentals and concepts.
I was able to come up with such an in-depth analysis by trying out everything on a real cluster deployed using helm on Kubernetes.
This was originally published on my LinkedIn account.