Welcome to the community @Kasun_Magedaragama!
As @michael_hoeller mentioned, an odd number of voting members is recommended. The addition of an arbiter to your 3 member deployment adds operational risk without providing any benefit.
Primaries are elected (and sustained) based on a consensus vote from a strict majority (>50%) of configured voting members. The strict majority requirement is to avoid situations that might otherwise allow more than one primary (for example, a network partition separating voting members equally between data centres).
If you add additional members to a replica set, there should be some motivation such as increasing data redundancy or improving fault tolerance. Adding an arbiter to a 3 member replica set does not contribute to either of those aspects.
With 3 voting members, the strict majority required to elect a primary is 2 votes, which means there is a fault tolerance of 1 member that can be unavailable. If a majority of voting members aren't available, a primary cannot be elected (or sustained) and all data-bearing members will transition to SECONDARY state.
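As a quick sketch of the arithmetic (plain Python, nothing MongoDB-specific — the function names are mine):

```python
def voting_majority(voting_members: int) -> int:
    """Smallest strict majority (>50%) of the configured voting members."""
    return voting_members // 2 + 1

def fault_tolerance(voting_members: int) -> int:
    """How many members can be unavailable while a primary can still be elected."""
    return voting_members - voting_majority(voting_members)

for n in (3, 4, 5):
    print(f"{n} voting members: majority={voting_majority(n)}, "
          f"fault tolerance={fault_tolerance(n)}")
```

Note that going from 3 to 4 voting members raises the required majority from 2 to 3 without improving fault tolerance, which is why even member counts are discouraged.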
What are the consequences of adding an arbiter?
With 4 voting members, the strict majority required to elect a primary is 3 votes which means there is still a fault tolerance of 1 despite the added member. There is also no improvement in data redundancy, since an arbiter only participates in elections.
However, if the 4th member is an arbiter this introduces some potential operational complications when the replica set is running in a degraded state (elected primary with one data-bearing member unavailable):
- An arbiter contributes to the voting majority for a replica set election but cannot contribute to acknowledgement of write operations (since an arbiter doesn't write any data).
- If you want to avoid potential rollback of replica set writes, a `majority` write concern is recommended. However, a majority write concern cannot be acknowledged if your replica set currently only has a voting majority (using an arbiter) rather than a write majority. Operations with majority write concern will either block indefinitely (default behaviour) or time out (if you have specified the `wtimeout` option).
- Cache pressure will be increased because more data will be pinned in cache waiting for the majority commit point to advance. Depending on your workload, this can cause significant problems if your replica set is in a degraded state. There is a startup warning for Primary-Secondary-Arbiter (PSA) deployments which would also apply to your PSSA scenario in MongoDB 4.4 and earlier: Disable Read Concern Majority. For MongoDB 5.0+, please see Mitigate Performance Issues with PSA Replica Set.
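The degraded PSSA scenario can be sketched with a toy model (plain Python; the function names are mine and this deliberately simplifies the server's real rules — the key point is that only data-bearing members can acknowledge writes):

```python
def voting_majority(voting_members: int) -> int:
    """Strict majority (>50%) of the configured voting members."""
    return voting_members // 2 + 1

def can_elect_primary(healthy_voting: int, voting_members: int) -> bool:
    # Any healthy voting member (including an arbiter) counts toward election quorum.
    return healthy_voting >= voting_majority(voting_members)

def can_ack_majority_write(healthy_data_bearing: int, voting_members: int) -> bool:
    # A majority write concern needs a majority of voting members to acknowledge,
    # but only data-bearing members can acknowledge a write.
    return healthy_data_bearing >= voting_majority(voting_members)

# PSSA (3 data-bearing members + 1 arbiter) with one data-bearing member down:
print(can_elect_primary(3, 4))       # True  (2 data-bearing + arbiter = 3 votes)
print(can_ack_majority_write(2, 4))  # False (only 2 of the 3 required acknowledgers)
```

This is the trap: the replica set keeps a primary and accepts writes, but majority writes can no longer be acknowledged until the unavailable data-bearing member recovers.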
Should you add an arbiter?
Typically, no.
An arbiter can still be useful if you understand the operational caveats and are willing to compromise robustness for cost savings.
Thinking of an arbiter as a tie-breaker is the most charitable interpretation, but adding an arbiter does not provide the same benefits as adding a secondary.
Where possible I would strongly favour using a data-bearing secondary over an arbiter for a more robust production deployment.
How is voting majority determined?
The required voting majority is based on the configured number of replica set members, not on the number that are currently healthy. The voting majority for a replica set with 4 voting members is always 3 votes.
Think of replication as analogous to RAID storage: your configuration determines the level of data redundancy, performance, and fault tolerance. If there is an issue with availability of one (or more) of your replica set members, the replica set will run in a degraded mode which allows continued write availability (assuming you still have a quorum of healthy voting members to sustain a primary) and read availability (as long as at least one data-bearing member is online).
What about even number of voting members?
Explanations of replica set configuration are often simplified to make the concepts easier to follow.
The election algorithm can handle an even number of voting members (for example, this is the scenario when you have a 3 member replica set with 1 member down). There are other factors that influence elections, including member priority and freshness, so repeated tie-breaking scenarios should not be a concern. However, you generally want to remove any potential speed bumps for your normal deployment state ("all members healthy"). A configuration with an odd number of voting members is also easier to rationalise when considering scenarios for data redundancy and fault tolerance.
MongoDB’s historical v0 election protocol (default in MongoDB 3.0 and earlier) only supported one election at a time, so any potential ties had a more significant impact on elapsed time to reach consensus. The modern v1 election protocol (default in MongoDB 3.2+) supports multiple concurrent elections for faster consensus. If you want to learn more, there’s a relevant talk from MongoDB World 2015: Distributed Consensus in MongoDB.
How is majority calculated?
That is the correct determination for voting majority, but write majority is based on data-bearing members. Ideally those calculations should be the same for a deployment, but arbiters and members with special configuration (e.g. delayed secondaries) will have consequences for write acknowledgements.
In MongoDB 4.2.1+, the `rs.status()` output has explicit `majorityVoteCount` and `writeMajorityCount` calculations to remove any uncertainty.
Regards,
Stennie