Rollbacks During Replica Set Failover
A rollback reverts write operations on a former primary when the member rejoins its replica set after a failover. A rollback is necessary only if the primary had accepted write operations that the secondaries had not successfully replicated before the primary stepped down. When the primary rejoins the set as a secondary, it reverts, or "rolls back," its write operations to maintain database consistency with the other members.
MongoDB attempts to avoid rollbacks, which should be rare. When a rollback does occur, it is often the result of a network partition. Secondaries that cannot keep up with the throughput of operations on the former primary increase the size and impact of the rollback.
A rollback does not occur if the write operations replicate to another member of the replica set before the primary steps down and if that member remains available and accessible to a majority of the replica set.
Collect Rollback Data
Configure Rollback Data
The createRollbackDataFiles parameter controls whether or not rollback files are created during rollbacks.
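As a minimal sketch, the parameter can be changed at runtime with the setParameter admin command in mongosh. The helper name rollbackFilesCommand is hypothetical and only builds the command document; db is the usual mongosh database handle.

```javascript
// Hypothetical helper: builds the setParameter command document that
// turns rollback file creation on or off.
function rollbackFilesCommand(enabled) {
  return { setParameter: 1, createRollbackDataFiles: Boolean(enabled) };
}

// In mongosh, connected to the member you want to configure:
//   db.adminCommand(rollbackFilesCommand(false));
```

Building the document separately makes it easy to inspect the command before sending it to a production member.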
Rollback Data
By default, when a rollback occurs, MongoDB writes the rollback data to BSON files.
For each collection whose data is rolled back, the rollback files are located in a <dbpath>/rollback/<collectionUUID> directory and have filenames of the form:
removed.<timestamp>.bson
For example, if data for the collection comments in the reporting database rolled back:
<dbpath>/rollback/20f74796-d5ea-42f5-8c95-f79b39bad190/removed.2020-02-19T04-57-11.0.bson
where <dbpath> is the mongod's dbPath.
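The path layout above can be taken apart programmatically, for instance when scripting over many rollback files. The following is a sketch only: parseRollbackPath is a hypothetical helper, and /data/db in the usage comment is an assumed dbPath.

```javascript
// Hypothetical helper: extracts the collection UUID and the timestamp from a
// path of the form <dbpath>/rollback/<collectionUUID>/removed.<timestamp>.bson.
// Returns null when the path does not match that layout.
function parseRollbackPath(path) {
  const match = path.match(
    /rollback\/([0-9a-fA-F-]{36})\/removed\.(.+)\.bson$/
  );
  if (!match) {
    return null;
  }
  return { collectionUUID: match[1], timestamp: match[2] };
}

// Usage, assuming a dbPath of /data/db:
//   parseRollbackPath(
//     "/data/db/rollback/20f74796-d5ea-42f5-8c95-f79b39bad190/removed.2020-02-19T04-57-11.0.bson"
//   );
```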
Tip
Collection Name
To get the collection name, you can search for "rollback file" in the MongoDB log. For example, if the log file is /var/log/mongodb/mongod.log, you can use grep to search for instances of "rollback file" in the log:
grep "rollback file" /var/log/mongodb/mongod.log
Alternatively, you can loop through all the databases and run db.getCollectionInfos() for the specific UUID until you get a match.
For example:
var mydatabases = db.adminCommand("listDatabases").databases;
var foundcollection = false;

for (var i = 0; i < mydatabases.length; i++) {
   let mdb = db.getSiblingDB(mydatabases[i].name);
   // Filter the database's collections by the UUID from the rollback path
   let collections = mdb.getCollectionInfos( { "info.uuid": UUID("20f74796-d5ea-42f5-8c95-f79b39bad190") } );

   for (var j = 0; j < collections.length; j++) {   // Array of 1 element
      foundcollection = true;
      print(mydatabases[i].name + '.' + collections[j].name);
      break;
   }

   // Stop scanning the remaining databases once the collection is found
   if (foundcollection) { break; }
}
Rollback Data Exclusion
If the operation being rolled back is a collection drop or a document deletion, MongoDB does not write it to the rollback data directory.
Warning
If write operations use { w: 1 } write concern, the rollback directory may exclude writes submitted after an oplog hole if the primary restarts before the write operation completes.
Read Rollback Data
To read the contents of the rollback files, use bsondump.
Based on the content and the knowledge of their applications,
administrators can decide the next course of action to take.
Avoid Replica Set Rollbacks
For replica sets, the write concern { w: 1 } only provides acknowledgment of write operations on the primary. Data may be rolled back if the primary steps down before the write operations have replicated to any of the secondaries. This includes data written in multi-document transactions that commit using { w: 1 } write concern.
Journaling and Write Concern majority
To prevent rollbacks of data that have been acknowledged to the client, run all voting members with journaling enabled and use { w: "majority" } write concern to guarantee that the write operations propagate to a majority of the replica set nodes before returning with acknowledgment to the issuing client.
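A minimal sketch of a write that requests majority acknowledgment follows. The helper name insertWithMajority, the 5000 ms wtimeout, and the sample document in the usage comment are assumptions made for illustration; collection stands for a mongosh collection object such as db.comments.

```javascript
// Hypothetical helper: inserts a document and waits until a majority of the
// replica set has acknowledged the write.
// `collection` is any object exposing insertOne(doc, options), for example
// a mongosh collection such as db.comments.
function insertWithMajority(collection, doc, wtimeoutMs = 5000) {
  return collection.insertOne(doc, {
    // wtimeout bounds how long the client waits for majority acknowledgment;
    // the 5000 ms default here is an arbitrary, assumed value.
    writeConcern: { w: "majority", wtimeout: wtimeoutMs },
  });
}

// In mongosh:
//   insertWithMajority(db.comments, { text: "example", createdAt: new Date() });
```

If wtimeout expires, the write is not undone; the client only stops waiting, so the application still has to decide how to handle the unconfirmed write.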
Starting in MongoDB 5.0, { w: "majority" } is the default write concern for most MongoDB deployments. See Implicit Default Write Concern.
With writeConcernMajorityJournalDefault set to false, MongoDB does not wait for w: "majority" writes to be written to the on-disk journal before acknowledging the writes. As such, "majority" write operations could possibly roll back in the event of a transient loss (e.g. crash and restart) of a majority of nodes in a given replica set.
Visibility of Data That Can Be Rolled Back
Regardless of a write's write concern, other clients using "local" or "available" read concern can see the result of a write operation before the write operation is acknowledged to the issuing client.
Clients using "local" or "available" read concern can read data which may be subsequently rolled back during replica set failovers.
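By contrast, a read issued with "majority" read concern returns only data acknowledged by a majority of the replica set members. The sketch below is an illustration and not part of the original text: findMajorityCommitted is a hypothetical helper, and collection stands for a mongosh collection object whose cursor supports readConcern().

```javascript
// Hypothetical helper: runs a query that only returns majority-acknowledged
// data, instead of the "local" or "available" view.
// `collection` is any object whose find() returns a cursor with a
// readConcern(level) method, such as a mongosh collection.
function findMajorityCommitted(collection, filter = {}) {
  return collection.find(filter).readConcern("majority");
}

// In mongosh:
//   findMajorityCommitted(db.comments, { author: "alice" }).toArray();
```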
For operations in a multi-document transaction, when a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. That is, a transaction will not commit some of its changes while rolling back others.
Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.
However, when a transaction writes to multiple shards, not all
outside read operations need to wait for the result of the committed
transaction to be visible across the shards. For example, if a
transaction is committed and write 1 is visible on shard A but write
2 is not yet visible on shard B, an outside read at read concern
"local"
can read the results of write 1 without
seeing write 2.
Rollback Considerations
User Operations
Starting in version 4.2, MongoDB kills all in-progress user operations when a member enters the ROLLBACK state.
Index Builds
For feature compatibility version (fcv) "4.2", MongoDB waits for any in-progress index builds to finish before starting a rollback.
For more information on the index build process, see Index Builds on Populated Collections.
Index Operations When "majority" Read Concern is Disabled
Disabling "majority" read concern prevents collMod commands which modify an index from rolling back. If such an operation needs to be rolled back, you must resync the affected nodes with the primary node.
Size Limitations
MongoDB supports the following rollback algorithms, which have different size limitations:
Recover to a Timestamp, where a former primary reverts to a consistent point in time and applies operations until it catches up to the sync source's branch of history. This is the default rollback algorithm.
When using this algorithm, MongoDB does not limit the amount of data you can roll back.
Rollback via Refetch, where a former primary finds the common point between its oplog and the sync source's oplog. Then, the member examines and reverts all operations in its oplog until it reaches this common point. Rollback via Refetch occurs only when the enableMajorityReadConcern setting in your configuration file is set to false.
When using this algorithm, MongoDB can only roll back up to 300 MB of data.
Note
Starting in MongoDB 5.0, enableMajorityReadConcern is set to true and cannot be changed.
Rollback Elapsed Time Limitations
The rollback time limit defaults to 24 hours and is configurable using the rollbackTimeLimitSecs parameter.
MongoDB measures elapsed time as the time between the first common operation in the oplogs and the last entry in the oplog of the member being rolled back.
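The arithmetic behind this limit can be sketched as follows. This only illustrates the measurement described above and is not the server's implementation: both function names are hypothetical, and treating the comparison as strictly greater than the limit is an assumption.

```javascript
// Default rollback time limit from the text above: 24 hours, in seconds.
const DEFAULT_ROLLBACK_TIME_LIMIT_SECS = 24 * 60 * 60;

// Hypothetical helper: seconds between the common point of the two oplogs and
// the last oplog entry on the member being rolled back. Both are Date objects.
function rollbackElapsedSecs(commonPoint, lastLocalEntry) {
  return (lastLocalEntry.getTime() - commonPoint.getTime()) / 1000;
}

// Hypothetical helper: true when the elapsed time is over the limit.
// The strict ">" comparison is an assumption made for this sketch.
function exceedsRollbackTimeLimit(
  commonPoint,
  lastLocalEntry,
  limitSecs = DEFAULT_ROLLBACK_TIME_LIMIT_SECS
) {
  return rollbackElapsedSecs(commonPoint, lastLocalEntry) > limitSecs;
}
```

For example, a common point at 2020-02-19T00:00:00Z and a last local entry at 2020-02-19T06:00:00Z give an elapsed time of 21600 seconds, which is within the default limit of 86400 seconds.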