
Back Up a Self-Managed Sharded Cluster with File System Snapshots

On this page

  • Overview
  • Considerations
  • Before You Begin
  • Steps

Overview

This document describes a procedure for taking a backup of all components of a sharded cluster. This procedure uses file system snapshots to capture a copy of each mongod instance.

Important

To back up a sharded cluster, you must stop all writes to the cluster.

For more information on backups in MongoDB and backups of sharded clusters in particular, see Backup Methods for a Self-Managed Deployment and Backup and Restore a Self-Managed Sharded Cluster.

Considerations

To take a backup with a file system snapshot, you must first stop the balancer, stop writes, and stop any schema transformation operations on the cluster.

MongoDB provides backup and restore operations that can run while the balancer is enabled and transactions are in progress through the following services:

  • MongoDB Atlas

  • MongoDB Cloud Manager

  • MongoDB Ops Manager

Encrypted storage engines that use the AES256-GCM encryption mode require that every process use a unique counter block value with the key.

For encrypted storage engines configured with the AES256-GCM cipher:

  • Restoring from Hot Backup
    Starting in MongoDB 4.2, if you restore from files taken via "hot" backup (i.e. the mongod is running), MongoDB can detect "dirty" keys on startup and automatically roll over the database key to avoid IV (Initialization Vector) reuse.
  • Restoring from Cold Backup

    However, if you restore from files taken via "cold" backup (i.e. the mongod is not running), MongoDB cannot detect "dirty" keys on startup, and IV reuse voids the confidentiality and integrity guarantees.

    Starting in MongoDB 4.2, to avoid key reuse after restoring from a cold file system snapshot, MongoDB adds the --eseDatabaseKeyRollover command-line option. When started with the --eseDatabaseKeyRollover option, the mongod instance rolls over the database keys configured with the AES256-GCM cipher and exits.
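    As a minimal sketch of that rollover step, assuming a MongoDB Enterprise mongod encrypted with a local key file (the key file and dbpath locations below are placeholders):

    # roll over the AES256-GCM database keys, then exit
    mongod --enableEncryption --encryptionCipherMode AES256-GCM \
      --encryptionKeyFile /path/to/mongodb-keyfile \
      --dbpath /var/lib/mongodb --eseDatabaseKeyRollover

    Once the process exits, restart mongod normally with the same encryption options.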

It is essential that you stop the balancer before capturing a backup.

If the balancer is active while you capture backups, the backup artifacts may be incomplete or contain duplicate data, because chunks can migrate while the backup is in progress.

In this procedure, you will stop the cluster balancer, take a backup of the config database, and then take backups of each shard in the cluster using a file system snapshot tool. If you need an exact moment-in-time snapshot of the system, you must stop all writes before taking the file system snapshots; otherwise the snapshot will only approximate a moment in time.

To back up a sharded cluster, you must use the fsync command or db.fsyncLock() method to stop writes on the cluster. This helps reduce the likelihood of inconsistencies in the backup.

Note

These steps can only produce a consistent backup if they are followed exactly and no operations are in progress when you begin.

If your deployment depends on Amazon's Elastic Block Storage (EBS) with RAID configured within your instance, it is impossible to get a consistent state across all disks using the platform's snapshot tool. As an alternative, you can do one of the following:

  • Flush all writes to disk and create a write lock to ensure a consistent state during the backup process.

  • Configure Logical Volume Manager (LVM) to run and hold your MongoDB data files on top of the RAID within your system, and take the snapshot with LVM instead.

Before You Begin

This procedure requires a version of MongoDB that supports fsync locking from mongos.

Starting in MongoDB 7.1 (also available starting in 7.0.2, 6.0.11, and 5.0.22), the fsync and fsyncUnlock commands can run on mongos to lock and unlock a sharded cluster.
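For example, on a supported version you can connect mongosh to a mongos and lock the entire cluster with the fsync command, then release it with fsyncUnlock:

// on mongos: flush writes and take the cluster-wide lock
db.getSiblingDB("admin").runCommand( { fsync: 1, lock: true } )

// later, release the lock
db.getSiblingDB("admin").runCommand( { fsyncUnlock: 1 } )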

Starting in MongoDB 8.0, you can use the directShardOperations role to perform maintenance operations that require you to execute commands directly against a shard.

Warning

Running commands using the directShardOperations role can cause your cluster to stop working correctly and may cause data corruption. Only use the directShardOperations role for maintenance purposes or under the guidance of MongoDB support. Once you are done performing maintenance operations, stop using the directShardOperations role.
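As an illustrative sketch (the backupAdmin user name is a placeholder), you can grant the role to an administrative user for the maintenance window and revoke it afterward:

// grant the role for the duration of the maintenance
db.getSiblingDB("admin").grantRolesToUser(
  "backupAdmin",
  [ { role: "directShardOperations", db: "admin" } ]
)

// revoke the role once maintenance is complete
db.getSiblingDB("admin").revokeRolesFromUser(
  "backupAdmin",
  [ { role: "directShardOperations", db: "admin" } ]
)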

Steps

To take a self-managed backup of a sharded cluster, complete the following steps:

1. Find a Backup Window

Chunk migrations, resharding, and schema migration operations can cause inconsistencies in backups. To find a good time to perform a backup, monitor your application and database usage and find a time when these operations are unlikely to occur.

For more information, see Schedule Backup Window for a Self-Managed Sharded Cluster.
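As a rough sketch of one way to check for in-flight operations before you begin (the desc matching below is an assumption about how chunk migration and resharding operations appear in currentOp output, not an official recipe):

// list current operations that look like chunk migrations or resharding
db.getSiblingDB("admin").aggregate( [
  { $currentOp: { allUsers: true } },
  { $match: { desc: /moveChunk|resharding/i } }
] )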

2. Stop the Balancer

To prevent chunk migrations from disrupting the backup, use the sh.stopBalancer() method to stop the balancer:

sh.stopBalancer()

If a balancing round is currently in progress, the operation waits for balancing to complete.

To wait until the balancer has stopped, check the sh.isBalancerRunning() method in a loop:

use config
while( sh.isBalancerRunning().mode != "off" ) {
   print("waiting...");
   sleep(1000);
}

3. Lock the Cluster

Writes to the database can cause backup inconsistencies. Lock your sharded cluster to protect the database from writes.

To lock a sharded cluster, use the db.fsyncLock() method:

db.getSiblingDB("admin").fsyncLock()

Run the following aggregation pipeline on both mongos and the primary mongod of the config servers. To confirm the lock, ensure that the fsyncLocked field returns true and the fsyncUnlocked field returns false.

db.getSiblingDB("admin").aggregate( [
   { $currentOp: { } },
   { $facet: {
      "locked": [
         { $match: { fsyncLock: { $exists: true } } }
      ],
      "unlocked": [
         { $match: { fsyncLock: { $exists: false } } }
      ]
   } },
   { $project: {
      "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
      "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
   } }
] )

The pipeline returns a single document:

[ { fsyncLocked: true, fsyncUnlocked: false } ]
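To run the same check against the config server replica set primary, connect mongosh directly to that member; the host and port below are placeholders:

mongosh "mongodb://cfg0.example.net:27019/admin"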
4. Back Up the Config Server

Note

Backing up a config server backs up the sharded cluster's metadata. You only need to back up one config server, as they all hold the same data. Perform this step against the CSRS primary member.

To create a filesystem snapshot of the config server, follow the procedure in Create a Snapshot.
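For example, if the config server's dbPath resides on an LVM logical volume, the snapshot step looks roughly like the following (the volume group and volume names are placeholders):

# create a snapshot named mdb-snap01 with 100MB of copy-on-write capacity
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb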

5. Back Up the Shards

Perform a filesystem snapshot against the primary member of each shard, using the procedure found in Back Up and Restore a Self-Managed Deployment with Filesystem Snapshots.
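Continuing the hypothetical LVM example above, after you create the snapshot on a shard you might archive it for storage elsewhere:

# unmount the snapshot and stream it into a compressed archive
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz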

6. Unlock the Cluster

After the backup completes, you must unlock the cluster to allow writes to resume.

To unlock the cluster, use the db.fsyncUnlock() method:

db.getSiblingDB("admin").fsyncUnlock()

Run the following aggregation pipeline on both mongos and the primary mongod of the config servers. To confirm the unlock, ensure that the fsyncLocked field returns false and the fsyncUnlocked field returns true.

db.getSiblingDB("admin").aggregate( [
   { $currentOp: { } },
   { $facet: {
      "locked": [
         { $match: { fsyncLock: { $exists: true } } }
      ],
      "unlocked": [
         { $match: { fsyncLock: { $exists: false } } }
      ]
   } },
   { $project: {
      "fsyncLocked": { $gt: [ { $size: "$locked" }, 0 ] },
      "fsyncUnlocked": { $gt: [ { $size: "$unlocked" }, 0 ] }
   } }
] )

The pipeline returns a single document:

[ { fsyncLocked: false, fsyncUnlocked: true } ]
7. Restart the Balancer

To restart the balancer, use the sh.startBalancer() method:

sh.startBalancer()

To confirm that the balancer is running, use the sh.getBalancerState() method:

sh.getBalancerState()
true

The command returns true when the balancer is running.
