
Sharded Cluster Balancer

On this page

  • Cluster Balancer
  • Chunk Migration Procedure
  • Shard Size

The MongoDB balancer is a background process that monitors the number of chunks on each shard. When the number of chunks on a given shard reaches specific migration thresholds, the balancer attempts to automatically migrate chunks between shards so that each shard has an equal number of chunks.

The balancing procedure for sharded clusters is entirely transparent to the user and application layer, though there may be some performance impact while the procedure takes place.

[Figure: A collection distributed across three shards. The difference in the number of chunks between the shards reaches the migration threshold (in this case, 2) and triggers migration.]

The balancer runs on the primary of the config server replica set (CSRS).

The balancer process is responsible for redistributing the chunks of every sharded collection evenly among the shards. By default, the balancer process is always enabled.

To address uneven chunk distribution for a sharded collection, the balancer migrates chunks from shards with more chunks to shards with fewer chunks. The balancer migrates the chunks until there is an even distribution of chunks for the collection across the shards. For details about chunk migration, see Chunk Migration Procedure.
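
For example, you can check the balancer's state from mongosh. This is a minimal sketch; in recent MongoDB versions sh.isBalancerRunning() returns a status document rather than a boolean:

    sh.getBalancerState()    // true if the balancer is enabled
    sh.isBalancerRunning()   // whether a balancing round is in progress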

Chunk migrations can have an impact on disk space, as the source shard automatically archives the migrated documents by default. For details, see moveChunk directory.

Chunk migrations carry some overhead in terms of bandwidth and workload, both of which can impact database performance. [1] The balancer attempts to minimize the impact by:

  • Restricting a shard to at most one migration at any given time; that is, a shard cannot participate in multiple chunk migrations at the same time. To migrate multiple chunks from a shard, the balancer migrates the chunks one at a time.

    Changed in version 3.4: Starting in MongoDB 3.4, MongoDB can perform parallel chunk migrations. Observing the restriction that a shard can participate in at most one migration at a time, for a sharded cluster with n shards, MongoDB can perform at most n/2 (rounded down) simultaneous chunk migrations.

    See also Asynchronous Chunk Migration Cleanup.

  • Starting a balancing round only when the difference in the number of chunks between the shard with the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold.

You can disable the balancer temporarily for maintenance, but leaving the balancer disabled for extended periods of time can degrade cluster performance. For more information, see Disable the Balancer.

You can also limit the window during which the balancer runs to prevent it from impacting production traffic. See Schedule the Balancing Window for details.
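
Both operations are performed from mongosh. A minimal sketch, using an illustrative 23:00-06:00 window:

    // Disable the balancer for maintenance (waits for any in-progress
    // migration to finish), then re-enable it afterward:
    sh.stopBalancer()
    sh.startBalancer()

    // Restrict balancing to a nightly window by updating the balancer
    // document in config.settings (times are interpreted in the local
    // time zone of the CSRS primary; see the note below):
    use config
    db.settings.updateOne(
       { _id: "balancer" },
       { $set: { activeWindow: { start: "23:00", stop: "06:00" } } },
       { upsert: true }
    )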

Note

The specification of the balancing window is relative to the local time zone of the primary of the config server replica set.


[1] The shard collection operation can perform an initial chunk creation and distribution for empty or non-existing collections if zones and zone ranges have been defined for the collection. Initial creation and distribution of chunks allows for faster setup of zoned sharding. After the initial distribution, the balancer manages the chunk distribution going forward per usual. MongoDB supports sharding collections on compound hashed indexes. When sharding an empty or non-existing collection using a compound hashed shard key, additional requirements apply in order for MongoDB to perform initial chunk creation and distribution. See Pre-Define Zones and Zone Ranges for an Empty or Non-Existing Collection for an example.

Adding a shard to a cluster creates an imbalance, since the new shard has no chunks. While MongoDB begins migrating data to the new shard immediately, it can take some time before the cluster balances. See the Add Shards to a Cluster tutorial for instructions on adding a shard to a cluster.
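
As a quick illustration (the replica set and host names here are hypothetical), a shard is added from mongosh with:

    // Add a shard by its replica set name and a member host:
    sh.addShard("shardRS4/shard4node1.example.net:27017")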

Removing a shard from a cluster creates a similar imbalance, since chunks residing on that shard must be redistributed throughout the cluster. While MongoDB begins draining a removed shard immediately, it can take some time before the cluster balances. Do not shut down the servers associated with the removed shard during this process.

When you remove a shard in a cluster with an uneven chunk distribution, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.

See the Remove Shards from an Existing Sharded Cluster tutorial for instructions on safely removing a shard from a cluster.
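
As a quick illustration (the shard name here is hypothetical), draining is initiated from the admin database; rerunning the same command reports the draining progress:

    db.adminCommand( { removeShard: "shard0003" } )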

All chunk migrations use the following procedure:

  1. The balancer process sends the moveChunk command to the source shard.

  2. The source starts the move with an internal moveChunk command. During the migration process, operations to the chunk are routed to the source shard. The source shard is responsible for incoming write operations for the chunk.

  3. The destination shard builds any indexes required by the source that do not exist on the destination.

  4. The destination shard begins requesting documents in the chunk and starts receiving copies of the data. See also Chunk Migration and Replication.

  5. After receiving the final document in the chunk, the destination shard starts a synchronization process to ensure that it has the changes to the migrated documents that occurred during the migration.

  6. When fully synchronized, the source shard connects to the config database and updates the cluster metadata with the new location for the chunk.

  7. After the source shard completes the update of the metadata, and once there are no open cursors on the chunk, the source shard deletes its copy of the documents.

    Note

    If the balancer needs to perform additional chunk migrations from the source shard, the balancer can start the next chunk migration without waiting for the current migration process to finish this deletion step. See Asynchronous Chunk Migration Cleanup.


The migration process ensures consistency and maximizes the availability of chunks during balancing.
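
Although the balancer normally initiates migrations, you can also request one manually with the sh.moveChunk() helper, which wraps the moveChunk command. A minimal sketch, assuming a hypothetical namespace, shard key value, and destination shard:

    // Migrate the chunk containing { user_id: 12345 } to shard0001:
    sh.moveChunk("records.people", { user_id: 12345 }, "shard0001")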

Warning

Secondary Reads in a Sharded Cluster with Migrations Can Miss Documents

Long-running secondary reads in a sharded cluster can miss documents if migrations are occurring.

Before deleting a chunk during chunk migration, MongoDB waits for orphanCleanupDelaySecs, or for in-progress queries involving the chunk to complete on the shard primary, whichever is longer. Queries that were initially run on a node that was primary, but continue after the node has stepped down to a secondary, are treated as if they were initially executed on a secondary. That is, the server only waits for orphanCleanupDelaySecs if there are no queries targeting the chunk on the current primary.

Queries that target the chunk and are run on secondaries may miss documents if these queries take longer than orphanCleanupDelaySecs.

To minimize the impact of balancing on the cluster, the balancer begins balancing only after the distribution of data for a sharded collection has reached the migration threshold that makes the cluster unbalanced. When the number of chunks on the most loaded shard exceeds the optimal number of chunks per shard by more than 1 chunk, the collection is unbalanced and the balancer initiates chunk migrations. The optimal number of chunks per shard is the total number of chunks in a sharded collection divided by the number of shards, rounded up to the nearest integer. If zones exist, MongoDB calculates the optimal number of chunks on a per-zone basis.

For example, if a user adds a new shard to a cluster of 10 shards, where a sharded collection has 20 chunks on each shard, the balancer does not migrate any data. The optimal number of chunks per shard is 200 divided by 11, or 18.18, which MongoDB rounds up to 19. Because the most loaded shard (20 chunks) does not exceed the optimal number by more than 1 chunk, the cluster is balanced and the balancer does not migrate any chunks to the new shard.
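
The same arithmetic, written out in shell JavaScript for clarity:

    const totalChunks = 200                                  // 10 shards x 20 chunks
    const numShards   = 11                                   // after adding the new shard
    const optimal     = Math.ceil(totalChunks / numShards)   // 19
    const mostLoaded  = 20
    mostLoaded > optimal + 1                                 // false: the cluster is balanced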

To migrate multiple chunks from a shard, the balancer migrates the chunks one at a time. However, the balancer does not wait for the current migration's delete phase to complete before starting the next chunk migration. See Chunk Migration for the chunk migration process and the delete phase.

This queuing behavior allows shards to unload chunks more quickly in cases of a heavily imbalanced cluster, such as when performing initial data loads without pre-splitting and when adding new shards.

This behavior also affects the moveChunk command, and migration scripts that use the moveChunk command may proceed more quickly.

In some cases, the delete phases may persist longer. Starting in MongoDB 4.4, chunk migrations are enhanced to be more resilient in the event of a failover during the delete phase. Orphaned documents are cleaned up even if a replica set's primary crashes or restarts during this phase.

The _waitForDelete option, available as a setting for the balancer as well as the moveChunk command, alters this behavior so that the delete phase of the current migration blocks the start of the next chunk migration. The _waitForDelete option is generally intended for internal testing purposes. For more information, see Wait for Delete.
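
A minimal sketch of enabling _waitForDelete for the balancer by updating the balancer document in config.settings:

    use config
    db.settings.updateOne(
       { _id: "balancer" },
       { $set: { _waitForDelete: true } },
       { upsert: true }
    )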

Note

Range deletion is a resource intensive operation that can result in significant cache and I/O stress as the cluster deletes the documents.

In cases where you plan to move a large amount of data, such as when adding shards to a cluster or during the initial distribution of a sharded collection across multiple shards, consider resharding the collection instead. Resharding operations don't require range cleanup, which makes them much less stressful on the cluster.

For more information, see Reshard a Collection.

Changed in version 3.4.

During chunk migration, the _secondaryThrottle value determines when the migration proceeds with the next document in the chunk.

In the config.settings collection:

  • If the _secondaryThrottle setting for the balancer is set to a write concern, each document moved during range migration must receive the requested acknowledgment before proceeding with the next document.

  • If the _secondaryThrottle setting is unset, the migration process does not wait for replication to a secondary and instead continues with the next document.

To update the _secondaryThrottle parameter for the balancer, see Secondary Throttle for an example.
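
Illustratively, such an update runs against the config database from mongosh; a minimal sketch that sets a majority write concern for the balancer:

    use config
    db.settings.updateOne(
       { _id: "balancer" },
       { $set: { _secondaryThrottle: { w: "majority" } } },
       { upsert: true }
    )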

Independent of any _secondaryThrottle setting, certain phases of the chunk migration have the following replication policy:

  • MongoDB briefly pauses all application reads and writes to the collection being migrated, on the source shard, before updating the config servers with the new location for the chunk, and resumes the application reads and writes after the update. The chunk move requires all writes to be acknowledged by a majority of the members of the replica set both before and after committing the chunk move to the config servers.

  • When an outgoing chunk migration finishes and cleanup occurs, all writes must be replicated to a majority of servers before further cleanup (from other outgoing migrations) or new incoming migrations can proceed.


By default, MongoDB cannot move a chunk if the number of documents in the chunk is greater than 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection.
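
A rough estimate of the largest movable chunk, in documents, can be computed from these values. This sketch assumes a hypothetical collection named people and the 64 megabyte default chunk size of this MongoDB version:

    const stats          = db.people.stats()
    const chunkSizeBytes = 64 * 1024 * 1024    // assumed default chunk size
    const maxDocs        = 1.3 * chunkSizeBytes / stats.avgObjSize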

For chunks that are too large to migrate, starting in MongoDB 4.4:

  • A new balancer setting attemptToBalanceJumboChunks allows the balancer to migrate chunks too large to move as long as the chunks are not labeled jumbo. See Balance Chunks that Exceed Size Limit for details.

  • The moveChunk command can specify a new option, forceJumbo, to allow for the migration of chunks that are too large to move, whether or not the chunks are labeled jumbo. Both approaches are sketched after this list.
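
Hedged sketches of both approaches; the namespace, shard key value, and shard name below are hypothetical:

    // Allow the balancer to migrate oversized (but not jumbo-labeled) chunks:
    use config
    db.settings.updateOne(
       { _id: "balancer" },
       { $set: { attemptToBalanceJumboChunks: true } },
       { upsert: true }
    )

    // Manually migrate an oversized chunk, even if it is labeled jumbo:
    db.adminCommand({
       moveChunk: "records.people",
       find: { user_id: 12345 },
       to: "shard0001",
       forceJumbo: true
    })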

You can tune the performance impact of range deletions with the rangeDeleterBatchSize and rangeDeleterBatchDelayMS parameters. For example:

  • To limit the number of documents deleted per batch, you can set rangeDeleterBatchSize to a small value such as 32.

  • To add an additional delay between batch deletions, you can set rangeDeleterBatchDelayMS above the current default of 20 milliseconds.
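
Both parameters can be set at runtime on each shard's primary; the values below are illustrative:

    db.adminCommand( { setParameter: 1, rangeDeleterBatchSize: 32 } )
    db.adminCommand( { setParameter: 1, rangeDeleterBatchDelayMS: 200 } )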

Note

If there are ongoing read operations or open cursors on the collection targeted for deletes, range deletion processes may not proceed.

By default, MongoDB attempts to fill all available disk space with data on every shard as the data set grows. To ensure that the cluster always has the capacity to handle data growth, monitor disk usage as well as other performance metrics.

See the Change the Maximum Storage Size for a Given Shard tutorial for instructions on setting the maximum size for a shard.
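
As a hedged sketch based on that tutorial, the maximum size is stored, in megabytes, in the maxSize field of the shard's document in config.shards; the shard _id here is hypothetical:

    use config
    db.shards.updateOne(
       { _id: "shard0000" },
       { $set: { maxSize: 250 } }
    )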
