Data Partitioning with Chunks
MongoDB uses the shard key associated to the collection to partition the data into chunks owned by a specific shard. A chunk consists of a range of sharded data. A range can be a portion of the chunk or the whole chunk. The balancer migrates data between shards. Each chunk has inclusive lower and exclusive upper limits based on the shard key.
The smallest unit of data a chunk can represent is a single unique shard key value.
Initial Chunks
Populated Collection
The sharding operation creates one large initial chunk to cover all of the shard key values.
After the initial chunk creation, the balancer moves ranges off of the initial chunk when it needs to start balancing data.
Empty Collection
If you have zones and zone ranges defined for an empty or non-existing collection.
The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.
After the initial distribution, the balancer manages the chunk distribution going forward.
If you do not have zones and zone ranges defined for an empty or non-existing collection:
For hashed sharding:
The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates across the cluster. You can use
numInitialChunks
option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.After the initial distribution, the balancer manages the chunk distribution going forward.
For ranged sharding:
The sharding operation creates a single empty chunk to cover the entire range of the shard key values.
After the initial chunk creation, the balancer migrates the initial chunk across the shards as appropriate as well as manages the chunk distribution going forward.
Range Size
The default range size in MongoDB is 128 megabytes. You can increase or reduce the chunk size. Consider the implications of changing the default chunk size:
Small ranges lead to a more even distribution of data at the expense of more frequent migrations. This creates expense at the query routing (
mongos
) layer.Large ranges lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. But, these efficiencies come at the expense of a potentially uneven distribution of data.
Range size affects the Maximum Number of Documents Per Range to Migrate.
Range size affects the maximum collection size when sharding an existing collection. Post-sharding, data range size does not constrain collection size.
For many deployments, it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set.
Range Migration
MongoDB migrates data ranges in a sharded cluster to distribute the data of a sharded collection evenly among shards. Migrations may be either:
Manual. Only use manual migration in limited cases, such as to distribute data during bulk inserts. See Migrating Chunks Manually for more details.
Automatic. The balancer process automatically migrates data when there is an uneven distribution of a sharded collection's data across the shards. See Migration Thresholds for more details.
For more information on the sharded cluster balancer, see Sharded Cluster Balancer.
Balancing
The balancer is a background process that manages data migrations. If the difference in amount of data between the largest and smallest shard exceed the migration thresholds, the balancer begins migrating data across the cluster to ensure an even distribution.
You can manage certain aspects of the balancer. The balancer also respects any zones created as a part of configuring zones in a sharded cluster.
See Sharded Cluster Balancer for more information on the balancer.
Indivisible/Jumbo Chunks
In some cases, chunks can grow beyond the specified chunk size but cannot undergo a split. The most common scenario is when a chunk represents a single shard key value. Since the chunk cannot split, it continues to grow beyond the chunk size, becoming a jumbo chunk. These jumbo chunks can become a performance bottleneck as they continue to grow, especially if the shard key value occurs with high frequency.
Starting in MongoDB 5.0, you can reshard a collection by changing a document's shard key.
Starting in MongoDB 4.4, MongoDB provides the
refineCollectionShardKey
command. Refining a collection's
shard key allows for a more fine-grained data distribution and can
address situations where the existing key insufficient cardinality
leads to jumbo chunks.
For more information, see: