Create Ranges in a Sharded Cluster
In most situations, a sharded cluster creates, splits, and distributes ranges automatically without user intervention. However, in a limited number of cases, MongoDB cannot create enough ranges or distribute data fast enough to support the required throughput.
For example, this can happen when you ingest a large volume of data into a cluster that is unbalanced, or when the ingestion itself leads to data imbalance, such as with monotonically increasing or decreasing shard keys. Pre-splitting the ranges of an empty sharded collection can improve throughput in these cases.
Alternatively, starting in MongoDB 4.0.3, you can define the zones and zone ranges before sharding an empty or nonexistent collection. The shard collection operation then creates ranges for the defined zone ranges, plus any additional ranges needed to cover the entire range of shard key values, and performs an initial range distribution based on the zone ranges. For more information, see Empty Collection.
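The zone-based alternative can be sketched as follows. This is a minimal illustration, not a recommended layout: the shard names (shardA, shardB), the zone names (early, late), and the boundary values are hypothetical, and the function takes the mongosh sh helper as a parameter only so that it can be dry-run against a stand-in object. It assumes the collection is empty or does not exist yet and that sharding is available for the database.

```javascript
// Sketch only: shard names, zone names, and boundaries are hypothetical.
// `sh` is the mongosh sharding helper, passed in so the steps can be dry-run.
function presplitWithZones(sh, ns) {
  // 1. Associate each shard with a zone.
  sh.addShardToZone("shardA", "early");
  sh.addShardToZone("shardB", "late");

  // 2. Define the zone ranges BEFORE sharding the collection.
  //    The lower bound is inclusive and the upper bound is exclusive.
  sh.updateZoneKeyRange(ns, { email: "a" }, { email: "n" }, "early");
  sh.updateZoneKeyRange(ns, { email: "n" }, { email: "{" }, "late"); // "{" sorts right after "z"

  // 3. Shard the collection; initial ranges follow the zone ranges.
  sh.shardCollection(ns, { email: 1 });
}
```

In a mongosh session connected to a mongos, this would be invoked as presplitWithZones(sh, "myapp.users").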
Warning
Pre-split ranges only for an empty collection. Manually splitting ranges for a populated collection can lead to unpredictable range boundaries and sizes, as well as inefficient or ineffective balancing behavior.
To split empty ranges manually, you can run the split command:
Example
To create ranges for documents in the myapp.users collection using the email field as the shard key, use the following operation in mongosh:
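// Character codes 97 through 122 are the lowercase letters "a" through "z".
for (var x = 97; x < 97 + 26; x++) {
  // Step through the second letter in increments of 6: a, g, m, s, y.
  for (var y = 97; y < 97 + 26; y += 6) {
    var prefix = String.fromCharCode(x) + String.fromCharCode(y);
    // Split the range that contains this two-letter prefix at the prefix.
    db.adminCommand({ split: "myapp.users", middle: { email: prefix } });
  }
}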
This assumes a collection size of 100 million documents.
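Before issuing any split commands, it can help to preview the split points that the loop above would generate. The following is a minimal sketch in plain JavaScript. The helper name buildSplitPoints and its step parameter are hypothetical and not part of mongosh; it only reproduces the prefix arithmetic and does not contact the cluster.

```javascript
// Hypothetical helper: lists the two-letter prefixes used as split points
// by the nested loop above, without running any split command.
function buildSplitPoints(step = 6) {
  const points = [];
  for (let x = 97; x < 97 + 26; x++) {          // first letter: "a" to "z"
    for (let y = 97; y < 97 + 26; y += step) {  // second letter: every `step` letters
      points.push(String.fromCharCode(x) + String.fromCharCode(y));
    }
  }
  return points;
}

const points = buildSplitPoints();
console.log(points.length);      // 130
console.log(points.slice(0, 5)); // [ 'aa', 'ag', 'am', 'as', 'ay' ]
```

With the default step, the loop yields 130 split points, which would divide the key space into 131 ranges. How evenly the documents fall into those ranges depends on the actual distribution of email values.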
For information on the initial ranges created and distributed by the sharding command, see Empty Collection.
For information on the balancer and automatic distribution of ranges across shards, see Balancer Internals and Range Migration.
For information on manually migrating ranges, see Migrate Ranges in a Sharded Cluster.