Shard Keys
On this page
The shard key is either an indexed field or indexed compound fields that determines the distribution of the collection's documents among the cluster's shards.
Specifically, MongoDB divides the span of shard key values (or hashed shard key values) into non-overlapping ranges of shard key values (or hashed shard key values). Each range is associated with a chunk, and MongoDB attempts to distribute chunks evenly among the shards in the cluster.
The shard key has a direct relationship to the effectiveness of chunk distribution. See Choosing a Shard Key.
Shard Key Specification
You must specify the shard key when you shard the collection. You can
use the mongo
shell method
sh.shardCollection()
to shard a collection:
sh.shardCollection(<namespace>, <key>) // Optional parameters omitted
namespace | Specify the full namespace of the collection (i.e.
"<database>.<collection>") to shard. |
key | Specify a document
|
Note
Shard Key Fields and Values
- Existence
Starting in version 4.4, documents in sharded collections can be missing the shard key fields. A missing shard key falls into the same range as a
null
-valued shard key. See Missing Shard Key.In version 4.2 and earlier, shard key fields must exist in every document for a sharded collection.
- Update Field's Value
Starting in MongoDB 4.2, you can update a document's shard key value unless the shard key field is the immutable
_id
field. Before MongoDB 4.2, a document's shard key field value is immutable.For details on updating the shard key, see Change a Document's Shard Key Value.
For more information on the sharding method, see
sh.shardCollection()
.
Refine a Shard Key
Starting in MongoDB 4.4, you can use
refineCollectionShardKey
to refine a collection's shard
key. The refineCollectionShardKey
adds a suffix field
or fields to the existing key to create the new shard key.
For example, you may have an existing orders
collection with the
shard key { customer_id: 1 }
. You can change the shard key by
adding a suffix order_id
field to the shard key so that {
customer_id: 1, order_id: 1 }
becomes the new shard key. For more
information, see the refineCollectionShardKey
command.
Refining a collection's shard key allows for a more fine-grained data distribution and can address situations where the existing key has led to jumbo (i.e. indivisible) chunks due to insufficient cardinality.
In MongoDB 4.2 and earlier, the choice of shard key cannot be changed after sharding.
Shard Key Indexes
All sharded collections must have an index that supports the shard key; i.e. the index can be an index on the shard key or a compound index where the shard key is a prefix of the index.
If the collection is empty,
sh.shardCollection()
creates the index on the shard key if such an index does not already exists.If the collection is not empty, you must create the index first before using
sh.shardCollection()
.
If you drop the last valid index for the shard key, recover by recreating an index on just the shard key.
Unique Indexes
MongoDB can enforce a uniqueness constraint on a ranged shard key index. Through the use of a unique index on the shard key, MongoDB enforces uniqueness on the entire key combination and not individual components of the shard key.
For a ranged sharded collection, only the following indexes can be unique:
the index on the shard key
a compound index where the shard key is a prefix
the default
_id
index; however, the_id
index only enforces the uniqueness constraint per shard if the_id
field is not the shard key or the prefix of the shard key.Important
Uniqueness and the _id Index
If the
_id
field is not the shard key or the prefix of the shard key,_id
index only enforces the uniqueness constraint per shard and not across shards.For example, consider a sharded collection (with shard key
{x: 1}
) that spans two shards A and B. Because the_id
key is not part of the shard key, the collection could have a document with_id
value1
in shard A and another document with_id
value1
in shard B.If the
_id
field is not the shard key nor the prefix of the shard key, MongoDB expects applications to enforce the uniqueness of the_id
values across the shards.
The unique index constraints mean that:
For a to-be-sharded collection, you cannot shard the collection if the collection has other unique indexes.
For an already-sharded collection, you cannot create unique indexes on other fields.
A unique index stores a null value for a document missing the indexed field; that is a missing index field is treated as another instance of a
null
index key value. For more information, see Missing Document Field in a Unique Single-Field Index.
To enforce uniqueness on the shard key values, pass the unique
parameter as true
to the sh.shardCollection()
method:
If the collection is empty,
sh.shardCollection()
creates the unique index on the shard key if such an index does not already exist.If the collection is not empty, you must create the index first before using
sh.shardCollection()
.
Although you can have a unique compound index where the shard
key is a prefix, if using unique
parameter, the collection must have a unique index that is on the shard
key.
You cannot specify a unique constraint on a hashed index.
Choosing a Shard Key
The choice of shard key affects the creation and distribution of the chunks across the available shards. This affects the overall efficiency and performance of operations within the sharded cluster.
The ideal shard key allows MongoDB to distribute documents evenly throughout the cluster. See also sharding strategy.
At minimum, consider the consequences of the cardinality, frequency, and monotonicity of a potential shard key.
Note
Starting in MongoDB 4.4, you can use
refineCollectionShardKey
to refine a collection's shard key. TherefineCollectionShardKey
adds a suffix field or fields to the existing key to create the new shard key.In MongoDB 4.2 and earlier, once you shard a collection, the selection of the shard key is immutable.
Restrictions
For restrictions on shard key, see Shard Key Limitations.
Collection Size
When sharding a collection that is not empty, the shard key can constrain the maximum supported collection size for the initial sharding operation only. See Sharding Existing Collection Data Size.
Important
A sharded collection can grow to any size after successful sharding.
Shard Key Cardinality
The cardinality of a shard key determines the maximum number of chunks the balancer can create. This can reduce or remove the effectiveness of horizontal scaling in the cluster.
A unique shard key value can exist on no more than a single chunk at any
given time. If a shard key has a cardinality of 4
, then there can
be no more than 4
chunks within the sharded cluster, each storing
one unique shard key value. This constrains the number of effective
shards in the cluster to 4
as well - adding additional shards would
not provide any benefit.
The following image illustrates a sharded cluster using the field
X
as the shard key. If X
has low cardinality, the distribution of
inserts may look similar to the following:
The cluster in this example would not scale horizontally, as incoming writes would only route to a subset of shards.
Choosing a shard key with high cardinality does not, on its own, guarantee even distribution of data across the sharded cluster. The frequency and monotonicity of the shard key also contribute to data distribution. Take each factor into account when choosing a shard key.
If your data model requires sharding on a key that has low cardinality, consider using a compound index using a field that has higher relative cardinality.
Shard Key Frequency
Consider a set representing the range of shard key values - the frequency
of the shard key represents how often a given value occurs in the data. If the
majority of documents contain only a subset of those values, then the chunks
storing those documents become a bottleneck within the cluster. Furthermore,
as those chunks grow, they may become indivisible chunks
as they cannot be split any further. This reduces or removes the effectiveness
of horizontal scaling within the cluster.
The following image illustrates a sharded cluster using the field X
as the
shard key. If a subset of values for X
occur with high frequency, the
distribution of inserts may look similar to the following:
Choosing a shard key with low frequency does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality and monotonicity of the shard key also contribute to data distribution. Take each factor into account when choosing a shard key.
If your data model requires sharding on a key that has high frequency values, consider using a compound index using a unique or low frequency value.
Shard Key Monotonicity
A shard key on a value that increases or decreases monotonically is more likely to distribute inserts to a single shard within the cluster.
This occurs because every cluster has a chunk that captures a range with an
upper bound of maxKey. maxKey
always
compares as higher than all other values. Similarly, there is a chunk that
captures a range with a lower bound of minKey.
minKey
always compares as lower than all other values.
If the shard key value is always increasing, all new inserts are routed to the
chunk with maxKey
as the upper bound. If the shard key value is always
decreasing, all new inserts are routed to the chunk with minKey
as the
lower bound. The shard containing that chunk becomes the bottleneck for write
operations.
The following image illustrates a sharded cluster using the field X
as the shard key. If the values for X
are monotonically increasing, the
distribution of inserts may look similar to the following:
If the shard key value was monotonically decreasing, then all inserts would
route to Chunk A
instead.
Choosing a shard key that does not change monotonically does not, on its own, guarantee even distribution of data across the sharded cluster. The cardinality and frequency of the shard key also contribute to data distribution. Take each factor into account when choosing a shard key.
If your data model requires sharding on a key that changes monotonically, consider using Hashed Sharding.
Change a Document's Shard Key Value
Note
When updating the shard key value
You must run on a
mongos
. Do not issue the operation directly on the shard.You must run either in a transaction or as a retryable write.
You must include an equality condition on the full shard key in the query filter. For example, if a collection messages uses
{ activityid: 1, userid : 1 }
as the shard key, to update the shard key for a document, you must includeactivityid: <value>, userid: <value>
in the query filter. You can include additional fields in the query as appropriate.
See also the specific write command/methods for additional operation-specific requirements when run against a sharded collection.
Starting in MongoDB 4.2, you can update a document's shard key value
unless the shard key field is the immutable _id
field. To update,
use the following operations to update a document's shard key value:
Command | Method |
---|---|
update with multi: false | |
If the shard key modification results in moving the document to another shard, you cannot specify more than one shard key modification in the bulk operation; i.e. batch size of 1. If the shard key modification does not result in moving the document to another shard, you can specify multiple shard key modification in the bulk operation. |
Warning
Starting in version 4.4, documents in sharded collections can be missing the shard key fields. Take precaution to avoid accidentally removing the shard key when changing a document's shard key value.
Missing Shard Key
Starting in version 4.4, documents in sharded collections can be missing the shard key fields.
Chunk Range and Missing Shard Key Fields
Missing shard keys fall within the same chunk range as shard keys with
null values. For example, if the shard key is on the fields { x: 1, y: 1
}
, then:
Document Missing Shard Key | Falls into Same Range As |
---|---|
{ x: "hello" } | { x: "hello", y: null } |
{ y: "goodbye" } | { x: null, y: "goodbye" } |
{ z: "oops" } | { x: null, y: null } |
Read/Write Operations and Missing Shard Key Fields
To target documents with missing shard key fields, you can use the
{ $exists: false }
filter condition on the shard key
fields. For example, if the shard key is on the fields { x: 1, y: 1
}
, to find the documents with missing shard key fields:
db.shardedcollection.find( { $or: [ { x: { $exists: false } }, { y: { $exists: false } } ] } )
If you specify an null equality match filter condition (e.g. { x: null
}
), the filter matches both those documents with missing shard
key fields and those with shard key fields set to null
.
Some write operations, such as a write with an upsert
specification, require an equality match on the shard key. In those
cases, to target a document that is missing the shard key, include
another filter condition in addition to the null
equality match.
For example:
{ _id: <value>, <shardkeyfield>: null } // _id of the document missing shard key
Set the Missing Shard Key Fields
To set missing shard key fields (which is different from changing
the value of an existing shard key field), you can
use the following operations on a mongos
:
Command | Method | Description |
---|---|---|
update with multi: true | multi: true |
|
update with multi: false |
| |
| ||
|
Once you set the shard key field, to modify the field's value, see Change a Document's Shard Key Value.