db.collection.analyzeShardKey()
Definition
db.collection.analyzeShardKey(key, opts)
Calculates metrics for evaluating a shard key for an unsharded or sharded collection. The read and write distribution metrics are based on sampled queries. You can use configureQueryAnalyzer to configure query sampling on a collection.
Compatibility
This method is available in deployments hosted in the following environments:
MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
Note
This command is supported in all MongoDB Atlas clusters. For information on Atlas support for all commands, see Unsupported Commands.
MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB
Syntax
db.collection.analyzeShardKey() has this syntax:
```
db.collection.analyzeShardKey(
   <shardKey>,
   {
      keyCharacteristics: <bool>,
      readWriteDistribution: <bool>,
      sampleRate: <double>,
      sampleSize: <int>
   }
)
```
Fields
Field | Type | Necessity | Description |
---|---|---|---|
key | document | Required | Shard key to analyze. This can be a candidate shard key for an unsharded or sharded collection, or the current shard key of a sharded collection. There is no default value. |
opts.keyCharacteristics | boolean | Optional | Whether to calculate the metrics about the characteristics of the shard key. For details, see keyCharacteristics. Defaults to true. |
opts.readWriteDistribution | boolean | Optional | Whether to calculate the metrics about the read and write distribution. For details, see readWriteDistribution. Defaults to true. |
opts.sampleRate | double | Optional | The proportion of the documents in the collection to sample when calculating the metrics about the characteristics of the shard key. If you set sampleRate, you cannot set sampleSize. Must be greater than 0, up to and including 1. There is no default value. |
opts.sampleSize | integer | Optional | The number of documents to sample when calculating the metrics about the characteristics of the shard key. If you set sampleSize, you cannot set sampleRate. If neither sampleSize nor sampleRate is specified, the server's default sample size is used. |
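As a sketch of how the opts fields fit together, the hypothetical helper below (buildAnalyzeShardKeyOpts is not part of mongosh or any driver) assembles an options document in plain JavaScript. It assumes the defaults and constraints described for analyzeShardKey: keyCharacteristics and readWriteDistribution default to true, sampleRate must be greater than 0 and at most 1, and sampleRate and sampleSize cannot both be set.

```javascript
// Hypothetical helper (not part of mongosh or any driver): builds the options
// document for db.collection.analyzeShardKey() and rejects invalid combinations
// on the client before the command is sent.
function buildAnalyzeShardKeyOpts({
  keyCharacteristics = true,    // assumed server default
  readWriteDistribution = true, // assumed server default
  sampleRate,                   // proportion of documents to sample, in (0, 1]
  sampleSize,                   // number of documents to sample
} = {}) {
  if (sampleRate !== undefined && sampleSize !== undefined) {
    throw new Error("Set either sampleRate or sampleSize, not both");
  }
  if (sampleRate !== undefined && !(sampleRate > 0 && sampleRate <= 1)) {
    throw new Error("sampleRate must be greater than 0 and at most 1");
  }
  const opts = { keyCharacteristics, readWriteDistribution };
  // Only include the sampling fields that were actually provided.
  if (sampleRate !== undefined) opts.sampleRate = sampleRate;
  if (sampleSize !== undefined) opts.sampleSize = sampleSize;
  return opts;
}

// Example: sample half of the documents and skip the read/write distribution metrics.
const opts = buildAnalyzeShardKeyOpts({ sampleRate: 0.5, readWriteDistribution: false });
console.log(opts);
// In mongosh, the result would then be passed as the second argument:
// db.post.analyzeShardKey({ lastName: 1 }, opts)
```

The helper only rejects combinations on the client side; the server still validates the options itself.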
Behavior
For behavior, see analyzeShardKey Behavior.
Access Control
For details, see analyzeShardKey Access Control.
Output
For sample output, see analyzeShardKey Output.
Examples
Consider a simplified version of a social media app. The collection to shard is the post collection.
Documents in the post collection have the following schema:
```
{
   userId: <uuid>,
   firstName: <string>,
   lastName: <string>,
   body: <string>, // the field that can be modified.
   date: <date>,   // the field that can be modified.
}
```
Background Information
The app has 1500 users.
There are 30 last names and 45 first names, some more common than others.
There are three celebrity users.
Each user follows exactly five other users and has a very high probability of following at least one celebrity user.
Sample Workload
Each user makes about two posts a day at random times and edits each post once, right after posting it.
Each user logs in every six hours to read their own profile and the posts from the past 24 hours by the users they follow. They also reply under a random post from the past three hours.
Every day at midnight, the app removes each user's posts that are more than three days old.
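The retention rule puts a rough bound on the collection size. The sketch below is plain arithmetic in JavaScript and assumes that every user makes exactly two posts a day and that replies are not stored as separate post documents. Right after the midnight cleanup, about 1500 × 2 × 3 = 9,000 posts remain, which is in line with the numDocsTotal value of 9039 in the example outputs; because cleanup runs only once a day, the count then climbs until the next midnight.

```javascript
// Rough estimate of how many post documents exist right after the midnight cleanup.
// Assumptions: every user makes exactly two posts a day and replies are not
// stored as separate post documents.
const NUM_USERS = 1500;
const POSTS_PER_USER_PER_DAY = 2;
const RETENTION_DAYS = 3; // posts older than three days are removed at midnight

function estimatePostCount(users, postsPerUserPerDay, retentionDays) {
  return users * postsPerUserPerDay * retentionDays;
}

const estimatedPosts = estimatePostCount(NUM_USERS, POSTS_PER_USER_PER_DAY, RETENTION_DAYS);
console.log(estimatedPosts); // 9000
```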
Workload Query Patterns
This workload has the following query patterns:
find command with filter { userId: , firstName: , lastName: }
find command with filter { $or: [ { userId: , firstName: , lastName: , date: { $gte: } }, ... ] }
findAndModify command with filter { userId: , firstName: , lastName: , date: } to update the body and date fields.
update command with multi: false and filter { userId: , firstName: , lastName: , date: { $gte: , $lt: } } to update the body and date fields.
delete command with multi: true and filter { userId: , firstName: , lastName: , date: { $lt: } }
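To see why these patterns matter for a candidate shard key, the sketch below classifies filters with a deliberately simplified targeting model. classifyTargeting and its helpers are hypothetical and are not how mongos or analyzeShardKey route queries; the real command uses the collection's actual shard key ranges. The model assumes a filter is single-shard when it has an equality match on every shard key field, multi-shard when it constrains the shard key prefix in some other way (including an $or whose branches each constrain it), and scatter-gather otherwise. Under this model, every write pattern above has an equality match on userId, and each login issues one single-shard profile read and one multi-shard feed read, which is consistent with the percentages reported for { userId: 1 } in the readWriteDistribution example.

```javascript
// Hypothetical, simplified model of query targeting (not mongos's routing code).

// A value counts as an equality match unless it is an operator object such as { $gte: ... }.
function isEqualityValue(value) {
  return !(
    value !== null &&
    typeof value === "object" &&
    Object.keys(value).some((key) => key.startsWith("$"))
  );
}

function hasEqualityOnShardKey(filter, keyFields) {
  return keyFields.every((field) => field in filter && isEqualityValue(filter[field]));
}

function constrainsShardKeyPrefix(filter, keyFields) {
  return keyFields[0] in filter;
}

function classifyTargeting(filter, shardKey) {
  const keyFields = Object.keys(shardKey);
  if (hasEqualityOnShardKey(filter, keyFields)) return "single-shard";
  if (constrainsShardKeyPrefix(filter, keyFields)) return "multi-shard";
  const branches = filter.$or;
  if (Array.isArray(branches) && branches.length > 0 &&
      branches.every((branch) => constrainsShardKeyPrefix(branch, keyFields))) {
    return "multi-shard";
  }
  return "scatter-gather";
}

// Filters shaped like the query patterns above; all values are made-up placeholders.
const since = new Date("2024-01-01T00:00:00Z");
const profileRead = { userId: "u1", firstName: "Ann", lastName: "Smith" };
const feedRead = {
  $or: [
    { userId: "u2", firstName: "Bo", lastName: "Jones", date: { $gte: since } },
    { userId: "u3", firstName: "Cy", lastName: "Brown", date: { $gte: since } },
  ],
};
const cleanupDelete = { userId: "u1", firstName: "Ann", lastName: "Smith", date: { $lt: since } };

console.log(classifyTargeting(profileRead, { userId: 1 }));   // single-shard
console.log(classifyTargeting(feedRead, { userId: 1 }));      // multi-shard
console.log(classifyTargeting(cleanupDelete, { userId: 1 })); // single-shard
console.log(classifyTargeting(profileRead, { date: 1 }));     // scatter-gather
```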
Below are example metrics returned by db.collection.analyzeShardKey() for some candidate shard keys, with sampled queries collected from seven days of workload.
Note
Before you run the db.collection.analyzeShardKey() method, read the Supporting Indexes section. If you require supporting indexes for the shard key you are analyzing, use the db.collection.createIndex() method to create the indexes.
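Assuming, as a simplification, that a supporting index is one whose key pattern begins with the shard key fields in the same order and with the same values (the Supporting Indexes section has the exact rules), the hypothetical isSupportingIndex function below checks index key patterns against a candidate shard key. In mongosh, the key patterns could come from db.post.getIndexes(), whose entries carry a key field.

```javascript
// Hypothetical helper, not a MongoDB API. Simplified rule: the index key pattern
// must start with the shard key's fields, in order, with the same values.
function isSupportingIndex(indexKey, shardKey) {
  const indexFields = Object.entries(indexKey);
  const shardKeyFields = Object.entries(shardKey);
  if (shardKeyFields.length > indexFields.length) return false;
  return shardKeyFields.every(
    ([field, value], i) => indexFields[i][0] === field && indexFields[i][1] === value
  );
}

// Example key patterns, e.g. taken from db.post.getIndexes().map((ix) => ix.key)
const indexKeys = [{ _id: 1 }, { lastName: 1, firstName: 1 }];
const candidate = { lastName: 1 };
console.log(indexKeys.some((k) => isSupportingIndex(k, candidate))); // true
```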
{ lastName: 1 } keyCharacteristics
The following db.collection.analyzeShardKey() call returns the key characteristics metrics for the { lastName: 1 } shard key on the social.post collection:
```javascript
use social
db.post.analyzeShardKey(
   { lastName: 1 },
   { keyCharacteristics: true, readWriteDistribution: false }
)
```
The output for this method is similar to the following:
```
{
  "keyCharacteristics": {
    "numDocsTotal" : 9039,
    "avgDocSizeBytes" : 153,
    "numDocsSampled" : 9039,
    "isUnique" : false,
    "numDistinctValues" : 30,
    "mostCommonValues" : [
      { "value" : { "lastName" : "Smith" }, "frequency" : 1013 },
      { "value" : { "lastName" : "Johnson" }, "frequency" : 984 },
      { "value" : { "lastName" : "Jones" }, "frequency" : 962 },
      { "value" : { "lastName" : "Brown" }, "frequency" : 925 },
      { "value" : { "lastName" : "Davies" }, "frequency" : 852 }
    ],
    "monotonicity" : {
      "recordIdCorrelationCoefficient" : 0.0771959161,
      "type" : "not monotonic"
    }
  }
}
```
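The reported numbers can be turned into a couple of quick ratios. The sketch below copies values from the output above (keeping only the top entry of mostCommonValues); summarizeKeyCharacteristics is a hypothetical helper. With 9039 documents spread over 30 distinct values, each lastName value covers about 301 documents on average, and "Smith" alone appears in about 11.2% of the documents. Because documents that share a shard key value cannot be split across chunks, 30 distinct values also cap the collection at 30 chunks.

```javascript
// Values copied from the { lastName: 1 } keyCharacteristics output above;
// only the top entry of mostCommonValues is kept.
const lastNameMetrics = {
  numDocsTotal: 9039,
  numDistinctValues: 30,
  mostCommonValues: [{ value: { lastName: "Smith" }, frequency: 1013 }],
};

// Hypothetical helper: derives two quick ratios from a keyCharacteristics document.
function summarizeKeyCharacteristics(kc) {
  const top = kc.mostCommonValues[0];
  return {
    // Average number of documents per distinct shard key value.
    avgDocsPerValue: kc.numDocsTotal / kc.numDistinctValues,
    // Fraction of all documents that share the most common value.
    topValueShare: top.frequency / kc.numDocsTotal,
  };
}

const summary = summarizeKeyCharacteristics(lastNameMetrics);
console.log(summary.avgDocsPerValue.toFixed(1));              // 301.3
console.log((summary.topValueShare * 100).toFixed(1) + "%");  // 11.2%
```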
{ userId: 1 } keyCharacteristics
The following db.collection.analyzeShardKey() call returns the key characteristics metrics for the { userId: 1 } shard key on the social.post collection:
```javascript
use social
db.post.analyzeShardKey(
   { userId: 1 },
   { keyCharacteristics: true, readWriteDistribution: false }
)
```
The output for this method is similar to the following:
```
{
  "keyCharacteristics": {
    "numDocsTotal" : 9039,
    "avgDocSizeBytes" : 162,
    "numDocsSampled" : 9039,
    "isUnique" : false,
    "numDistinctValues" : 1495,
    "mostCommonValues" : [
      { "value" : { "userId" : UUID("aadc3943-9402-4072-aae6-ad551359c596") }, "frequency" : 15 },
      { "value" : { "userId" : UUID("681abd2b-7a27-490c-b712-e544346f8d07") }, "frequency" : 14 },
      { "value" : { "userId" : UUID("714cb722-aa27-420a-8d63-0d5db962390d") }, "frequency" : 14 },
      { "value" : { "userId" : UUID("019a4118-b0d3-41d5-9c0a-764338b7e9d1") }, "frequency" : 14 },
      { "value" : { "userId" : UUID("b9c9fbea-3c12-41aa-bc69-eb316047a790") }, "frequency" : 14 }
    ],
    "monotonicity" : {
      "recordIdCorrelationCoefficient" : -0.0032039729,
      "type" : "not monotonic"
    }
  }
}
```
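Putting the two keyCharacteristics results side by side makes the difference in cardinality and frequency explicit. The numbers below are copied from the two outputs above; the ranking rule (more distinct values first, then a smaller share for the most common value) is a hypothetical heuristic for illustration, not how MongoDB scores shard keys. Both candidates report "not monotonic" in this example, so the comparison comes down to cardinality and frequency.

```javascript
// Figures copied from the { lastName: 1 } and { userId: 1 } keyCharacteristics outputs above.
const candidates = [
  { key: "{ lastName: 1 }", numDocsTotal: 9039, numDistinctValues: 30, topFrequency: 1013, monotonicity: "not monotonic" },
  { key: "{ userId: 1 }", numDocsTotal: 9039, numDistinctValues: 1495, topFrequency: 15, monotonicity: "not monotonic" },
];

// Share of documents held by the most common shard key value.
const topShare = (c) => c.topFrequency / c.numDocsTotal;

// Hypothetical heuristic: more distinct values first, then a smaller top-value share.
const ranked = [...candidates].sort(
  (a, b) => b.numDistinctValues - a.numDistinctValues || topShare(a) - topShare(b)
);

for (const c of ranked) {
  console.log(
    `${c.key}: ${c.numDistinctValues} distinct values, ` +
    `most common value in ${(100 * topShare(c)).toFixed(2)}% of documents, ${c.monotonicity}`
  );
}
```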
{ userId: 1 } readWriteDistribution
The following db.collection.analyzeShardKey() call returns the read and write distribution metrics for the { userId: 1 } shard key on the social.post collection:
```javascript
use social
db.post.analyzeShardKey(
   { userId: 1 },
   { keyCharacteristics: false, readWriteDistribution: true }
)
```
The output for this method is similar to the following:
```
{
  "readDistribution" : {
    "sampleSize" : {
      "total" : 61363,
      "find" : 61363,
      "aggregate" : 0,
      "count" : 0,
      "distinct" : 0
    },
    "percentageOfSingleShardReads" : 50.0008148233,
    "percentageOfMultiShardReads" : 49.9991851768,
    "percentageOfScatterGatherReads" : 0,
    "numReadsByRange" : [ 688, 775, 737, 776, 652, 671, 1332, 1407, 535, 428, 985, 573, 1496, ... ]
  },
  "writeDistribution" : {
    "sampleSize" : {
      "total" : 49638,
      "update" : 30680,
      "delete" : 7500,
      "findAndModify" : 11458
    },
    "percentageOfSingleShardWrites" : 100,
    "percentageOfMultiShardWrites" : 0,
    "percentageOfScatterGatherWrites" : 0,
    "numWritesByRange" : [ 389, 601, 430, 454, 462, 421, 668, 833, 493, 300, 683, 460, ... ],
    "percentageOfShardKeyUpdates" : 0,
    "percentageOfSingleWritesWithoutShardKey" : 0,
    "percentageOfMultiWritesWithoutShardKey" : 0
  }
}
```
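The readWriteDistribution numbers can be cross-checked with simple arithmetic. The sketch below copies values from the output above: the per-command sample sizes add up to each total, and 50.0008148233% of the 61363 sampled reads is 30682 single-shard reads, leaving 30681 multi-shard reads. A near-even split is what you would expect if every login issues one profile read (an equality match on userId) and one feed read (an $or across followed users), although that mapping is an inference from the workload description, not something the output reports. sumByCommand and countFromPercentage are hypothetical helpers.

```javascript
// Values copied from the { userId: 1 } readWriteDistribution output above.
const readDistribution = {
  sampleSize: { total: 61363, find: 61363, aggregate: 0, count: 0, distinct: 0 },
  percentageOfSingleShardReads: 50.0008148233,
  percentageOfMultiShardReads: 49.9991851768,
  percentageOfScatterGatherReads: 0,
};
const writeDistribution = {
  sampleSize: { total: 49638, update: 30680, delete: 7500, findAndModify: 11458 },
  percentageOfSingleShardWrites: 100,
  percentageOfMultiShardWrites: 0,
  percentageOfScatterGatherWrites: 0,
};

// Sum the per-command counts, skipping the "total" field.
function sumByCommand(sampleSize) {
  return Object.entries(sampleSize)
    .filter(([command]) => command !== "total")
    .reduce((acc, [, count]) => acc + count, 0);
}

// Convert a percentage of a sample back into an (approximate) query count.
function countFromPercentage(total, percentage) {
  return Math.round((total * percentage) / 100);
}

const totalReads = readDistribution.sampleSize.total;
const singleShardReads = countFromPercentage(totalReads, readDistribution.percentageOfSingleShardReads);
console.log(sumByCommand(readDistribution.sampleSize) === totalReads); // true
console.log(sumByCommand(writeDistribution.sampleSize) === writeDistribution.sampleSize.total); // true
console.log(singleShardReads, totalReads - singleShardReads); // 30682 30681
```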