Troubleshoot Sharded Clusters¶
This page describes common strategies for troubleshooting sharded cluster deployments.
Renaming Mirrored Config Servers and Cluster Availability¶
If the sharded cluster uses mirrored config servers instead of a replica set, and the name or address that the cluster uses to connect to a config server changes, you must restart every mongod and mongos instance in the sharded cluster.
To avoid this downtime, identify config servers with DNS names that are unrelated to their physical or virtual hostnames. Generally, refer to each config server by a DNS alias (e.g. a CNAME record), and use these aliases when specifying the config server connection string to mongos. These records make it possible to change the IP address or rename config servers without changing the connection string and without having to restart the entire cluster.
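As a rough illustration of the aliasing idea, the sketch below assembles a config server connection string from DNS aliases rather than machine hostnames. The alias names, the port value, and the helper `build_configdb_string` are hypothetical and are not part of MongoDB; only the comma-separated `host:port` layout (optionally prefixed with a replica set name and `/`) follows the connection string convention.

```python
def build_configdb_string(aliases, port=27019, repl_set_name=None):
    """Join config server DNS aliases into a configDB-style string."""
    if not aliases:
        raise ValueError("at least one config server alias is required")
    hosts = ",".join(f"{alias}:{port}" for alias in aliases)
    # A replica set connection string is prefixed with "<name>/".
    return f"{repl_set_name}/{hosts}" if repl_set_name else hosts


# Hypothetical CNAME aliases; they stay fixed even if the machines
# behind them are renamed or moved to new IP addresses.
CONFIG_ALIASES = ["cfg0.example.net", "cfg1.example.net", "cfg2.example.net"]

mirrored = build_configdb_string(CONFIG_ALIASES)
print(mirrored)
```

Because the string contains only aliases, repointing a CNAME at a new host leaves the value passed to mongos unchanged.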
Cursor Fails Because of Stale Config Data¶
A query returns a stale config warning when one or more of the mongos instances has not yet updated its cache of the cluster’s metadata from the config database. This warning should not propagate back to your application. The warning will repeat until all the mongos instances refresh their caches. To force an instance to refresh its cache, run the flushRouterConfig command.
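A minimal sketch of issuing the command from a driver, assuming PyMongo-style clients that are each connected directly to one mongos. The helper name `flush_router_config` and the labels are hypothetical; the only MongoDB-specific piece is the `flushRouterConfig` command name, which is run against the `admin` database.

```python
def flush_router_config(clients):
    """Run flushRouterConfig on each mongos client and collect the replies.

    `clients` maps a label to an object exposing `admin.command(name)`,
    which is how a PyMongo MongoClient exposes the admin database.
    """
    results = {}
    for label, client in clients.items():
        # Each mongos keeps its own metadata cache, so each one is flushed.
        results[label] = client.admin.command("flushRouterConfig")
    return results
```

With PyMongo installed, `clients` could be built as `{"mongos0": MongoClient("mongodb://mongos0.example.net:27017")}`, where the hostname is a placeholder for one of your own mongos instances.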
Shard Keys and Cluster Availability¶
The most important considerations when choosing a shard key are:
- to ensure that MongoDB will be able to distribute data evenly among shards,
- to scale writes across the cluster, and
- to ensure that mongos can isolate most queries to a specific mongod.
Furthermore:
- Each shard should be a replica set. If a specific mongod instance fails, the replica set members will elect another to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, that data will be unavailable.
- If the shard key allows the mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
- If your shard key distributes data required for every operation throughout the cluster, then the failure of an entire shard will render the entire cluster unavailable.
In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.
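The effect of query isolation on availability can be illustrated with a toy model. This is only a sketch: the chunk map, shard names, and both helper functions are hypothetical, and real routing in mongos is considerably more involved.

```python
def targeted_shards(chunks, lo, hi):
    """Return the shards owning chunks that overlap the key range [lo, hi)."""
    return {shard for (start, end, shard) in chunks if start < hi and lo < end}


def is_available(chunks, failed, lo, hi):
    """A query succeeds only if none of the shards it targets has failed."""
    return targeted_shards(chunks, lo, hi).isdisjoint(failed)


# Hypothetical chunk map: (inclusive start, exclusive end, shard name).
CHUNKS = [(0, 100, "shardA"), (100, 200, "shardB"), (200, 300, "shardC")]
FAILED = {"shardB"}

print(is_available(CHUNKS, FAILED, 10, 20))   # query isolated to shardA
print(is_available(CHUNKS, FAILED, 0, 300))   # query touches every shard
```

In this model, a query confined to one healthy shard is unaffected by the outage, while a query that spans all shards fails as soon as any one of them is down.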
Config Database String Error¶
Changed in version 3.2: Starting in MongoDB 3.2, config servers can be deployed as replica sets by default. The mongos instances for the sharded cluster must specify the same config server replica set name but can specify the hostname and port of different members of the replica set.
If using the deprecated topology of three mirrored mongod instances for config servers, the mongos instances in a sharded cluster must specify an identical configDB string.
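The two rules above can be checked mechanically before starting the mongos instances. The following is a minimal sketch, assuming the configDB values have already been collected from each mongos configuration; `parse_configdb` and `configdb_consistent` are hypothetical helpers, not MongoDB tools.

```python
def parse_configdb(value):
    """Split a configDB string into (replica set name or None, host list)."""
    name, sep, hosts = value.partition("/")
    if not sep:  # no "/" means a mirrored (non-replica-set) string
        name, hosts = None, value
    return name, [h.strip() for h in hosts.split(",") if h.strip()]


def configdb_consistent(values):
    """Check the configDB strings of all mongos instances against each other."""
    parsed = [parse_configdb(v) for v in values]
    names = {name for name, _ in parsed}
    if len(names) != 1:
        return False  # mixed topologies or different replica set names
    if None in names:
        # Mirrored config servers: every string must be identical.
        return len(set(values)) == 1
    return True  # replica set: same name, listed members may differ
```

For example, `configdb_consistent(["cfg/a:27019,b:27019", "cfg/c:27019"])` passes because both strings name the same replica set, whereas two mirrored strings listing the same hosts in a different order do not.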
Avoid Downtime when Moving Config Servers¶
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.
moveChunk commit failed Error¶
At the end of a chunk migration, the shard must connect to the config database to update the chunk’s record in the cluster metadata. If the shard fails to connect to the config database, MongoDB reports a moveChunk commit failed error.
When this happens, the primary member of the shard’s replica set terminates to protect data consistency. If a secondary member can access the config database, data on the shard becomes accessible again after an election.
You will need to resolve the chunk migration failure independently. If you encounter this issue, contact the MongoDB User Group or MongoDB Support.