Segmenting Data by Application or Customer¶
MongoDB allows you to associate ranges of shard key values with one or more shards using tags. When routing data to a shard, MongoDB respects any configured tags.
This tutorial shows you how to segment data using tag-aware sharding.
Consider the following scenarios where segmenting data by application or customer may be necessary:
- A database serving multiple applications
- A database serving multiple customers
- A database that requires isolating ranges or subsets of application or customer data
- A database that requires resource allocation for ranges or subsets of application or customer data
This diagram illustrates a sharded cluster using tags to segment data based on application or customer. This allows for data to be isolated to specific shards. Additionally, each shard can have specific hardware allocated to fit the performance requirement of the data stored on that shard.
Scenario¶
An application tracks the score of a user along with a client field, storing scores in the gamify database under the users collection.
Each possible value of client requires its own tag to allow for data segmentation. It also allows the administrator to optimize the hardware for each shard associated with a client for performance and cost.
The following documents represent a partial view of two users:
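The original example documents are not reproduced here. As a stand-in, the sketch below shows two hypothetical user documents as plain JavaScript objects. Only the client and userid fields come from this tutorial; the score field name and every value are assumptions for illustration.

```javascript
// Hypothetical documents: all values are illustrative assumptions.
// `client` and `userid` are the fields named in this tutorial;
// the `score` field name is assumed.
const sampleUsers = [
  { client: "robot", userid: 123, score: 181 },
  { client: "fruitos", userid: 456, score: 210 }
];
```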
Shard Key¶
The users collection uses the { client : 1, userid : 1 } compound index as the shard key.
The client field in each document allows creating a tag range on each distinct client value.
The userid field provides a high cardinality and low frequency component to the shard key relative to client.
See Choosing a Shard Key for more general instructions on selecting a shard key.
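If the collection were not yet sharded, the shard key described above could be applied roughly as follows. This is a minimal sketch, assuming a mongo shell connected to a mongos; shardUsersCollection is a hypothetical helper name, and sh (the shell's built-in sharding helper) is passed in as a parameter so the function can also be tried against a stub.

```javascript
// Sketch only. Equivalent direct shell calls:
//   sh.enableSharding("gamify")
//   sh.shardCollection("gamify.users", { client: 1, userid: 1 })
function shardUsersCollection(sh) {
  // Sharding must be enabled on the database before sharding a collection.
  sh.enableSharding("gamify");
  // Shard on the compound key: client first, then userid.
  return sh.shardCollection("gamify.users", { client: 1, userid: 1 });
}
```

In the shell, the call would be shardUsersCollection(sh).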
Architecture¶
The application requires tagging each shard in the cluster for a specific client.
The sharded cluster deployment currently consists of four shards.
Tags¶
For this application, there are two client tags.
- Robot client (“robot”): This tag represents all documents where client : robot.
- FruitOS client (“fruitos”): This tag represents all documents where client : fruitos.
Write Operations¶
With tag-aware sharding, if an inserted or updated document matches a configured tag range, it can only be written to a shard with the related tag.
MongoDB can write documents that do not match a configured tag range to any shard in the cluster.
Note
The behavior described above requires the cluster to be in a steady state with no chunks violating a configured tag range. See the following section on the balancer for more information.
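The write routing rule can be illustrated with a small plain JavaScript model. This is a simplified sketch, not MongoDB's implementation: each tag range is reduced to the single client value it covers, and the shard names and tag assignments are the ones used later in the procedure. The name eligibleShards is hypothetical.

```javascript
// Simplified model, not MongoDB's implementation: each tag range is
// reduced to the single client value it covers.
const tagRanges = [
  { client: "robot", tag: "robot" },
  { client: "fruitos", tag: "fruitos" }
];

// Tags assigned to each shard, as in the procedure in this tutorial.
const shardTags = {
  shard0000: ["robot"],
  shard0001: ["robot"],
  shard0002: ["fruitos"],
  shard0003: ["fruitos"]
};

// Return the shards that may receive the document in a steady-state cluster.
function eligibleShards(doc) {
  const allShards = Object.keys(shardTags);
  const range = tagRanges.find((r) => r.client === doc.client);
  if (!range) {
    return allShards; // no matching tag range: any shard is allowed
  }
  // A matching range restricts the write to shards carrying its tag.
  return allShards.filter((shard) => shardTags[shard].includes(range.tag));
}
```

In this model, a document with client : robot is limited to shard0000 and shard0001, while a document with an untagged client value may land on any of the four shards.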
Read Operations¶
MongoDB can route queries to a specific shard if the query includes at least the client field.
For example, MongoDB can attempt a targeted read operation on the following query:
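The original query is not reproduced here. A hypothetical query of that shape might look like the sketch below, which assumes a mongo shell connected to a mongos with gamify as the current database. The helper name findUserScores and the userid value are assumptions; db is the shell's database handle, passed in so the function can be tried against a stub.

```javascript
// Sketch only. Equivalent direct shell call (userid value is illustrative):
//   db.users.find({ client: "robot", userid: 123 })
function findUserScores(db, client, userid) {
  // The filter includes the client field (and the full shard key),
  // so mongos can target the shards that hold that range.
  return db.users.find({ client: client, userid: userid });
}
```

In the shell, the call would be findUserScores(db, "robot", 123).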
Queries without the client field perform broadcast operations.
Balancer¶
The balancer migrates the tagged chunks to the appropriate shard. Until the migration completes, shards may contain chunks that violate configured tag ranges and tags. Once balancing completes, shards should only contain chunks whose ranges do not violate their assigned tags and tag ranges.
Adding or removing tags or tag ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a tag range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
Security¶
For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.
Procedure¶
You must be connected to a mongos associated with the target sharded cluster to proceed. You cannot create tags by connecting directly to a shard.
Disable the Balancer¶
The balancer must be disabled on the collection to ensure no migrations take place while configuring the new tags.
Use sh.disableBalancing(), specifying the namespace of the collection, to stop the balancer.
Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
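The two calls above can be sketched as follows, assuming a mongo shell connected to a mongos. pauseBalancing is a hypothetical helper name; sh is the shell's built-in sharding helper, passed in so the function can be tried against a stub. The shape of the value returned by sh.isBalancerRunning() may differ between shell versions, so the sketch returns it without interpreting it.

```javascript
// Sketch only. Equivalent direct shell calls:
//   sh.disableBalancing("gamify.users")
//   sh.isBalancerRunning()
function pauseBalancing(sh, namespace) {
  // Stop the balancer from migrating chunks of this collection.
  sh.disableBalancing(namespace);
  // Report whether a balancing round is still in progress; repeat the
  // check until no round is running before configuring tags.
  return sh.isBalancerRunning();
}
```

In the shell, the call would be pauseBalancing(sh, "gamify.users").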
Tag each shard¶
Tag shard0000 with the robot tag.
Tag shard0001 with the robot tag.
Tag shard0002 with the fruitos tag.
Tag shard0003 with the fruitos tag.
Run sh.status() to review the tags configured for the sharded cluster.
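The tagging steps above can be sketched as a single loop over the shard-to-tag assignments, assuming a mongo shell connected to a mongos. tagShards and shardTagAssignments are hypothetical names; sh is the shell's built-in sharding helper, passed in so the function can be tried against a stub.

```javascript
// Shard-to-tag assignments from this tutorial.
const shardTagAssignments = {
  shard0000: "robot",
  shard0001: "robot",
  shard0002: "fruitos",
  shard0003: "fruitos"
};

// Sketch only. Equivalent direct shell calls:
//   sh.addShardTag("shard0000", "robot")   (and so on for each shard)
//   sh.status()
function tagShards(sh, assignments) {
  for (const [shard, tag] of Object.entries(assignments)) {
    sh.addShardTag(shard, tag);
  }
  // Review the configured tags once every shard has been tagged.
  return sh.status();
}
```

In the shell, the call would be tagShards(sh, shardTagAssignments).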
Define ranges for each tag¶
Define the range for the robot client and associate it with the robot tag using the sh.addTagRange() method.
This method requires:
- The full namespace of the target collection
- The inclusive lower bound of the range
- The exclusive upper bound of the range
- The name of the tag
Define the range for the fruitos client and associate it with the fruitos tag using the sh.addTagRange() method.
This method requires:
- The full namespace of the target collection
- The inclusive lower bound of the range
- The exclusive upper bound of the range
- The name of the tag
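The two range definitions above can be sketched together, assuming a mongo shell connected to a mongos. addClientTagRanges is a hypothetical helper name; sh and the MinKey and MaxKey values are shell built-ins, passed in as parameters so the function can be tried against stubs. Depending on the shell version, the bounds may need to be written as MinKey() and MaxKey().

```javascript
// Sketch only. Equivalent direct shell call for one client:
//   sh.addTagRange("gamify.users",
//                  { client: "robot", userid: MinKey },
//                  { client: "robot", userid: MaxKey },
//                  "robot")
function addClientTagRanges(sh, minKey, maxKey) {
  for (const client of ["robot", "fruitos"]) {
    sh.addTagRange(
      "gamify.users",                      // full namespace of the collection
      { client: client, userid: minKey },  // inclusive lower bound
      { client: client, userid: maxKey },  // exclusive upper bound
      client                               // tag name; here it equals the client value
    );
  }
}
```

In the shell, the call would be addClientTagRanges(sh, MinKey, MaxKey).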
The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as lower than every other possible value, while MaxKey always compares as higher than every other possible value. The configured ranges capture every user for each client.
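To see why these bounds cover every userid for a client, the comparison can be modeled in plain JavaScript. This is a simplified sketch, not MongoDB's comparison logic: MIN and MAX are stand-in sentinels for MinKey and MaxKey, all function names are hypothetical, and ordinary values are assumed to be of the same type within each field.

```javascript
// Simplified stand-ins for the shell's MinKey and MaxKey values.
const MIN = Symbol("MinKey");
const MAX = Symbol("MaxKey");

// Compare two values where MIN sorts below and MAX above everything else.
function compareValues(a, b) {
  if (a === b) return 0;
  if (a === MIN || b === MAX) return -1;
  if (a === MAX || b === MIN) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compare compound shard keys field by field: client first, then userid.
function compareKeys(a, b) {
  return compareValues(a.client, b.client) || compareValues(a.userid, b.userid);
}

// A key is in range when lower <= key < upper.
function inRange(key, lower, upper) {
  return compareKeys(lower, key) <= 0 && compareKeys(key, upper) < 0;
}

const robotLower = { client: "robot", userid: MIN };
const robotUpper = { client: "robot", userid: MAX };
```

Under this model, inRange({ client: "robot", userid: 123 }, robotLower, robotUpper) is true for any ordinary userid, while any key with client : fruitos falls outside the range.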
Enable the Balancer¶
Re-enable the balancer to rebalance the cluster.
Use sh.enableBalancing(), specifying the namespace of the collection, to start the balancer.
Use sh.isBalancerRunning() to check if the balancer process is currently running.
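Re-enabling the balancer mirrors the earlier disable step. The sketch below assumes a mongo shell connected to a mongos; resumeBalancing is a hypothetical helper name, and sh is the shell's built-in sharding helper, passed in so the function can be tried against a stub.

```javascript
// Sketch only. Equivalent direct shell calls:
//   sh.enableBalancing("gamify.users")
//   sh.isBalancerRunning()
function resumeBalancing(sh, namespace) {
  // Allow the balancer to migrate chunks of this collection again.
  sh.enableBalancing(namespace);
  // Report whether a balancing round is currently in progress.
  return sh.isBalancerRunning();
}
```

In the shell, the call would be resumeBalancing(sh, "gamify.users").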
Review the changes¶
The next time the balancer runs, it splits and migrates chunks across the shards respecting the tag ranges and tags.
Once balancing finishes, the shards tagged as robot only contain documents with client : robot, while shards tagged as fruitos only contain documents with client : fruitos.
You can confirm the chunk distribution by running sh.status().