- Administration >
- Data Center Awareness >
- Tag Aware Sharding >
- Segmenting Data by Location
Segmenting Data by Location¶
MongoDB Tag Aware Sharding allows administrators to control data distribution in a sharded cluster by defining ranges of the shard key and tagging them to one or more shards.
This tutorial uses Tag Aware Sharding to segment data based on geographic area.
The following are some example use cases for segmenting data by geographic area:
- An application that requires segmenting user data based on geographic country
- A database that requires resource allocation based on geographic country
The following diagram illustrates a sharded cluster that uses geographic based shard tags to manage and satisfy data segmentation requirements.
Scenario¶
A financial chat application logs messages, tracking the country of the
originating user. The application stores the logs in the chat
database
under the messages
collection. The chats contain information that must be
segmented by country to have servers local to the country serve read and write
requests for the country’s users. A group of countries can be assigned
same tag in order to share resources.
The application currently has users in the US, UK, and Germany. The
country
field represents the user’s country based on its
ISO 3166-1 Alpha-2
two-character country codes.
The following documents represent a partial view of three chat messages:
Shard Key¶
The messages
collection uses the { country : 1, userid : 1 }
compound
index as the shard key.
The country
field in each document allows for creating a tag range on
each distinct country value.
The userid
field provides a high cardinality
and low frequency component to the shard key
relative to country
.
See Choosing a Shard Key for more general instructions on selecting a shard key.
Architecture¶
The sharded cluster has shards in two data centers - one in Europe, and one in North America.
Tags¶
This application requires one tag per data center.
EU
- European data centerShards deployed on this data center are tagged as
EU
.For each country using the
EU
data center for local reads and writes, create a tag range with:- a lower bound of
{ "country" : <country>, "userid" : MinKey }
- an upper bound of
{ "country" : <country>, "userid" : MaxKey }
- a lower bound of
NA
- North American data centerShards deployed on this data center are tagged as
NA
.For each country using the
NA
data center for local reads and writes, create a tag range with:- a lower bound of
{ "country" : <country>, "userid" : MinKey }
- an upper bound of
{ "country" : <country>, "userid" : MaxKey }
- a lower bound of
Note
The MinKey
and MaxKey
values are reserved special
values for comparisons
Write Operations¶
With tag-aware sharding, the mongos
compares incoming documents
against any configured tag ranges. If a document matches a configured tag
range, mongos
routes the document to a shard with the appropriate
tag.
On insert, MongoDB can route documents that do not match a configured tag range to any shard in the cluster.
Read Operations¶
MongoDB can route queries to a specific shard if the query includes at least
the country
field.
For example, MongoDB can attempt a targeted read operation on the following query:
Queries without the country
field perform broadcast operations.
Balancer¶
The balancer migrates the tagged chunks to the appropriate shard. Until the migration, shards may contain chunks that violate configured tag ranges and tags. Once balancing completes, shards should only contain chunks whose ranges do not violate its assigned tags and tag ranges.
Adding or removing tags or tag ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a tag range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
Security¶
For sharded clusters running with Role-Based Access Control, authenticate as a user
with at least the clusterManager
role on the admin
database.
Procedure¶
You must be connected to a mongos
to create tags and tag ranges.
You cannot create tags by connecting directly to a shard.
Disable the Balancer (Optional)¶
To reduce performance impacts, the balancer may be disabled on the collection to ensure no migrations take place while configuring the new tags.
Use sh.disableBalancing()
, specifying the namespace of the
collection, to stop the balancer.
Use sh.isBalancerRunning()
to check if the balancer process
is currently running. Wait until any current balancing rounds have completed
before proceeding.
Tag each shard¶
Tag each shard in the North American data center with the NA
tag.
Tag each shard in the European data center with the EU
tag.
You can review the tags assigned to any given shard by running
sh.status()
.
Define ranges for each tag¶
For shard key values where country : US
, define a shard key range
and associate it to the NA
tag using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the tag.
For shard key values where country : UK
, define a shard key range
and associate it to the EU
tag using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the tag.
For shard key values where country : DE
, define a shard key range
and associate it to the EU
tag using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the tag.
The MinKey
and MaxKey
values are reserved special
value for comparisons. MinKey
always compares as lower than every
other possible value, while MaxKey
always compares as higher than
every other possible value. The configured ranges captures every user for
each device
.
Both country : UK
and country : DE
are assigned to the EU
tag.
This associates any document with either UK
or DE
as the value for
country
to the EU data center.
Enable the Balancer (Optional)¶
If the balancer was disabled in previous steps, re-enable the balancer at the completion of this procedure to rebalance the cluster.
Use sh.enableBalancing()
, specifying the namespace of the
collection, to start the balancer.
Use sh.isBalancerRunning()
to check if the balancer process
is currently running.
Review the Changes¶
The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards respecting the tag ranges and tags.
Once balancing finishes, the shards tagged as NA
should only
contain documents with country : NA
, while shards tagged as EU
should only contain documents with country : UK
or country : DE
.
A document with a value for country
other than NA
, UK
, or
DE
can reside on any shard in the cluster.
You can confirm the chunk distribution by running sh.status()
.
Updating Tags¶
The application requires the following updates:
- Documents with
country : UK
must now be associated to the newUK
data center. Any data in theEU
data center must be migrated - The chat application now supports users in Mexico. Documents with
country : MX
must be routed to theNA
data center.
Perform the following procedures to update the tag ranges.
Disable the Balancer (Optional)¶
To reduce performance impacts, the balancer may be disabled on the collection to ensure no migrations take place while configuring the new tags or removing the old ones.
Use sh.disableBalancing()
, specifying the namespace of the
collection, to stop the balancer
Use sh.isBalancerRunning()
to check if the balancer process
is currently running. Wait until any current balancing rounds have completed
before proceeding.
Add the new UK
tag¶
Tag each shard in the UK
data center with the UK
tag
You can review the tags assigned to any given shard by running
sh.status()
.
Remove the old tag range¶
Remove the old tag range associated to the UK
country using the
sh.removeTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the tag.
Add new tag ranges¶
For shard key values where country : UK
, define a shard key range
and associate it to the UK
tag using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the tag.
For shard key values where country : MX
, define a shard key range
and associate it to the NA
tag using the sh.addTagRange()
method. This method requires:
- The full namespace of the target collection.
- The inclusive lower bound of the range.
- The exclusive upper bound of the range.
- The name of the tag.
The MinKey
and MaxKey
values are reserved special values
for comparisons. MinKey
always compares as lower than every other
possible value, while MaxKey
always compares as higher than
every other possible value. This ensures the two ranges captures the
entire possible value space of creation_date
.
Enable the Balancer (Optional)¶
If the balancer was disabled in previous steps, re-enable the balancer at the completion of this procedure to rebalance the cluster.
Use sh.enableBalancing()
, specifying the namespace of the
collection, to start the balancer
Use sh.isBalancerRunning()
to check if the balancer process
is currently running.
Review the changes¶
The next time the balancer runs, it splits chunks where necessary and migrates chunks across the shards respecting the tag ranges and tags.
Before balancing, the shards tagged as EU
only contained documents where
country : DE
or country : UK
. Documents with the country : MX
could be stored on any shard in the sharded cluster.
After balancing, the shards tagged as EU
should only contain documents
where country : DE
, while shards tagged as UK
should only contain
documents where country : UK
. Additionally, shards tagged as NA
should only contain documents where country : US
or country : MX
.
A document with a value for country
other than NA
, MX
, UK
,
or DE
can reside on any shard in the cluster.
You can confirm the chunk distribution by running sh.status()
.