Collations
Overview
In this guide, you can learn how to use collations with MongoDB to order your query or aggregation operation results by string values. A collation is a set of character ordering and matching rules that apply to a specific language and locale.
This guide covers collations in the following sections: Collations in MongoDB, How to Specify Collations, Collation Options, and Collation Examples.
Collations in MongoDB
MongoDB sorts strings using binary collation by default. The binary collation uses the ASCII standard character values to compare and order strings. Certain languages and locales have specific character ordering conventions that differ from the ASCII character values.
For example, in Canadian French, the right-most accented character (diacritic) determines the ordering for strings when all preceding characters are the same. Consider the following Canadian French words:
cote
coté
côte
côté
When using binary collation, MongoDB sorts them in the following order:
cote coté côte côté
When using the Canadian French collation, MongoDB sorts them in a different order as shown below:
cote côte coté côté
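The two orderings above can be reproduced with a small JDK-only sketch. It does not call MongoDB or ICU. The class `AccentOrderSketch` and its helpers are hypothetical names introduced here, and the comparator is a simplified model of the "right-most accent wins" rule. It assumes that comparing Java strings by UTF-16 code unit gives the same result as binary collation for these four words.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class AccentOrderSketch {
    // Strips combining marks so that only the base letters remain.
    static String baseLetters(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Builds one flag per base letter: '1' if the letter carries an accent, '0' otherwise.
    static String accentMask(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder mask = new StringBuilder();
        for (char c : decomposed.toCharArray()) {
            if (Character.getType(c) == Character.NON_SPACING_MARK) {
                if (mask.length() > 0) {
                    mask.setCharAt(mask.length() - 1, '1');
                }
            } else {
                mask.append('0');
            }
        }
        return mask.toString();
    }

    // Compares base letters first, then accents starting from the END of the string.
    static int compareBackwardsAccents(String a, String b) {
        int byBase = baseLetters(a).compareTo(baseLetters(b));
        if (byBase != 0) {
            return byBase;
        }
        String reversedA = new StringBuilder(accentMask(a)).reverse().toString();
        String reversedB = new StringBuilder(accentMask(b)).reverse().toString();
        return reversedA.compareTo(reversedB);
    }

    // Sorts by code unit value, which stands in for binary collation here.
    static List<String> sortBinary(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts with the simplified backwards-accent comparator.
    static List<String> sortBackwardsAccents(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(AccentOrderSketch::compareBackwardsAccents);
        return copy;
    }

    public static void main(String[] args) {
        // Unicode escapes: \u00f4 is "ô" and \u00e9 is "é".
        List<String> words = List.of("c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote");
        System.out.println(sortBinary(words));
        System.out.println(sortBackwardsAccents(words));
    }
}
```

The sketch only tracks whether a letter has an accent, not which accent it is, so it is enough for this word list but is not a substitute for the real Canadian French collation.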
How to Specify Collations
MongoDB supports collations on most CRUD operations and aggregations. For a complete list of supported operations, see the Operations that Support Collations server manual page.
You can specify the locale code and optional variant in the following string format:
"<locale code>@collation=<variant code>"
The following example specifies the "de" locale code and "phonebook" variant code:
"de@collation=phonebook"
If you do not need to specify a variant, omit everything after the locale code as follows:
"de"
For a complete list of supported locales, see our server manual page on Supported Languages and Locales.
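As a small illustration of the string format described above, the following JDK-only sketch assembles a locale string from a locale code and an optional variant. The class `CollationLocaleSketch` and the method `localeString` are hypothetical helpers introduced for this example and are not part of the driver.

```java
class CollationLocaleSketch {
    // Hypothetical helper: joins a locale code and an optional variant code
    // into the "<locale code>@collation=<variant code>" format.
    static String localeString(String localeCode, String variantCode) {
        if (variantCode == null || variantCode.isEmpty()) {
            // No variant: the locale code alone is the full string.
            return localeCode;
        }
        return localeCode + "@collation=" + variantCode;
    }

    public static void main(String[] args) {
        System.out.println(localeString("de", "phonebook")); // de@collation=phonebook
        System.out.println(localeString("de", null));        // de
    }
}
```

You can pass the resulting string to the locale() method of Collation.Builder in the same way as the string literals used in the rest of this guide.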
The following sections show you different ways to apply collations in MongoDB:
Collection
You can set a default collation when you create a collection. When you create a collection with a specified collation, all supported operations that scan that collection apply the rules of the collation.
You can only assign a default collation to a collection when you create that collection. However, you can specify a collation in a new index on an existing collection. See the Index section of this guide for more information.
The following snippet shows how to specify the "en_US" locale collation when creating a new collection called items:
database.createCollection(
        "items",
        new CreateCollectionOptions().collation(
                Collation.builder().locale("en_US").build()));
To check whether you created the collation successfully, retrieve a list of the indexes on that collection as follows:
MongoCollection<Document> collection = database.getCollection("items");
List<Document> indexes = new ArrayList<>();
collection.listIndexes().into(indexes);

// Prints the collection's indexes and any default collations
indexes.forEach(idx -> System.out.println(idx.toJson()));
The output of your code should contain the following:
{ ... "collation": { "locale": "en_US", ... } ... }
Index
You can specify a collation when you create a new index on a collection. The index stores an ordered representation of the documents in the collection so your operation does not need to perform the ordering in-memory. To use the index, your operation must meet the following criteria:
The operation uses the same collation as the one specified in the index.
The operation is covered by the index that contains the collation.
The following code snippet shows how you can create an index on the "name" field with the "en_US" locale collation in ascending order:
MongoCollection<Document> collection = database.getCollection("items");
IndexOptions idxOptions = new IndexOptions();

// Defines options that set a collation locale
idxOptions.collation(Collation.builder().locale("en_US").build());

// Creates an index on the "name" field with the collation and ascending sort order
collection.createIndex(Indexes.ascending("name"), idxOptions);
To check whether you created the collation successfully, retrieve a list of the indexes on that collection as follows:
MongoCollection<Document> collection = database.getCollection("items");
List<Document> indexes = new ArrayList<>();
collection.listIndexes().into(indexes);

// Prints the collection's indexes and any default collations
indexes.forEach(idx -> System.out.println(idx.toJson()));
The output of the preceding code should contain the following:
{ ... "collation": { "locale": "en_US", ... } ... }
The following code snippet shows an example operation that specifies the same collation and is covered by the index we created in the preceding code snippet:
FindIterable<Document> cursor = collection.find()
        .collation(Collation.builder().locale("en_US").build())
        .sort(Sorts.ascending("name"));
Operation
You can override the default collation on a collection by passing the new collation as a parameter to one of the supported operations. However, since the operation does not use an index, the operation may not perform as well as one that is covered by an index. For more information on the disadvantages of sorting operations not covered by an index, see the server manual page on Use Indexes to Sort Query Results.
The following code snippet shows an example query operation with the following characteristics:
The referenced collection contains the default collation "en_US" similar to the one specified in the Collection section.
The query specifies the Icelandic ("is") collation which is not covered by the collection's default collation index.
Since the specified collation is not covered by an index, the sort operation is performed in-memory.
FindIterable<Document> cursor = collection.find()
        .collation(Collation.builder().locale("is").build())
        .sort(Sorts.ascending("name"));
Index Types That Do Not Support Collations
While most MongoDB index types support collation, the following types support only binary comparison:
Collation Options
This section covers various collation options and how to specify them to further refine the ordering and matching behavior.
Collation Option | Description |
---|---|
Locale | Required. The ICU locale code for language and variant. locale() API Documentation |
Backwards | Whether to consider diacritics from the end of the string first. backwards() API Documentation |
Case-sensitivity | Whether to consider case (upper or lower) as different values. caseLevel() API Documentation |
Alternate | Whether to consider spaces and punctuation. collationAlternate() API Documentation |
Case First | Whether to consider uppercase or lowercase first. collationCaseFirst() API Documentation |
Max Variable | Whether to ignore whitespace or both whitespace and punctuation. This setting is only valid when the alternate setting is "shifted". collationMaxVariable() API Documentation |
Strength | ICU level of comparison. The default value is "tertiary". For more information on each level, see the ICU Comparison Levels. collationStrength() API Documentation |
Normalization | Whether to perform unicode normalization on the text as needed. For more information on unicode normalization, see Unicode Normalization Forms. normalization() API Documentation |
Numeric Ordering | Whether to order numbers according to numeric value rather than collation order. numericOrdering() API Documentation |
You can use the Collation.Builder class to specify values for the preceding collation options. You can call the build() method to construct a Collation object as shown in the following code snippet:
Collation.builder()
        .caseLevel(true)
        .collationAlternate(CollationAlternate.SHIFTED)
        .collationCaseFirst(CollationCaseFirst.UPPER)
        .collationMaxVariable(CollationMaxVariable.SPACE)
        .collationStrength(CollationStrength.SECONDARY)
        .locale("en_US")
        .normalization(false)
        .numericOrdering(true)
        .build();
For more information on the corresponding methods and parameters they take, see the API Documentation for Collation.Builder.
Collation Examples
This section contains examples that demonstrate how to use a selection of MongoDB operations that support collations. For each example, assume that you start with the following collection of documents:
{ "_id" : 1, "first_name" : "Klara" }
{ "_id" : 2, "first_name" : "Gunter" }
{ "_id" : 3, "first_name" : "Günter" }
{ "_id" : 4, "first_name" : "Jürgen" }
{ "_id" : 5, "first_name" : "Hannah" }
In the following examples, we specify the "de@collation=phonebook" locale and variant collation. The "de" part of the collation specifies the German locale and the "collation=phonebook" part specifies a variant. The "de" locale collation contains rules for prioritizing proper nouns, identified by capitalization of the first letter. In the "collation=phonebook" variant, characters with umlauts are ordered before the same characters without them in an ascending sort.
find() and sort() Example
The following example demonstrates how you can apply a collation when retrieving sorted results from a collection. To perform this operation, call find() on the example collection and chain the collation() and sort() methods to specify the order in which you want to receive the results.
Note
The following code example uses imports from the com.mongodb.client.model package for convenience.
List<Document> results = new ArrayList<>();

// Retrieves all documents and applies a "de@collation=phonebook" collation and ascending sort to the results
collection.find()
        .collation(Collation.builder().locale("de@collation=phonebook").build())
        .sort(Sorts.ascending("first_name"))
        .into(results);

// Prints the JSON representation of the results
if (results != null) {
    results.forEach(doc -> System.out.println(doc.toJson()));
}
When we perform this operation on our example collection, the output should resemble the following:
{"_id": 3, "first_name": "Günter"}
{"_id": 2, "first_name": "Gunter"}
{"_id": 5, "first_name": "Hannah"}
{"_id": 4, "first_name": "Jürgen"}
{"_id": 1, "first_name": "Klara"}
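To see why "Günter" can sort ahead of "Gunter", it helps to model the German phonebook convention, which treats an umlauted vowel like the vowel followed by "e" (so "ü" sorts like "ue"). The following JDK-only sketch applies that expansion before comparing. It is a simplification and does not use the ICU rules that MongoDB applies. The class `PhonebookSketch` and its methods are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PhonebookSketch {
    // Simplified sort key: expands German umlauts the way the phonebook
    // convention does (ä -> ae, ö -> oe, ü -> ue).
    static String phonebookKey(String s) {
        return s.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue");
    }

    // Sorts names by their expanded key, leaving the original spellings intact.
    static List<String> sortPhonebook(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.comparing(PhonebookSketch::phonebookKey));
        return copy;
    }

    public static void main(String[] args) {
        // \u00fc is "ü".
        List<String> names = List.of("Klara", "Gunter", "G\u00fcnter", "J\u00fcrgen", "Hannah");
        System.out.println(sortPhonebook(names));
    }
}
```

With this key, "Günter" compares as "Guenter", and "e" precedes "n", which matches the order of the first two documents in the output above.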
For more information about the methods and classes mentioned in this section, see the API documentation for find(), collation(), sort(), and Sorts.
findOneAndUpdate() Example
This section demonstrates how you can specify a collation in an operation that updates the first match from your query. To specify the collation for this operation, instantiate a FindOneAndUpdateOptions object, set a collation on it, and pass it as a parameter to your call to the findOneAndUpdate() method.
In this example, we demonstrate the following:
Retrieve the first document in our example collection that precedes "Gunter" in ascending order.
Set options for the operation, including the "de@collation=phonebook" collation.
Add a new field "verified" with the value true.
Retrieve and print the updated document.
Note
The following code example uses imports from the com.mongodb.client.model package for convenience.
Document result = collection.findOneAndUpdate(
        // Matches documents whose "first_name" precedes "Gunter" under the collation
        Filters.lt("first_name", "Gunter"),
        Updates.set("verified", true),
        new FindOneAndUpdateOptions()
                .collation(Collation.builder().locale("de@collation=phonebook").build())
                .sort(Sorts.ascending("first_name"))
                .returnDocument(ReturnDocument.AFTER));

// Prints the JSON representation of the updated document if an update occurred
if (result != null) {
    System.out.println("Updated document: " + result.toJson());
}
Since "Günter" is lexically before "Gunter" using the de@collation=phonebook collation in ascending order, the preceding operation updates that document, and the output should resemble the following:
Updated document: {"_id": 3, "first_name": "Günter", "verified": true}
For more information about the methods and classes mentioned in this section, see the API documentation for findOneAndUpdate(), FindOneAndUpdateOptions, Filters, Updates, and Sorts.
findOneAndDelete() Example
This section demonstrates how you can specify a numerical ordering of strings in a collation in an operation that deletes the first match from your query. To specify the collation for this operation, instantiate a FindOneAndDeleteOptions object, set a numeric ordering collation on it, and pass it as a parameter to your call to the findOneAndDelete() method.
This example calls the findOneAndDelete() operation on a collection that contains the following documents:
{ "_id" : 1, "a" : "16 apples" }
{ "_id" : 2, "a" : "84 oranges" }
{ "_id" : 3, "a" : "179 bananas" }
In the collation, we set the locale option to "en" and the numericOrdering option to "true" in order to sort strings based on their numerical order.
Note
The following code example uses imports from the com.mongodb.client.model package for convenience.
Document result = collection.findOneAndDelete(
        Filters.gt("a", "100"),
        new FindOneAndDeleteOptions()
                .collation(
                        Collation.builder()
                                .locale("en")
                                .numericOrdering(true)
                                .build())
                .sort(Sorts.ascending("a")));

// Prints the JSON representation of the deleted document
if (result != null) {
    System.out.println("Deleted document: " + result.toJson());
}
After you run the preceding operation, your output should resemble the following:
Deleted document: {"_id": 3, "a": "179 bananas"}
The numeric value of the string "179" is greater than the number 100, so the preceding document is the only match.
If we perform the same operation without the numerical ordering collation on the original collection of three documents, the filter matches all of our documents since "100" comes before "16", "84", and "179" when ordering by binary collation.
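The difference between the two comparisons can be checked with a short JDK-only sketch. It does not use the driver. The class `NumericOrderingSketch` and its methods are hypothetical names introduced here, and the numeric comparison is simplified to look only at the leading digits of each string.

```java
import java.util.ArrayList;
import java.util.List;

class NumericOrderingSketch {
    // Parses the leading run of ASCII digits as a number; returns -1 if there is none.
    static long leadingNumber(String s) {
        int end = 0;
        while (end < s.length() && s.charAt(end) >= '0' && s.charAt(end) <= '9') {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(s.substring(0, end));
    }

    // Returns the values that are greater than the bound, using either
    // code unit comparison (binary) or leading-number comparison (numeric).
    static List<String> greaterThan(List<String> values, String bound, boolean numeric) {
        List<String> matches = new ArrayList<>();
        for (String value : values) {
            boolean greater = numeric
                    ? leadingNumber(value) > leadingNumber(bound)
                    : value.compareTo(bound) > 0;
            if (greater) {
                matches.add(value);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> values = List.of("16 apples", "84 oranges", "179 bananas");
        System.out.println(greaterThan(values, "100", false)); // binary comparison
        System.out.println(greaterThan(values, "100", true));  // numeric comparison
    }
}
```

In the binary case, "16 apples" is greater than "100" because the comparison is decided at the second character, where "6" follows "0". In the numeric case, 16 is compared with 100 as a whole number.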
For more information about the methods and classes mentioned in this section, see the API documentation for findOneAndDelete(), FindOneAndDeleteOptions, Filters, and Sorts.
Aggregation Example
This section demonstrates how you can specify a collation in an aggregation operation. In an aggregation operation, you can specify a series of aggregation stages, collectively called the aggregation pipeline. To perform an aggregation, call the aggregate() method on a MongoCollection object.
To specify a collation for an aggregation operation, call the collation() method on the AggregateIterable returned by the aggregation operation. Make sure to specify a sort aggregation stage on which to apply the collation in your aggregation pipeline.
The following example shows how we can construct an aggregation pipeline on the example collection and apply a collation by specifying the following:
A group aggregation stage using the Aggregates.group() helper to identify each document by the first_name field and use that value as the _id of the result.
An accumulator in the group aggregation stage to sum the number of instances of matching values in the first_name field.
Apply an ascending sort to the _id field of the output documents of the prior aggregation stage.
Construct a collation object, specifying the German locale and a collation strength that ignores accents and umlauts.
Bson groupStage = Aggregates.group("$first_name", Accumulators.sum("nameCount", 1));
Bson sortStage = Aggregates.sort(Sorts.ascending("_id"));

AggregateIterable<Document> results = collection
        // Runs the aggregation pipeline that includes tallying "first_name" frequencies
        .aggregate(Arrays.asList(groupStage, sortStage))
        // Applies a collation to sort documents alphabetically by using the German locale, ignoring accents
        .collation(Collation.builder().locale("de").collationStrength(CollationStrength.PRIMARY).build());

// Prints the JSON representation of the results
if (results != null) {
    results.forEach(doc -> System.out.println(doc.toJson()));
}
The preceding code outputs the following documents:
{"_id": "Gunter", "nameCount": 2}
{"_id": "Hannah", "nameCount": 1}
{"_id": "Jürgen", "nameCount": 1}
{"_id": "Klara", "nameCount": 1}
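The grouping of "Gunter" and "Günter" into one result can be approximated without the driver. The following JDK-only sketch builds a key that drops accents and case, which roughly models what a primary-strength comparison ignores, and counts names by that key. The class `PrimaryStrengthSketch` and its methods are hypothetical names introduced here, and the key is an approximation rather than the ICU algorithm.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class PrimaryStrengthSketch {
    // Approximates a primary-strength key: removes combining marks (accents,
    // umlauts) and lowercases the text so only base letters are compared.
    static String primaryKey(String s) {
        String stripped = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    // Counts names per key; the TreeMap keeps keys in ascending order,
    // similar to the sort stage in the pipeline above.
    static Map<String, Integer> countByPrimaryKey(List<String> names) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : names) {
            counts.merge(primaryKey(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // \u00fc is "ü".
        List<String> names = List.of("Klara", "Gunter", "G\u00fcnter", "J\u00fcrgen", "Hannah");
        System.out.println(countByPrimaryKey(names));
    }
}
```

Unlike the aggregation output, the sketch reports the normalized key rather than one of the original spellings, because it has no rule for choosing which spelling represents the group.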
For more information about the methods and classes mentioned in this section, see the following API Documentation: