Specify the Default Language for a Text Index on Self-Managed Deployments
This tutorial describes how to specify the default language associated with the text index and also how to create text indexes for collections that contain documents in different languages.
Specify the Default Language for a text Index
The default language associated with the indexed data determines the rules used to parse word roots (i.e., stemming) and to ignore stop words. The default language for the indexed data is english.
To specify a different language, use the default_language option when creating the text index. See Text Search Languages on Self-Managed Deployments for the languages available for default_language.
The following example creates a text index on the content field of the quotes collection and sets the default_language to spanish:
db.quotes.createIndex( { content : "text" }, { default_language: "spanish" } )
Create a text Index for a Collection in Multiple Languages
Specify the Index Language within the Document
If a collection contains documents or embedded documents that are in different languages, include a field named language in those documents or embedded documents and set its value to the language of that document or embedded document.
MongoDB uses the specified language for that document or embedded document when building the text index:
The language specified in a document overrides the default language for the text index.
The language specified in an embedded document overrides the language specified in an enclosing document or the default language for the index.
See Text Search Languages on Self-Managed Deployments for a list of supported languages.
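As a rough illustration of the precedence rules above, the following plain JavaScript sketch models which language applies to a piece of indexed text. The function name resolveLanguage is hypothetical and is not part of MongoDB or mongosh. The sketch assumes only two levels (a document and one embedded document) and an index default of english unless another default is passed in.

```javascript
// Hypothetical helper (not a MongoDB API) modeling the override rules above.
// Each argument is the value of the `language` field at that level, or undefined.
function resolveLanguage(embeddedLanguage, documentLanguage, defaultLanguage = "english") {
  // The innermost level that specifies a language wins:
  // embedded document, then enclosing document, then the index default.
  return embeddedLanguage ?? documentLanguage ?? defaultLanguage;
}

console.log(resolveLanguage("spanish", "portuguese"));         // embedded value wins
console.log(resolveLanguage(undefined, "portuguese"));         // falls back to the document
console.log(resolveLanguage(undefined, undefined));            // falls back to the default
console.log(resolveLanguage(undefined, undefined, "spanish")); // default_language: "spanish"
```

The nullish-coalescing chain mirrors the fallback order directly: a level that does not set a language is simply skipped.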
For example, the quotes collection contains multi-language documents that include the language field in the document, the embedded document, or both, as needed:
{
  _id: 1,
  language: "portuguese",
  original: "A sorte protege os audazes.",
  translation:
    [
      { language: "english", quote: "Fortune favors the bold." },
      { language: "spanish", quote: "La suerte protege a los audaces." }
    ]
}
{
  _id: 2,
  language: "spanish",
  original: "Nada hay más surrealista que la realidad.",
  translation:
    [
      { language: "english", quote: "There is nothing more surreal than reality." },
      { language: "french", quote: "Il n'y a rien de plus surréaliste que la réalité." }
    ]
}
{
  _id: 3,
  original: "is this a dagger which I see before me.",
  translation:
    { language: "spanish", quote: "Es este un puñal que veo delante de mí." }
}
Suppose you create a text index on the original and translation.quote fields with the default language of English:
db.quotes.createIndex( { original: "text", "translation.quote": "text" } )
Then, for the documents and embedded documents that contain the language field, the text index uses that language to parse word stems and other linguistic characteristics.
For embedded documents that do not contain the language field:
If the enclosing document contains the language field, the index uses the enclosing document's language for the embedded document.
Otherwise, the index uses the default language for the embedded document.
For documents that do not contain the language field, the index uses the default language, which is English.
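To make the rules concrete for the quotes example, the sketch below walks the three sample documents and reports which language would apply to each indexed string. It is a plain JavaScript model, not MongoDB's implementation. The function name indexedLanguages is hypothetical. The sketch assumes the index covers only original and translation.quote, matching the createIndex command shown earlier, and that the default language is english.

```javascript
// Hypothetical model (not a MongoDB API) of how the text index described above
// chooses a language for each indexed string in the quotes example.
const quotes = [
  { _id: 1, language: "portuguese", original: "A sorte protege os audazes.",
    translation: [
      { language: "english", quote: "Fortune favors the bold." },
      { language: "spanish", quote: "La suerte protege a los audaces." } ] },
  { _id: 2, language: "spanish", original: "Nada hay más surrealista que la realidad.",
    translation: [
      { language: "english", quote: "There is nothing more surreal than reality." },
      { language: "french", quote: "Il n'y a rien de plus surréaliste que la réalité." } ] },
  { _id: 3, original: "is this a dagger which I see before me.",
    translation: { language: "spanish", quote: "Es este un puñal que veo delante de mí." } },
];

function indexedLanguages(doc, defaultLanguage = "english") {
  // Top-level fields use the document's `language`, else the index default.
  const docLanguage = doc.language ?? defaultLanguage;
  const entries = [{ field: "original", text: doc.original, language: docLanguage }];

  // `translation` may be a single embedded document or an array of them.
  const translations = Array.isArray(doc.translation) ? doc.translation : [doc.translation];
  for (const t of translations) {
    if (t === undefined) continue; // no translation field at all
    // An embedded `language` overrides the enclosing document's language.
    entries.push({ field: "translation.quote", text: t.quote, language: t.language ?? docLanguage });
  }
  return entries;
}

for (const doc of quotes) {
  for (const entry of indexedLanguages(doc)) {
    console.log(doc._id, entry.field, entry.language);
  }
}
```

Document 3 shows both fallbacks in the model: its original text has no language at any level and resolves to english, while its embedded translation carries its own spanish value.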
Use any Field to Specify the Language for a Document
To use a field with a name other than language, include the language_override option when creating the index.
For example, run the following command to use idioma as the field name instead of language:
db.quotes.createIndex( { quote : "text" }, { language_override: "idioma" } )
The documents of the quotes collection can then specify a language with the idioma field:
{ _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" }
{ _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." }
{ _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }
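The effect of language_override can be modeled as changing which field name the index reads the language from. The following plain JavaScript sketch is an illustration only. The function documentLanguage and its languageOverride and defaultLanguage parameters are hypothetical names, not MongoDB options or APIs. It assumes the field name defaults to language and the default language is english.

```javascript
// Hypothetical model (not a MongoDB API): language_override changes which
// field the language is read from; "language" is assumed as the default name.
function documentLanguage(doc, { languageOverride = "language", defaultLanguage = "english" } = {}) {
  // Fall back to the index default when the chosen field is absent.
  return doc[languageOverride] ?? defaultLanguage;
}

const quotes = [
  { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" },
];

for (const doc of quotes) {
  console.log(doc._id, documentLanguage(doc, { languageOverride: "idioma" }));
}
```

In this model, omitting the override means the idioma field is never consulted, so each document would fall back to the default language.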