Specify the Default Language for a Text Index on Self-Managed Deployments
By default, the default_language for text indexes is english.
To improve the performance of non-English text search queries, you can specify
a different default language associated with your text index.
The default language associated with the indexed data determines the suffix
stemming rules. The default language also determines which language-specific
stop words (for example, "the", "an", "a", and "and" in English) are not indexed.
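The effect of stop words on what gets indexed can be illustrated with a small sketch. This is not MongoDB's implementation: the function name indexableTerms and the four-word stop list (taken from the English examples above) are assumptions for illustration only, and stemming is left out entirely.

```javascript
// Illustrative only: a tiny stop list built from the English examples above.
// The real stop-word lists used by text indexes are much longer.
const ENGLISH_STOP_WORDS = new Set(["the", "an", "a", "and"]);

// Hypothetical helper: lowercases the text, splits it on anything that is not
// a letter or digit, and drops every term found in the stop list.
function indexableTerms(text, stopWords) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((term) => term.length > 0 && !stopWords.has(term));
}

console.log(indexableTerms("The cat and a dog", ENGLISH_STOP_WORDS));
// [ 'cat', 'dog' ]

// With an empty stop list every word is kept, which is roughly what a
// default_language of "none" means for stop words.
console.log(indexableTerms("The cat and a dog", new Set()));
// [ 'the', 'cat', 'and', 'a', 'dog' ]
```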
To specify a different language, use the default_language option when
creating the text index. To see the languages available for text indexing, see
Text Search Languages on Self-Managed Deployments. Your operation should resemble this prototype:
db.<collection>.createIndex( { <field>: "text" }, { default_language: <language> } )
If you specify a default_language value of none, the text index
indexes each word in the field, including stop words, and does not apply
suffix stemming.
Before You Begin
Create a quotes collection that contains the following documents
with a Spanish text field:
db.quotes.insertMany( [
   { _id: 1, quote: "La suerte protege a los audaces." },
   { _id: 2, quote: "Nada hay más surrealista que la realidad." },
   { _id: 3, quote: "Es este un puñal que veo delante de mí?" },
   { _id: 4, quote: "Nunca dejes que la realidad te estropee una buena historia." }
] )
Procedure
The following operation creates a text index on the quote field and sets
the default_language to spanish:
db.quotes.createIndex( { quote: "text" }, { default_language: "spanish" } )
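To confirm which language an existing text index uses, one option is to inspect the index specifications returned by getIndexes(). The helper below is a sketch, and its name textIndexLanguages is hypothetical. It assumes that text indexes appear with an _fts: "text" entry in their key and carry a default_language field, as mongosh reports them.

```javascript
// Hypothetical helper: lists each text index on a collection together with
// its default_language. Expects an object that exposes getIndexes(), such as
// db.quotes in mongosh.
function textIndexLanguages(collection) {
  return collection
    .getIndexes()
    // Text indexes are stored with a special key that contains _fts: "text".
    .filter((spec) => spec.key && spec.key._fts === "text")
    .map((spec) => ({ name: spec.name, default_language: spec.default_language }));
}

// Usage in mongosh, after creating the index above:
// textIndexLanguages(db.quotes)
```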
Results
The resulting index supports text search queries on the quote field with
Spanish-language suffix stemming rules. For example, the following
query searches for the keyword punal in the quote field:
db.quotes.find( { $text: { $search: "punal" } } )
Output:
[ { _id: 3, quote: "Es este un puñal que veo delante de mí?" } ]
Although the $search value is set to punal, the query returns the
document containing the word puñal because text indexes are
diacritic-insensitive.
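The idea behind diacritic insensitivity can be approximated in plain JavaScript. The sketch below is not how MongoDB implements it. It assumes a simple approach of Unicode decomposition followed by removal of combining marks, and the function name foldDiacritics is hypothetical.

```javascript
// Hypothetical helper: decomposes each character into its base letter plus
// combining marks (NFD), then strips the marks in the U+0300 to U+036F range.
function foldDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(foldDiacritics("puñal"));
// punal

// After folding, the search term and the stored word compare as equal.
console.log(foldDiacritics("puñal") === foldDiacritics("punal"));
// true
```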
The index also ignores language-specific stop words. For example, although the
document with _id: 2 contains the word hay, the following query does not
return any documents because hay is classified as a Spanish stop word and is
therefore not included in the text index:
db.quotes.find( { $text: { $search: "hay" } } )
Learn More
To create a text index for a collection containing text in multiple languages, see Create a Text Index for a Collection Containing Multiple Languages on Self-Managed Deployments.
To learn about other text index properties, see Text Index Properties on Self-Managed Deployments.