Case-Insensitive Queries Without Case-Insensitive Indexes
Lauren Schaefer, Daniel Coupal8 min read • Published Feb 12, 2022 • Updated Oct 01, 2024
Rate this article
We've reached the sixth and final (at least for now) MongoDB schema design anti-pattern. In the first five posts in this series, we've covered the following anti-patterns.
Today, we'll explore the wonderful world of case-insensitive indexes. Not having a case-insensitive index can create surprising query results and/or slow queries...and make you hate everything.
Once you know the details of how case-insensitive queries work, the implementation is fairly simple. Let's dive in!
Check out the video above to see the case-insensitive queries and indexes in action.
MongoDB supports three primary ways to run case-insensitive queries.
First, you can run a case-insensitive query using $regex with the
i
option. These queries will give you the expected case-insensitive results. However, queries that use $regex
cannot efficiently utilize case-insensitive indexes, so these queries can be very slow depending on how much data is in your collection.Second, you can run a case-insensitive query by creating a case-insensitive index (meaning it has a collation strength of
1
or 2
) and running a query with the same collation as the index. A collation defines the language-specific rules that MongoDB will use for string comparison. Indexes can optionally have a collation with a strength that ranges from 1 to 5. Collation strengths of 1
and 2
both give you case-insensitivity. For more information on the differences in collation strengths, see the MongoDB docs. A query that is run with the same collation as a case-insensitive index will return case-insensitive results. Since these queries are covered by indexes, they execute very quickly.Third, you can run a case-insensitive query by setting the default collation strength for queries and indexes to a strength of
1
or 2
when you create a collection. All queries and indexes in a collection automatically use the default collation unless you specify otherwise when you execute a query or create an index. Therefore, when you set the default collation to a strength of 1
or 2
, you'll get case-insensitive queries and indexes by default. See the collation
option in the db.createCollection() section of the MongoDB Docs for more details.Warning for queries that do not use
$regex
: Your index must have a collation strength of 1
or 2
and your query must use the same collation as the index in order for your query to be case-insensitive.You can use MongoDB Compass (MongoDB's desktop GUI) or the MongoDB Shell (MongoDB's command-line tool) to test if a query is returning the results you'd expect, see its execution time, and determine if it's using an index.
Let's revisit the example we saw in the Unnecessary Indexes Anti-Pattern and the Bloated Documents Anti-Pattern posts. Leslie is creating a website that features inspirational women. She has created a database with information about 4,700+ inspirational women. Below are three documents in her
InspirationalWomen
collection.1 { 2 "_id": ObjectId("5ef20c5c7ff4160ed48d8f83"), 3 "first_name": "Harriet", 4 "last_name": "Tubman", 5 "quote": "I was the conductor of the Underground Railroad for eight years, 6 and I can say what most conductors can't say; I never ran my 7 train off the track and I never lost a passenger" 8 }, 9 { 10 "_id": ObjectId("5ef20c797ff4160ed48d90ea"), 11 "first_name": "HARRIET", 12 "middle_name": "BEECHER", 13 "last_name": "STOWE", 14 "quote": "When you get into a tight place and everything goes against you, 15 till it seems as though you could not hang on a minute longer, 16 never give up then, for that is just the place and time that 17 the tide will turn." 18 }, 19 { 20 "_id": ObjectId("5ef20c937ff4160ed48d9201"), 21 "first_name": "Bella", 22 "last_name": "Abzug", 23 "quote": "This woman's place is in the House—the House of Representatives." 24 }
Leslie decides to add a search feature to her website since the website is currently difficult to navigate. She begins implementing her search feature by creating an index on the
first_name
field. Then she starts testing a query that will search for women named "Harriet."Leslie executes the following query in the MongoDB Shell:
1 db.InspirationalWomen.find({first_name: "Harriet"})
She is surprised to only get one document returned since she has two Harriets in her database: Harriet Tubman and Harriet Beecher Stowe. She realizes that Harriet Beecher Stowe's name was input in all uppercase in her database. Her query is case-sensitive, because it is not using a case-insensitive index.
1 db.InspirationalWomen.find({first_name: "Harriet"}).explain("executionStats")
The Shell returns the following output.
1 { 2 "queryPlanner": { 3 ... 4 "winningPlan": { 5 "stage": "FETCH", 6 "inputStage": { 7 "stage": "IXSCAN", 8 "keyPattern": { 9 "first_name": 1 10 }, 11 "indexName": "first_name_1", 12 ... 13 "indexBounds": { 14 "first_name": [ 15 "[\"Harriet\", \"Harriet\"]" 16 ] 17 } 18 } 19 }, 20 "rejectedPlans": [] 21 }, 22 "executionStats": { 23 "executionSuccess": true, 24 "nReturned": 1, 25 "executionTimeMillis": 0, 26 "totalKeysExamined": 1, 27 "totalDocsExamined": 1, 28 "executionStages": { 29 ... 30 } 31 } 32 }, 33 ... 34 }
She can see that the
winningPlan
is using an IXSCAN
(index scan) with her first_name_1
index. In the executionStats
, she can see that only one index key was examined (executionStats.totalKeysExamined
) and only one document was examined (executionStats.totalDocsExamined
). For more information on how to interpret the output from .explain()
, see Analyze Query Performance.Leslie opens Compass and sees similar results.
first_name_1
index.Leslie wants all Harriets—regardless of what lettercase is used—to be returned in her query. She updates her query to use
$regex
with option i
to indicate the regular expression should be case-insensitive. She returns to the Shell and runs her new query:1 db.InspirationalWomen.find({first_name: { $regex: /Harriet/i} })
This time she gets the results she expects: documents for both Harriet Tubman and Harriet Beecher Stowe. Leslie is thrilled! She runs the query again with
.explain("executionStats")
to get details on her query execution. Below is what the Shell returns:1 { 2 "queryPlanner": { 3 ... 4 "winningPlan": { 5 "stage": "FETCH", 6 "inputStage": { 7 "stage": "IXSCAN", 8 "filter": { 9 "first_name": { 10 "$regex": "Harriet", 11 "$options": "i" 12 } 13 }, 14 "keyPattern": { 15 "first_name": 1 16 }, 17 "indexName": "first_name_1", 18 ... 19 "indexBounds": { 20 "first_name": [ 21 "[\"\", {})", 22 "[/Harriet/i, /Harriet/i]" 23 ] 24 } 25 } 26 }, 27 "rejectedPlans": [] 28 }, 29 "executionStats": { 30 "executionSuccess": true, 31 "nReturned": 2, 32 "executionTimeMillis": 3, 33 "totalKeysExamined": 4704, 34 "totalDocsExamined": 2, 35 "executionStages": { 36 ... 37 } 38 }, 39 ... 40 }
She can see that this query, like her previous one, uses an index (
IXSCAN
). However, since $regex
queries cannot efficiently utilize case-insensitive indexes, she isn't getting the typical benefits of a query that is covered by an index. All 4,704 index keys (executionStats.totalKeysExamined
) are being examined as part of this query, resulting in a slightly slower query (executionStats.executionTimeMillis: 3
) than one that fully utilizes an index.She runs the same query in Compass and sees similar results. The query is using her
first_name_1
index but examining every index key.$regex
query is using the first_name_1
index but examining every index key.Leslie wants to ensure that her search feature runs as quickly as possible. She uses Compass to create a new case-insensitive index named
first_name-case_insensitive
. (She can easily create indexes using other tools as well like the Shell or MongoDB Atlas or even programmatically.) Her index will be on the first_name
field in ascending order and use a custom collation with a locale of en
and a strength of 2
. Recall from the previous section that the collation strength must be set to 1
or 2
in order for the index to be case-insensitive.en
and a strength of 2
.Leslie runs a query very similar to her original query in the Shell, but this time she specifies the collation that matches her newly-created index:
1 db.InspirationalWomen.find({first_name: "Harriet"}).collation( { locale: 'en', strength: 2 } )
This time she gets both Harriet Tubman and Harriet Beecher Stowe. Success!
She runs the query with
.explain("executionStats")
to double check that the query is using her index:1 db.InspirationalWomen.find({first_name: "Harriet"}).collation( { locale: 'en', strength: 2 } ).explain("executionStats")
The Shell returns the following results.
1 { 2 "queryPlanner": { 3 ... 4 "collation": { 5 "locale": "en", 6 ... 7 "strength": 2, 8 ... 9 }, 10 "winningPlan": { 11 "stage": "FETCH", 12 "inputStage": { 13 "stage": "IXSCAN", 14 "keyPattern": { 15 "first_name": 1 16 }, 17 "indexName": "first_name-case_insensitive", 18 "collation": { 19 "locale": "en", 20 ... 21 "strength": 2, 22 ... 23 }, 24 ... 25 "indexBounds": { 26 "first_name": [ 27 "[\"7)KK91O\u0001\u000b\", \"7)KK91O\u0001\u000b\"]" 28 ] 29 } 30 } 31 }, 32 "rejectedPlans": [] 33 }, 34 "executionStats": { 35 "executionSuccess": true, 36 "nReturned": 2, 37 "executionTimeMillis": 0, 38 "totalKeysExamined": 2, 39 "totalDocsExamined": 2, 40 "executionStages": { 41 ... 42 } 43 } 44 }, 45 ... 46 }
Leslie can see that the winning plan is executing an
IXSCAN
(index scan) that uses the case-insensitive index she just created. Two index keys (executionStats.totalKeysExamined
) are being examined, and two documents (executionStats.totalDocsExamined
) are being examined. The query is executing in 0 ms (executionStats.executionTimeMillis: 0
). Now that's fast!Leslie runs the same query in Compass and specifies the collation the query should use.
She can see that the query is using her case-insensitive index and the
query is executing in 0 ms. She's ready to implement her search feature.
Time to celebrate!
Note: Another option for Leslie would have been to set the default collation strength of her InspirationalWomen collection to
1
or 2
when she created her collection. Then all of her queries would have returned the expected, case-insensitive results, regardless of whether she had created an index or not. She would still want to create indexes to increase the performance of her queries.You have three primary options when you want to run a case-insensitive query:
- Use
$regex
with thei
option. Note that this option is not as performant because$regex
cannot fully utilize case-insensitive indexes. - Create a case-insensitive index with a collation strength of
1
or2
, and specify that your query uses the same collation. - Set the default collation strength of your collection to
1
or2
when you create it, and do not specify a different collation in your queries and indexes.
This post is the final anti-pattern we'll cover in this series. But, don't be too sad—this is not the final post in this series. Be on the lookout for the next post where we'll summarize all of the anti-patterns and show you a brand new feature in MongoDB Atlas that will help you discover anti-patterns in your database. You won't want to miss it!
When you're ready to build a schema in MongoDB, check out MongoDB Atlas, MongoDB's fully managed database-as-a-service. Atlas is the easiest way to get started with MongoDB and has a generous, forever-free tier.
Check out the following resources for more information:
This is part of a series