Performance Best Practices: Indexing
Welcome to the third in our series of blog posts covering performance best practices for MongoDB.
In this series, we are covering key considerations for achieving performance at scale across a number of important dimensions, including:
- Data modeling and sizing memory (the working set)
- Query patterns and profiling
- Indexing, which we’ll cover today
- Sharding
- Transactions and read/write concerns
- Hardware and OS configuration
- Benchmarking
Having both worked for a couple of different database vendors over the past 15 years, we can safely say that failing to define the appropriate indexes is the number one performance issue technical support teams have to address with users.
So we need to get it right…..here are the best practices to help you.
Indexes in MongoDB
In any database, indexes support the efficient execution of queries. Without them, the database must scan every document in a collection or table to select those that match the query statement. If an appropriate index exists for a query, the database can use the index to limit the number of documents it must inspect.
MongoDB offers a broad range of index types and features with language-specific sort orders to support complex access patterns to your data. MongoDB indexes can be created and dropped on-demand to accommodate evolving application requirements and query patterns and can be declared on any field within your documents, including fields nested within arrays.
So let's cover how you make the best use of indexes in MongoDB.
Use Compound Indexes
Compound indexes are indexes composed of several different fields. For example, instead of having one index on "Last name" and another on "First name", it is typically most efficient to create an index that includes both "Last name" and "First name" if you query against both of the names. Our compound index can still be used to filter queries that specify the last name only.
Follow the ESR rule
For compound indexes, this rule of thumb is helpful in deciding the order of fields in the index:
- First, add those fields against which Equality queries are run.
- The next fields to be indexed should reflect the Sort order of the query.
- The last fields represent the Range of data to be accessed.
Use Covered Queries When Possible
Covered queries return results from an index directly without having to access the source documents, and are therefore very efficient.
For a query to be covered all the fields needed for filtering, sorting and/or being returned to the client must be present in an index. To determine whether a query is a covered query, use the explain()
method. If the explain()
output displays totalDocsExamined
as 0, this shows the query is covered by an index. Read more in the documentation for explain results.
A common gotcha when trying to achieve covered queries is that the _id field is always returned by default. You need to explicitly exclude it from query results, or add it to the index.
In sharded clusters, MongoDB internally needs to access the fields of the shard key. This means covered queries are only possible when the shard key is part of the index. It is usually a good idea to do this anyway.
Use Caution When Considering Indexes on Low-Cardinality Fields
Queries on fields with a small number of unique values (low cardinality) can return large result sets. Compound indexes may include fields with low cardinality, but the value of the combined fields should exhibit high cardinality.
Eliminate Unnecessary Indexes
Indexes are resource-intensive: even with compression in the MongoDB WiredTiger storage engine, they consume RAM and disk. As fields are updated, associated indexes must be maintained, incurring additional CPU and disk I/O overhead.
MongoDB provides tooling to help you understand index usage, which we will cover later in this post.
Wildcard Indexes Are Not a Replacement for Workload-Based Index Planning
For workloads with many ad-hoc query patterns or that handle highly polymorphic document structures, wildcard indexes give you a lot of extra flexibility. You can define a filter that automatically indexes all matching fields, subdocuments, and arrays in a collection.
As with any index, they also need to be stored and maintained, so they will add overhead to the database. If your application’s query patterns are known in advance, then you should use more selective indexes on the specific fields accessed by the queries.
Use text search to match words inside a field
Regular indexes are useful for matching the entire value of a field. If you only want to match on a specific word in a field with a lot of text, then use a text index.
If you are running MongoDB in the Atlas service, consider using Atlas Full Text Search which provides a fully-managed Lucene index integrated with the MongoDB database. FTS provides higher performance and greater flexibility to filter, rank, and sort through your database to quickly surface the most relevant results to your users.
Use Partial Indexes
Reduce the size and performance overhead of indexes by only including documents that will be accessed through the index. For example, create a partial index on the orderID
field that only includes order documents with an orderStatus
of "In progress", or only indexes the emailAddress
field for documents where it exists.
Take Advantage of Multi-Key Indexes for Querying Arrays
If your query patterns require accessing individual array elements, use a multi-key index. MongoDB creates an index key for each element in the array and can be constructed over arrays that hold both scalar values and nested documents.
Avoid Regular Expressions That Are Not Left Anchored or Rooted
Indexes are ordered by value. Leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the expression.
Avoid Case Insensitive Regular Expressions
If the sole reason for using a regex is case insensitivity, use a case insensitive index instead, as those are faster.
Use Index Optimizations Available in the WiredTiger Storage Engine
If you are self-managing MongoDB, you can optionally place indexes on their own separate volume, allowing for faster disk paging and lower contention. See wiredTiger options for more information.
Use the Explain Plan
We covered the use of MongoDB’s explain plan in the previous query patterns and profiling post, and this is the best tool to check on index coverage for individual queries.
Working from the explain plan, MongoDB provides visualization tools to help further improve understanding of your indexes, and which provides intelligent and automatic recommendations on which indexes to add.
Visualize Index Coverage With MongoDB Compass and Atlas Data Explorer
As the free GUI for MongoDB, Compass provides many features to help you optimize query performance, including exploring your schema and visualizing query explain plans – two areas covered previously in this series.
The indexes tab in Compass adds another tool to your arsenal. It lists the existing indexes for a collection, reporting the name and keys of the index, along with its type, size, and any special properties. Through the index tab you can also add and drop indexes as needed.
A really useful feature is index usage, which shows you how often an index has been used. Having too many indexes can be almost as damaging to your performance as having too few, making this feature is especially valuable in helping you identify and remove indexes that are not being used. This helps you free working set space, and eliminates database overhead that comes from maintaining the index.
If you are running MongoDB in our fully-managed Atlas service, the indexes view in the Data Explorer will give you the same functionality as Compass, without you having to connect to your database with a separate tool.
You can also retrieve index statistics using the $indexStats
aggregation pipeline stage.
Automated Index Recommendations
Even with all of the telemetry provided by MongoDB’s tools, you are still responsible for pulling and analyzing the required data to make decisions on which indexes to add.
The threshold for slow queries varies based on the average time of operations on your cluster to provide recommendations pertinent to your workload.
Recommended indexes are accompanied by sample queries, grouped by query shape (i.e., queries with a similar predicate structure, sort, and projection), that were run against a collection that would benefit from the addition of a suggested index. The Performance Advisor does not negatively affect the performance of your Atlas clusters.
If you are happy with the recommendation, you can then roll out the new indexes automatically, without incurring any application downtime.
What’s Next
That wraps up this latest installment of the performance best practices series. MongoDB University offers a no-cost, web-based training course on MongoDB performance. This is a great way to learn more about the power of indexing.
Next up in this series: sharding.