MongoDB Atlas Best Practices: Part 2

Mat Keep

Preparing for your MongoDB Deployment: Indexing, Data Migration & Instance Selection

MongoDB Atlas radically simplifies the operation of MongoDB. As with any hosted database as a service, there are still decisions you need to make to ensure the best performance and availability for your application. This blog series provides recommendations that will serve as a solid foundation for getting the most out of the MongoDB Atlas service.

We’ll cover four main areas over this series of blog posts:

  • In part 1, we got started by preparing for our deployment, focusing specifically on schema design and application access patterns.
  • In this part 2 post, we’ll discuss additional considerations as you prepare for your deployment, including indexing, data migration and instance selection.
  • In part 3, we’ll dive into how you scale your MongoDB Atlas deployment, and achieve your required availability SLAs.
  • In the final part 4, we’ll wrap up with best practices for operational management and ensuring data security.

If you want to get a head start and learn about all of these topics now, just go ahead and download the MongoDB Atlas Best Practices guide.

Indexing

As in most database management systems, indexes are a crucial mechanism for optimizing MongoDB query performance. While indexes can improve the performance of some operations by one or more orders of magnitude, they add overhead to updates and consume disk space and memory. Users should always create indexes to support queries, but should not maintain indexes that queries do not use. This is particularly important for insert-heavy workloads, or those with writes that modify indexed values.

To understand how effective your existing indexes are, use the $indexStats aggregation stage to determine how frequently each index is used. This information can also be accessed through MongoDB Compass.
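
For example, $indexStats can be run from the mongo shell; the orders collection here is hypothetical:

```javascript
db.orders.aggregate([
  // $indexStats reports, for each index on the collection, how many
  // operations have used it since the server last started.
  { $indexStats: {} }
])
```

Each result document includes the index name, its key pattern, and an accesses.ops counter; indexes whose counters remain at zero over a representative period are candidates for removal.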

Query Optimization

Queries are automatically optimized by MongoDB to make evaluation of the query as efficient as possible. Evaluation normally includes the selection of data based on predicates, and the sorting of data based on the sort criteria provided. The query optimizer selects the best indexes to use by periodically running alternate query plans and selecting the index with the best performance for each query type. The results of this empirical test are stored as a cached query plan and periodically updated.

MongoDB provides an explain plan capability that shows information about how a query will be, or was, resolved, including:

  • The number of documents returned
  • The number of documents read
  • Which indexes were used
  • Whether the query was covered, meaning no documents needed to be read to return results
  • Whether an in-memory sort was performed, which indicates an index would be beneficial
  • The number of index entries scanned
  • How long the query took to resolve in milliseconds (when using the executionStats mode)
  • Which alternative query plans were rejected (when using the allPlansExecution mode)

The explain plan will show 0 milliseconds if the query was resolved in less than 1 ms, which is typical in well-tuned systems. When the explain plan is called, prior cached query plans are abandoned, and the process of testing multiple indexes is repeated to ensure the best possible plan is used. The query plan can be calculated and returned without first having to run the query. This enables DBAs to review which plan will be used to execute the query, without having to wait for the query to run to completion. The feedback from explain() will help you understand whether your query is performing optimally.
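As an illustrative sketch against a hypothetical orders collection, the three explain verbosity modes map to the behaviors described above:

```javascript
// "queryPlanner" (the default) returns the winning plan without
// executing the query:
db.orders.find({ status: "shipped" }).explain()

// "executionStats" executes the query and reports documents returned,
// documents and index entries examined, and execution time:
db.orders.find({ status: "shipped" }).sort({ orderDate: -1 }).explain("executionStats")

// "allPlansExecution" additionally reports the candidate plans that
// the optimizer considered and rejected:
db.orders.find({ status: "shipped" }).explain("allPlansExecution")
```

In the executionStats output, comparing nReturned against totalDocsExamined and totalKeysExamined is a quick way to spot queries that would benefit from an index.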

*Figure 1: MongoDB Compass visual explain plan*

MongoDB Compass also provides rich query plan visualizations to help engineering teams quickly assess and optimize query execution.

Profiling

MongoDB provides a profiling capability called the Database Profiler, which logs fine-grained information about database operations. The profiler can be enabled to log information for all events, or only for those events whose duration exceeds a configurable threshold (100 ms by default). Profiling data is stored in a capped collection where it can easily be searched for relevant events; it is often easier to query this collection than to parse the log files.
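
As a minimal sketch from the mongo shell (note that access to profiler commands can vary by Atlas instance tier):

```javascript
// Level 1 profiles only operations slower than the threshold (here 100 ms);
// level 2 profiles all operations, level 0 disables the profiler.
db.setProfilingLevel(1, 100)

// Profiled operations are written to the capped system.profile collection,
// which can be queried like any other collection, e.g. the five slowest:
db.system.profile.find().sort({ millis: -1 }).limit(5)
```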

Primary and Secondary Indexes

A unique index on the _id attribute is created for every collection. MongoDB will automatically create the _id field and assign a unique value if one is not specified when the document is inserted. All user-defined indexes are secondary indexes. MongoDB includes support for many types of secondary indexes that can be declared on any field(s) in the document, including fields within arrays and sub-documents. Index options include:

  • Compound indexes
  • Geospatial indexes
  • Text search indexes
  • Unique indexes
  • Array indexes
  • TTL indexes
  • Sparse indexes
  • Partial indexes
  • Hashed indexes

You can learn more about each of these index types in the MongoDB Architecture Guide.
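
As a quick illustration, several of these index types can be declared with createIndex; the collection and field names here are hypothetical:

```javascript
// Compound index on three fields:
db.users.createIndex({ last_name: 1, first_name: 1, city: 1 })

// Unique index, rejecting duplicate values:
db.users.createIndex({ email: 1 }, { unique: true })

// TTL index: documents expire one hour after their createdAt timestamp:
db.sessions.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Partial index: only orders still in the "open" state are indexed:
db.orders.createIndex(
  { orderDate: 1 },
  { partialFilterExpression: { status: "open" } }
)

// Text index for keyword search over a string field:
db.products.createIndex({ description: "text" })
```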

Index Creation Options

Indexes and data are updated synchronously in MongoDB, ensuring that queries on indexes never return stale or deleted data. The appropriate indexes should be determined as part of the schema design process. By default, creating an index is a blocking operation in MongoDB. Because index creation can be time and resource intensive, MongoDB provides an option for creating new indexes as a background operation on both the primary and secondary members of a replica set. When the background option is enabled, the total time to create an index will be greater than if the index was created in the foreground, but the database remains available for queries while the index builds.

In addition, multiple indexes can be built concurrently in the background. Refer to the Build Index on Replica Sets documentation to learn more about considerations for index creation and ongoing maintenance.
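
As a sketch, the background option is passed at creation time; the collection and field here are hypothetical (note that on MongoDB 4.2 and later the flag is accepted but ignored, as all index builds use an optimized build process):

```javascript
// Build the index in the background so the database remains available
// for reads and writes during the build:
db.orders.createIndex({ customer_id: 1 }, { background: true })
```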

Common Mistakes Regarding Indexes

The following tips may help to avoid some common mistakes regarding indexes:

  • Use a compound index rather than index intersection: For best performance when querying via multiple predicates, compound indexes will generally be a better option.
  • Compound indexes: Compound indexes are defined and ordered by field. So, if a compound index is defined on last name, first name, and city, queries that specify last name, or last name and first name, will be able to use this index, but queries that search on city alone will not benefit from it (see the sketch after this list). Remove indexes that are prefixes of other indexes.
  • Low selectivity indexes: An index should radically reduce the set of possible documents to select from. For example, an index on a field that indicates gender is not as beneficial as an index on zip code, or even better, phone number.
  • Regular expressions: Indexes are ordered by value, hence leading wildcards are inefficient and may result in full index scans. Trailing wildcards can be efficient if there are sufficient case-sensitive leading characters in the expression.
  • Negation: Inequality queries can be inefficient with respect to indexes. Like most database systems, MongoDB does not index the absence of values and negation conditions may require scanning all documents. If negation is the only condition and it is not selective (for example, querying an orders table, where 99% of the orders are complete, to identify those that have not been fulfilled), all records will need to be scanned.
  • Eliminate unnecessary indexes: Indexes are resource-intensive: they consume RAM, and as fields are updated their associated indexes must be maintained, incurring additional disk I/O overhead. As noted above, the $indexStats aggregation stage can be used to determine how frequently each index is used. If there are indexes that are not used, removing them will reduce storage and speed up writes.
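
The compound index and regular expression points above can be illustrated with a short sketch against a hypothetical users collection:

```javascript
// Compound index ordered by last name, then first name, then city:
db.users.createIndex({ last_name: 1, first_name: 1, city: 1 })

// These queries match a prefix of the index and can use it:
db.users.find({ last_name: "Keep" })
db.users.find({ last_name: "Keep", first_name: "Mat" })

// This query does not include the leading field, so the index cannot
// be used to satisfy it:
db.users.find({ city: "Palo Alto" })

// An anchored, case-sensitive regular expression can use the index...
db.users.find({ last_name: /^Kee/ })

// ...but a leading wildcard may force a scan of every index entry:
db.users.find({ last_name: /eep$/ })
```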

Working Sets

MongoDB makes extensive use of RAM to speed up database operations. In MongoDB, all data is read and manipulated through in-memory representations of the data. Reading data from memory is measured in nanoseconds and reading data from disk is measured in milliseconds, thus reading from memory is orders of magnitude faster than reading from disk.

The set of data and indexes that are accessed during normal operations is called the working set. It is a best practice for the working set to fit in RAM. The working set may represent only a fraction of the entire database, as in applications where data related to recent events or popular products is accessed most often.
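
To get a rough sense of whether the working set can fit in memory, you can compare the data and index sizes reported by db.stats() with the instance’s RAM. A minimal sketch from the mongo shell; bear in mind the true working set is often only a fraction of these totals:

```javascript
var stats = db.stats()
// dataSize + indexSize is an upper bound on the working set; if only
// recent or popular data is accessed, the real working set is smaller.
print("Data size (bytes):  " + stats.dataSize)
print("Index size (bytes): " + stats.indexSize)
```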

When MongoDB attempts to access data that has not been loaded in RAM, it must be read from disk. If there is free memory then the operating system can locate the data on disk and load it into memory directly. However, if there is no free memory, MongoDB must write some other data from memory to disk, and then read the requested data into memory. This process can be time consuming and significantly slower than accessing data that is already resident in memory.

Some operations may inadvertently purge a large percentage of the working set from memory, which adversely affects performance. For example, a query that scans all documents in the database, where the database is larger than available RAM on the server, will cause documents to be read into memory and may lead to portions of the working set being written out to disk. Other examples include various maintenance operations such as compacting or repairing a database and rebuilding indexes.

If your database working set size exceeds the available RAM of your system, consider provisioning an instance with more RAM (scaling up) or sharding the database across additional instances (scaling out). Scaling is an automated, online operation that is launched by selecting the new configuration after clicking the CONFIGURE button in MongoDB Atlas (Figure 2). For a discussion of this topic, refer to the section on Sharding Best Practices in part 3 of this blog series. It is easier to implement sharding before the system’s resources are consumed, so capacity planning is an important element in successful project delivery.

*Figure 2: Reconfiguring the MongoDB Atlas Cluster*

Data Migration

Users should assess how best to model their data for their applications rather than simply importing the flat file exports of their legacy systems. In a traditional relational database environment, data tends to be moved between systems using delimited flat files such as CSV. While it is possible to ingest data into MongoDB from CSV files, this may in fact only be the first step in a data migration process. It is typically the case that MongoDB's document data model provides advantages and alternatives that do not exist in a relational data model.

There are many options for migrating data from flat files into rich JSON documents, including mongoimport, custom scripts, ETL tools, and the application itself, which can read from the existing RDBMS and then write a JSON version of each record to MongoDB.
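
As a sketch, a CSV export can be loaded with mongoimport; the connection string, database, collection, and file names below are placeholders:

```
# Load a CSV file into Atlas, using the file's header row for field names:
mongoimport --uri "mongodb+srv://user:password@cluster0.mongodb.net/mydb" \
  --collection products \
  --type csv \
  --headerline \
  --file products.csv
```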

Other tools such as mongodump and mongorestore, or MongoDB Atlas backups are useful for moving data between different MongoDB systems. The use of mongodump and mongorestore to migrate an application and its data to MongoDB Atlas is described in the post – Migrating Data to MongoDB Atlas.
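
In outline, such a migration dumps the source deployment and restores it into Atlas; the hostnames and credentials here are placeholders:

```
# Dump the source deployment to a local directory:
mongodump --host source-host:27017 --db mydb --out ./dump

# Restore the dump into the Atlas cluster:
mongorestore --uri "mongodb+srv://user:password@cluster0.mongodb.net" ./dump
```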

MongoDB Atlas Instance Selection

The following recommendations are only intended to provide high-level guidance for hardware for a MongoDB deployment. The specific configuration of your hardware will be dependent on your data, queries, performance SLA, and availability requirements.

Memory

As with most databases, MongoDB performs best when the working set (indexes and most frequently accessed data) fits in RAM. Sufficient RAM is the most important factor for instance selection; other optimizations may not significantly improve the performance of the system if there is insufficient RAM. When selecting which MongoDB Atlas instance size to use, opt for one that has sufficient RAM to hold the full working data set (or the appropriate subset if sharding).

If your working set exceeds the available RAM, consider using a larger instance type or adding additional shards to your system.

Storage

Using faster storage can increase database performance and make latency more consistent. Each node must be configured with sufficient storage for the full data set, or for the subset to be stored in a single shard. The storage speed and size can be set when picking the MongoDB Atlas instance during cluster creation or reconfiguration.

*Figure 3: Select instance size and storage size & speed*

Data volumes can optionally be encrypted, which increases security at some cost to performance.

CPU

MongoDB Atlas instances are multi-threaded and can take advantage of many CPU cores. Specifically, the total number of active threads (i.e., concurrent operations) relative to the number of CPUs can impact performance:

  • Throughput increases as the number of concurrent active operations increases up to and beyond the number of CPUs
  • Throughput eventually decreases as the number of concurrent active operations exceeds the number of CPUs by some threshold amount

The threshold amount depends on your application. You can determine the optimum number of concurrent active operations for your application by experimenting and measuring throughput.

The larger MongoDB Atlas instances include more virtual CPUs and so should be considered for highly concurrent workloads.

Next Steps

That’s a wrap for part 2 of the MongoDB Atlas best practices blog series. In part 3, we’ll dive into scaling your MongoDB Atlas cluster, and achieving continuous availability.

Download MongoDB Atlas Best Practices Guide