Improve Vector Search Performance
Atlas Vector Search enables you to perform approximate nearest neighbor (ANN) queries that search for results similar to a selected product, search for images, and so on. To improve indexing speed and query performance, review the following best practices.
Reduce Vector Dimensions
Atlas Vector Search supports vectors with up to 4096 dimensions, inclusive. However, vector search indexing and queries are computationally intensive, because larger vectors require more floating-point comparisons. Therefore, where possible, we recommend reducing the number of dimensions, after ensuring that you can measure the impact of changing embedding models on the accuracy of your vector queries.
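For illustration, the following PyMongo sketch creates a vector search index over a 1024-dimension embedding field. The connection string, database, collection, field, and index names are placeholders, and it assumes a PyMongo version recent enough to provide SearchIndexModel with type="vectorSearch".

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    # Placeholder connection string, database, and collection names.
    collection = MongoClient("<connection-string>")["mydb"]["products"]

    # Hypothetical index definition: the "embedding" field stores
    # 1024-dimension vectors rather than a higher-dimensional alternative.
    index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 1024,
                    "similarity": "cosine",
                }
            ]
        },
        name="vector_index",
        type="vectorSearch",
    )
    collection.create_search_index(model=index_model)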
Avoid Indexing Vectors When Running Queries
Vector embeddings consume computational resources during indexing. We recommend that you avoid indexing and re-indexing while running vector search queries. If you decide to change the embedding model that produces the vectors, we recommend that you index the new vectors into a new index rather than updating the index that is currently in use, as sketched below.
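As a sketch of that workflow, assuming the same placeholder PyMongo setup as the earlier example and a hypothetical replacement index named vector_index_v2, you might build the new index alongside the existing one and drop the original only after your application queries have switched over:

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    # Build the replacement index under a new name while the existing
    # "vector_index" continues to serve queries.
    new_index = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding_v2",  # hypothetical field populated by the new embedding model
                    "numDimensions": 1024,
                    "similarity": "cosine",
                }
            ]
        },
        name="vector_index_v2",
        type="vectorSearch",
    )
    collection.create_search_index(model=new_index)

    # After the new index is queryable and your application has switched to it,
    # drop the old index.
    collection.drop_search_index("vector_index")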
Pre-Filter Data
If you have a large number of vectors or vectors with higher dimensions, you can narrow the scope of your semantic search and ensure that not all vectors are considered for comparison. We recommend including the filter option inside your $vectorSearch stage, which performs pre-filtering to reduce the number of documents that the vector search runs against. Also, consider the performance impact of very high-dimensional vectors, as query performance can degrade with large arrays.
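The sketch below illustrates a pre-filtered ANN query with the same placeholder names used earlier. Note that any field referenced in the filter option must also be indexed as a filter type field in the vector search index definition.

    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    query_embedding = [0.01] * 1024  # placeholder; use the embedding of your query

    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 200,
                "limit": 10,
                # Pre-filter: only documents matching this predicate are
                # considered for comparison. The "category" field must be
                # indexed as a "filter" field in the index definition.
                "filter": {"category": {"$eq": "outdoor"}},
            }
        }
    ]
    for doc in collection.aggregate(pipeline):
        print(doc)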
Use Dedicated Search Nodes
If you deploy the mongod and mongot processes on the same node, there might be resource contention between the processes. To optimize the performance of your Atlas Vector Search queries, we recommend that you deploy the mongot process on dedicated Search Nodes. This not only helps avoid resource contention between the mongot and mongod processes, but also enables parallel segment search by default for $vectorSearch queries on Search Nodes.
Exclude Vector Fields From the Results
In the $project stage, you can request existing fields from the documents in the results as well as newly computed fields. To improve query performance, use the $project stage to return only the fields that you need. In particular, we recommend excluding the vector field in the $project stage, because vector embeddings can be large and increase the latency of returning the results.
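As a sketch with the same placeholder names, the following pipeline returns only a few small fields plus the search score and omits the embedding field:

    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": [0.01] * 1024,  # placeholder query vector
                "numCandidates": 200,
                "limit": 10,
            }
        },
        {
            # Return only small fields plus the score; the large "embedding"
            # array is not listed, so it is excluded from the results.
            "$project": {
                "_id": 1,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
    results = list(collection.aggregate(pipeline))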
Ensure Enough Memory
Hierarchical Navigable Small Worlds (HNSW) works efficiently when vector data is held in memory. You must ensure that the data nodes have enough RAM to hold the vector data and indexes. We recommend deploying separate Search Nodes for workload isolation without data isolation, which enables more efficient usage of memory for vector search use cases.
Warm up the Filesystem Cache
When you perform vector search, your initial queries perform random seeks on disk as the HNSW graph is traversed and the vector values are read into memory. This causes very high latency for initial queries. The latency improves once HNSW traversal has read all indexed vectors into memory, which allows them to be accessed much more quickly by subsequent queries.
However, this cache warming process must be repeated after large writes or when your index is rebuilt.
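One way to warm the cache, sketched below with placeholder names, is to run a handful of representative $vectorSearch queries after a restart, a large write, or an index rebuild, so that later application queries read from memory rather than disk:

    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    # Placeholder warm-up vectors; in practice, use embeddings that are
    # representative of your real query traffic.
    warmup_vectors = [[0.01] * 1024 for _ in range(5)]

    for vector in warmup_vectors:
        # Each traversal reads parts of the HNSW graph and the vector values
        # into the filesystem cache.
        list(collection.aggregate([
            {
                "$vectorSearch": {
                    "index": "vector_index",
                    "path": "embedding",
                    "queryVector": vector,
                    "numCandidates": 500,
                    "limit": 10,
                }
            },
            {"$project": {"_id": 1}},
        ]))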
Use binData Vectors
The BinData vector subtypes provide 3x storage savings when using float vectors in mongod, and also support indexing vectors with alternative types such as int8 vectors and int1 vectors. This significantly reduced resource profile accelerates the internal query path that mongod uses to retrieve documents from the database for every $vectorSearch query. Using binData vectors, even binData float vectors, materially reduces query latency, especially when the limit (number of results) is greater than 20.
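The sketch below assumes PyMongo 4.10 or later (which provides Binary.from_vector) and the same placeholder collection; it converts a float embedding to the BinData vector subtype before inserting the document:

    from bson.binary import Binary, BinaryVectorDtype
    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    embedding = [0.12, -0.03, 0.88]  # placeholder; use your full-length embedding

    doc = {
        "title": "example document",
        # Store the vector as a BinData float32 vector instead of an array of
        # doubles, reducing storage and speeding up document retrieval.
        "embedding": Binary.from_vector(embedding, BinaryVectorDtype.FLOAT32),
    }
    collection.insert_one(doc)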
Quantize the Vector Embeddings
Scalar quantization reduces the precision of each individual dimension, such as by converting 32-bit floating-point numbers to 8-bit integers. However, it retains the ability to retrieve relevant information well for most embedding models. On the other hand, binary quantization reduces each dimension to either 1 or 0, which performs better for quantization-aware trained (QAT) embedding models.
Scalar quantization is good at preserving recall for vectors from most embedding models. If you have vectors from QAT embedding models, binary quantization can provide better performance, because the training process trains the model to adapt to the extreme reduction in precision.
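You can enable automatic quantization in the vector search index definition. The sketch below, again with placeholder names, requests scalar quantization for an assumed float embedding field; "binary" can be specified instead for vectors from QAT embedding models:

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 1024,
                    "similarity": "cosine",
                    # "scalar" maps each float dimension to an 8-bit integer;
                    # specify "binary" instead for vectors from QAT models.
                    "quantization": "scalar",
                }
            ]
        },
        name="quantized_vector_index",
        type="vectorSearch",
    )
    collection.create_search_index(model=index_model)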