Improve Vector Search Performance
Atlas Vector Search enables you to perform approximate nearest neighbor (ANN) queries that search for results similar to a selected product, search for images, and so on. To improve indexing speed and query performance, review the following best practices.
Reduce Vector Dimensions
Atlas Vector Search supports vectors with up to 4096 dimensions, inclusive. However, vector search indexing and queries are computationally intensive, because larger vectors require more floating-point comparisons. Therefore, where possible, we recommend reducing the number of dimensions, after ensuring that you can measure the impact of changing embedding models on the accuracy of your vector queries.
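For illustration, the following PyMongo sketch creates a vector search index over a 1024-dimension embedding field. The connection string, database, collection, field, and index names are placeholders, and it assumes a PyMongo version recent enough to provide SearchIndexModel with type="vectorSearch".

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    # Placeholder connection string, database, and collection names.
    collection = MongoClient("<connection-string>")["mydb"]["products"]

    # Hypothetical index definition: the "embedding" field stores
    # 1024-dimension vectors rather than a higher-dimensional alternative.
    index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 1024,
                    "similarity": "cosine",
                }
            ]
        },
        name="vector_index",
        type="vectorSearch",
    )
    collection.create_search_index(model=index_model)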
Avoid Indexing Vectors When Running Queries
Vector embeddings consume computational resources during indexing. We recommend that you avoid indexing and re-indexing while running vector search queries. If you decide to change the embedding model that produces the vectors, we recommend that you index the new vectors into a new index rather than updating the index that is currently in use, as sketched below.
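As a sketch of that workflow, assuming the same placeholder PyMongo setup as the earlier example and a hypothetical replacement index named vector_index_v2, you might build the new index alongside the existing one and drop the original only after your application queries have switched over:

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    # Build the replacement index under a new name while the existing
    # "vector_index" continues to serve queries.
    new_index = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding_v2",  # hypothetical field populated by the new embedding model
                    "numDimensions": 1024,
                    "similarity": "cosine",
                }
            ]
        },
        name="vector_index_v2",
        type="vectorSearch",
    )
    collection.create_search_index(model=new_index)

    # After the new index is queryable and your application has switched to it,
    # drop the old index.
    collection.drop_search_index("vector_index")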
Pre-Filter Data
If you have a large number of vectors or vectors with higher dimensions, you can narrow the scope of your semantic search and ensure that not all vectors are considered for comparison. We recommend including the filter option inside your $vectorSearch stage, which performs pre-filtering to reduce the number of documents that the vector search runs against. Also, consider the performance impact of very high-dimensional vectors, as query performance can degrade with large arrays.
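The sketch below illustrates a pre-filtered ANN query with the same placeholder names used earlier. Note that any field referenced in the filter option must also be indexed as a filter type field in the vector search index definition.

    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    query_embedding = [0.01] * 1024  # placeholder; use the embedding of your query

    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": query_embedding,
                "numCandidates": 200,
                "limit": 10,
                # Pre-filter: only documents matching this predicate are
                # considered for comparison. The "category" field must be
                # indexed as a "filter" field in the index definition.
                "filter": {"category": {"$eq": "outdoor"}},
            }
        }
    ]
    for doc in collection.aggregate(pipeline):
        print(doc)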
Use Dedicated Search Nodes
If you deploy the mongod and mongot processes on the same node, there might be resource contention between the processes. To optimize the performance of your Atlas Vector Search queries, we recommend that you deploy the mongot process on dedicated Search Nodes. This not only helps avoid resource contention between the mongot and mongod processes, but also enables parallel segment search by default for $vectorSearch queries on Search Nodes.
Exclude Vector Fields From the Results
In the $project stage, you can request existing fields from the documents in the results as well as newly computed fields. To improve query performance, use the $project stage to return only the fields that you need. In particular, we recommend excluding the vector field in the $project stage, because vector embeddings can be large and increase the latency of returning the results.
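As a sketch with the same placeholder names, the following pipeline returns only a few small fields plus the search score and omits the embedding field:

    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "path": "embedding",
                "queryVector": [0.01] * 1024,  # placeholder query vector
                "numCandidates": 200,
                "limit": 10,
            }
        },
        {
            # Return only small fields plus the score; the large "embedding"
            # array is not listed, so it is excluded from the results.
            "$project": {
                "_id": 1,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
    results = list(collection.aggregate(pipeline))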
Ensure Enough Memory
Hierarchical Navigable Small Worlds (HNSW) works efficiently when vector data is held in memory. You must ensure that the data nodes have enough RAM to hold the vector data and indexes. We recommend deploying separate Search Nodes for workload isolation without data isolation, which enables more efficient usage of memory for vector search use cases.
Warm up the Filesystem Cache
When you perform vector search, your initial queries perform random seeks on disk as the HNSW graph is traversed and the vector values are read into memory. This causes very high latency for initial queries. The latency improves once HNSW traversal has read all indexed vectors into memory, which allows them to be accessed much more quickly by subsequent queries.
However, this cache warming process must be repeated after large writes or when your index is rebuilt.
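One way to warm the cache, sketched below with placeholder names, is to run a handful of representative $vectorSearch queries after a restart, a large write, or an index rebuild, so that later application queries read from memory rather than disk:

    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    # Placeholder warm-up vectors; in practice, use embeddings that are
    # representative of your real query traffic.
    warmup_vectors = [[0.01] * 1024 for _ in range(5)]

    for vector in warmup_vectors:
        # Each traversal reads parts of the HNSW graph and the vector values
        # into the filesystem cache.
        list(collection.aggregate([
            {
                "$vectorSearch": {
                    "index": "vector_index",
                    "path": "embedding",
                    "queryVector": vector,
                    "numCandidates": 500,
                    "limit": 10,
                }
            },
            {"$project": {"_id": 1}},
        ]))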
Use binData Vectors
The BinData vector subtypes provide 3x storage savings when using float vectors in mongod, and also support indexing vectors with alternative types such as int8 vectors and int1 vectors. This significantly reduced resource profile accelerates the internal query path that mongod uses to retrieve documents from the database for every $vectorSearch query. Using binData vectors, even binData float vectors, materially reduces query latency, especially when the limit (number of results) is greater than 20.
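The sketch below assumes PyMongo 4.10 or later (which provides Binary.from_vector) and the same placeholder collection; it converts a float embedding to the BinData vector subtype before inserting the document:

    from bson.binary import Binary, BinaryVectorDtype
    from pymongo import MongoClient

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    embedding = [0.12, -0.03, 0.88]  # placeholder; use your full-length embedding

    doc = {
        "title": "example document",
        # Store the vector as a BinData float32 vector instead of an array of
        # doubles, reducing storage and speeding up document retrieval.
        "embedding": Binary.from_vector(embedding, BinaryVectorDtype.FLOAT32),
    }
    collection.insert_one(doc)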
Quantize the Vector Embeddings
Scalar quantization reduces the precision of each individual dimension, such as by converting 32-bit floating-point numbers to 8-bit integers. However, it retains the ability to retrieve relevant information well for most embedding models. On the other hand, binary quantization reduces each dimension to either 1 or 0, which performs better for quantization-aware trained (QAT) embedding models.
Scalar quantization is good at preserving recall for vectors from most embedding models. If you have vectors from QAT embedding models, binary quantization can provide better performance, because the training process trains the model to adapt to the extreme reduction in precision.
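You can enable automatic quantization in the vector search index definition. The sketch below, again with placeholder names, requests scalar quantization for an assumed float embedding field; "binary" can be specified instead for vectors from QAT embedding models:

    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient("<connection-string>")["mydb"]["products"]

    index_model = SearchIndexModel(
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "embedding",
                    "numDimensions": 1024,
                    "similarity": "cosine",
                    # "scalar" maps each float dimension to an 8-bit integer;
                    # specify "binary" instead for vectors from QAT models.
                    "quantization": "scalar",
                }
            ]
        },
        name="quantized_vector_index",
        type="vectorSearch",
    )
    collection.create_search_index(model=index_model)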