Run Vector Search Queries

On this page

  • Definition
  • Fields
  • Behavior
  • Atlas Vector Search Index
  • Atlas Vector Search Score
  • Atlas Vector Search Pre-Filter
  • Limitations
  • Supported Clients
  • Parallel Query Execution Across Segments
  • Examples

Atlas Vector Search queries take the form of an aggregation pipeline stage, $vectorSearch. For $vectorSearch queries, Atlas Vector Search returns the results of your semantic search.

The $vectorSearch stage performs an approximate nearest neighbor (ANN) search on a vector in the specified field. You must index the field that you want to search as the Atlas Vector Search vector type inside a vectorSearch type index.

$vectorSearch

A $vectorSearch pipeline has the following prototype form:

{
  "$vectorSearch": {
    "index": "<index-name>",
    "path": "<field-to-search>",
    "queryVector": [<array-of-numbers>],
    "numCandidates": <number-of-candidates>,
    "limit": <number-of-results>,
    "filter": {<filter-specification>}
  }
}

The $vectorSearch stage takes a document with the following fields:

Field: filter
Type: document
Necessity: Optional
Description: Any MQL match expression that compares an indexed field with a boolean, number (not decimals), or string to use as a pre-filter. You can use the comparison query and aggregation pipeline operators described in Atlas Vector Search Pre-Filter in your filter.

Field: index
Type: string
Necessity: Required
Description: Name of the Atlas Vector Search index to use.

Atlas Vector Search doesn't return results if you misspell the index name or if the specified index doesn't already exist on the cluster.

Field: limit
Type: number
Necessity: Required
Description: Number (of type int only) of documents to return in the results. Value can't exceed the value of numCandidates.

Field: numCandidates
Type: number
Necessity: Required
Description: Number of nearest neighbors to use during the search. Value must be less than or equal to (<=) 10000. You can't specify a number less than the number of documents to return (limit).

To increase accuracy, we recommend that you specify a number higher than the number of documents to return (limit), although this might increase latency. For example, we recommend a ratio of ten to twenty nearest neighbors for a limit of only one document. This overrequest pattern is the recommended way to trade off latency and recall in your ANN searches, and we recommend that you tune it on your specific dataset.

Field: path
Type: string
Necessity: Required
Description: Indexed vectorEmbedding type field to search. To learn more, see Path Construction.

Field: queryVector
Type: array of numbers
Necessity: Required
Description: Array of numbers of the BSON double type that represent the query vector. The array size must match the number of vector dimensions specified in the index definition for the field.

Note

You must embed your query with the same model that you used to embed the data.
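The field constraints above can be checked on the client before a query is sent. The following is a minimal sketch, not part of any MongoDB driver: buildVectorSearchStage and suggestNumCandidates are hypothetical helper names, the requirement that limit and numCandidates be positive integers is an assumption beyond what the field descriptions state, and the default ratio of 10 is simply the lower end of the ten-to-twenty guidance.

```javascript
// Hypothetical client-side helpers; they only mirror the documented field rules.
const MAX_NUM_CANDIDATES = 10000;

// Suggest numCandidates from limit using an overrequest ratio, capped at the documented maximum.
function suggestNumCandidates(limit, ratio = 10) {
  return Math.min(MAX_NUM_CANDIDATES, limit * ratio);
}

function buildVectorSearchStage({ index, path, queryVector, numCandidates, limit, filter }) {
  if (typeof index !== "string" || index.length === 0) {
    throw new Error("index must be a non-empty string");
  }
  if (typeof path !== "string" || path.length === 0) {
    throw new Error("path must be a non-empty string");
  }
  if (!Array.isArray(queryVector) || !queryVector.every((x) => Number.isFinite(x))) {
    throw new Error("queryVector must be an array of numbers");
  }
  // Assumption: both counts are treated as positive integers.
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  if (!Number.isInteger(numCandidates) || numCandidates > MAX_NUM_CANDIDATES) {
    throw new Error("numCandidates must be an integer <= 10000");
  }
  if (numCandidates < limit) {
    throw new Error("numCandidates can't be less than limit");
  }
  const spec = { index, path, queryVector, numCandidates, limit };
  if (filter !== undefined) {
    spec.filter = filter; // filter is optional, so add it only when provided
  }
  return { $vectorSearch: spec };
}
```

A caller might write buildVectorSearchStage({ index, path, queryVector, limit: 10, numCandidates: suggestNumCandidates(10) }) and tune the ratio against its own dataset. The sketch doesn't check that the queryVector length matches the index dimensions, because that value lives in the index definition.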

$vectorSearch must be the first stage of any pipeline where it appears.

You must index the fields to search using the $vectorSearch stage inside a vectorSearch type index definition. You can index the following types of fields in an Atlas Vector Search vectorSearch type index definition:

  • Fields that contain vector embeddings as vector type.

  • Fields that contain boolean, numeric, and string values as filter type to enable vector search on pre-filtered data.

To learn more about these Atlas Vector Search field types, see How to Index Fields for Vector Search.

Atlas Vector Search assigns a score, in a fixed range from 0 to 1, to every document that it returns. For cosine and dotProduct similarities, Atlas Vector Search normalizes the score using the following algorithm:

score = (1 + similarity(v1, v2)) / 2

Here, similarity(v1, v2) is the cosine or dotProduct similarity of the vectors v1 and v2.
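To make the normalization concrete, here is a small sketch that applies the formula above to a cosine similarity computed in plain JavaScript. The function names dot, cosine, and normalizedScore are illustrative only; this reproduces the documented formula and is not a description of how Atlas computes scores internally.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same number of dimensions");
  }
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosine(a, b) {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Map a similarity in [-1, 1] to a score in [0, 1].
function normalizedScore(similarity) {
  return (1 + similarity) / 2;
}

console.log(normalizedScore(cosine([1, 0], [1, 0])));  // 1   (same direction)
console.log(normalizedScore(cosine([1, 0], [0, 1])));  // 0.5 (orthogonal)
console.log(normalizedScore(cosine([1, 0], [-1, 0]))); // 0   (opposite direction)
```

Cosine similarity always lies between -1 and 1, so the normalized value stays between 0 and 1. A raw dot product stays in that range only when both vectors have unit length.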

The score assigned to a returned document is part of the document's metadata. To include each returned document's score along with the result set, use a $project stage in your aggregation pipeline.

To retrieve the score of your Atlas Vector Search query results, add a $project stage after the $vectorSearch stage and set a field, such as score, to a $meta expression with the value vectorSearchScore.

Example

db.<collection>.aggregate([
  {
    "$vectorSearch": {
      <query-syntax>
    }
  },
  {
    "$project": {
      "<field-to-include>": 1,
      "<field-to-exclude>": 0,
      "score": { "$meta": "vectorSearchScore" }
    }
  }
])
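The template above can also be assembled programmatically. The sketch below uses a hypothetical helper, withScore, that keeps $vectorSearch as the first stage and appends a $project stage carrying the score. The field names passed to it are up to the caller.

```javascript
// Hypothetical helper: returns a two-stage pipeline with the score projected.
function withScore(vectorSearchStage, includeFields) {
  if (!vectorSearchStage || !("$vectorSearch" in vectorSearchStage)) {
    throw new Error("first stage must be a $vectorSearch stage");
  }
  const projection = {};
  for (const field of includeFields) {
    projection[field] = 1; // include each requested field
  }
  // The score lives in document metadata, so it is read through $meta.
  projection.score = { $meta: "vectorSearchScore" };
  return [vectorSearchStage, { $project: projection }];
}
```

In mongosh, the returned array can be passed directly to db.<collection>.aggregate().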

Note

Pre-filtering your data doesn't affect the score that Atlas Vector Search returns using vectorSearchScore for $vectorSearch queries.

The $vectorSearch filter option matches only BSON boolean, string, and numeric values. You must index the fields that you want to filter your data by as the filter type in a vectorSearch type index definition. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison.

The $vectorSearch filter option supports only the following comparison query operators:

  • $gt

  • $lt

  • $gte

  • $lte

  • $eq

    Note

    Atlas Vector Search also supports the short form of $eq. In the short form, you don't need to specify $eq in the query. For example, consider the following $eq query:

    { "genres": { "$eq": "Comedy" } }

    You can run the preceding query using the short form of $eq as follows:

    { "genres": "Comedy" }

  • $ne

  • $in

  • $nin

Only matches a single value and doesn't support an array of values.
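As an illustration of the operator list and the $eq short form, the following sketch rewrites short-form conditions into their explicit $eq form and rejects comparison operators that aren't in the list above. expandShortEq is a hypothetical client-side helper, not an Atlas or driver API, and it passes top-level keys that start with $ through unchanged because those operators are outside its scope.

```javascript
// Comparison operators listed as supported in the $vectorSearch filter option.
const SUPPORTED_COMPARISON_OPERATORS = new Set([
  "$gt", "$lt", "$gte", "$lte", "$eq", "$ne", "$in", "$nin",
]);

function expandShortEq(filter) {
  const expanded = {};
  for (const [key, condition] of Object.entries(filter)) {
    if (key.startsWith("$")) {
      expanded[key] = condition; // top-level operators are not inspected here
      continue;
    }
    const isPlainObject =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    const operators = isPlainObject
      ? Object.keys(condition).filter((k) => k.startsWith("$"))
      : [];
    if (operators.length === 0) {
      expanded[key] = { $eq: condition }; // short form: a bare value means $eq
      continue;
    }
    for (const op of operators) {
      if (!SUPPORTED_COMPARISON_OPERATORS.has(op)) {
        throw new Error(`unsupported filter operator: ${op}`);
      }
    }
    expanded[key] = condition;
  }
  return expanded;
}

console.log(JSON.stringify(expandShortEq({ genres: "Comedy" })));
// {"genres":{"$eq":"Comedy"}}
```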

The $vectorSearch filter option supports only the following aggregation pipeline operators:

Note

The $vectorSearch filter option doesn't support other comparison query and aggregation pipeline operators.

$vectorSearch is supported only on Atlas clusters running the following MongoDB versions:

  • v6.0.11

  • v7.0.2 and later (including RCs).

$vectorSearch can't be used in a view definition or in the following pipeline stages:

You can pass the results of $vectorSearch to this stage.

You can run $vectorSearch queries using the Atlas Data Explorer, mongosh, and the following drivers:

You can also use Atlas Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.

We recommend dedicated search nodes to isolate vector search query processing. You might see improved query performance on dedicated search nodes, and high-CPU systems might provide a greater performance improvement. When Atlas Vector Search runs on search nodes, it parallelizes query execution across segments of data.

Parallelization of query processing improves the response time in many cases, such as queries on large datasets. Using intra-query parallelism during Atlas Vector Search query processing utilizes more resources, but improves latency for each individual query.

Note

Atlas Vector Search doesn't guarantee that each query will run concurrently. For example, when too many concurrent queries are queued, Atlas Vector Search might fall back to single-threaded execution.

The following queries search the sample_mflix.embedded_movies sample collection using the $vectorSearch stage. The queries search the plot_embedding field, which contains embeddings created using OpenAI's text-embedding-ada-002 embeddings model. If you added the sample collection to your Atlas cluster and created the sample indexes for the collection, you can run the following queries against the collection.
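As a rough sketch of what such a query can look like in mongosh, the pipeline below searches plot_embedding and returns the title, plot, and score of each match. Several details are assumptions made for illustration: the index name vector_index, the projected title and plot fields, and the numCandidates and limit values. The query vector is a placeholder with 1536 entries, the output size of text-embedding-ada-002. A real query needs an embedding generated by that same model.

```javascript
// Placeholder only: a real query vector must come from text-embedding-ada-002.
const EMBEDDING_DIMENSIONS = 1536;
const queryVector = new Array(EMBEDDING_DIMENSIONS).fill(0.01);

const pipeline = [
  {
    $vectorSearch: {
      index: "vector_index",   // assumed index name
      path: "plot_embedding",
      queryVector: queryVector,
      numCandidates: 150,      // overrequest relative to limit
      limit: 10,
    },
  },
  {
    $project: {
      _id: 0,
      title: 1,
      plot: 1,
      score: { $meta: "vectorSearchScore" },
    },
  },
];

// In mongosh, after switching to the sample_mflix database:
// db.embedded_movies.aggregate(pipeline)
```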

