How to Perform Semantic Search Against Data in Your Atlas Cluster
This tutorial describes how to perform an approximate nearest neighbor (ANN) search on a vector in the plot_embedding field in the sample_mflix.embedded_movies collection on your Atlas cluster. To demonstrate this, it takes you through the following steps:
Create an Atlas Vector Search index on the numeric field named plot_embedding in the sample_mflix.embedded_movies collection.
Run Atlas Vector Search queries against the plot_embedding field in the sample_mflix.embedded_movies collection.
Prerequisites
To complete this tutorial, you must have the following:
An Atlas cluster with MongoDB version 6.0.11, 7.0.2, or later (including RCs).
The sample data loaded into your Atlas cluster.
One of the following applications to run queries on your Atlas cluster:
You can also use Atlas Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.
Create the Atlas Vector Search Index
This section demonstrates how to create an Atlas Vector Search index on the plot_embedding field in the sample_mflix.embedded_movies collection for running vector queries against the field.
Required Access
To create an Atlas Vector Search index, you must have Project Data Access Admin
or higher access to the project.
Procedure
In Atlas, go to the Clusters page for your project.
If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
If the Clusters page is not already displayed, click Database in the sidebar.
Define the Atlas Vector Search index.
Click Create Search Index.
Under Atlas Vector Search, select JSON Editor and then click Next.
In the Database and Collection section, find the sample_mflix database, and select the embedded_movies collection.
In the Index Name field, enter vector-search-tutorial.
Replace the default definition with the following index definition and then click Next.
This index definition specifies indexing the following fields in an index of the vectorSearch type:
plot_embedding field as the vector type. The plot_embedding field contains embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using euclidean.
genres field as the filter type for pre-filtering data by string values in the field.
year field as the filter type for pre-filtering data by numeric values in the field.
{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding",
      "numDimensions": 1536,
      "similarity": "euclidean"
    },
    {
      "type": "filter",
      "path": "genres"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
}
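The index definition above can also be built and submitted from code rather than through the Atlas UI. The following is a minimal sketch, not part of the UI procedure: it assumes the PyMongo driver at version 4.7 or later (where SearchIndexModel accepts a type argument) and a connection string that you supply. The helper names build_index_definition and create_index are hypothetical and exist only for illustration.

```python
def build_index_definition(num_dimensions=1536, similarity="euclidean"):
    """Return the tutorial's index definition as a Python dict."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding",
                "numDimensions": num_dimensions,
                "similarity": similarity,
            },
            # Filter fields allow pre-filtering by genre (string) and year (number).
            {"type": "filter", "path": "genres"},
            {"type": "filter", "path": "year"},
        ]
    }


def create_index(connection_string, index_name="vector-search-tutorial"):
    """Create the index with PyMongo (assumption: pymongo >= 4.7 is installed)."""
    # Imported here so build_index_definition works without the driver.
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    client = MongoClient(connection_string)
    collection = client["sample_mflix"]["embedded_movies"]
    model = SearchIndexModel(
        definition=build_index_definition(),
        name=index_name,
        type="vectorSearch",
    )
    # Returns the name of the index that was requested.
    return collection.create_search_index(model=model)
```

Calling create_index with your own connection string requests the same index that the UI steps create. The index may take some time to build before it can serve queries.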
Run Queries Using the $vectorSearch Aggregation Pipeline Stage
Overview
This section demonstrates how to query the indexed vector data in the sample_mflix.embedded_movies collection using the $vectorSearch stage. These sample queries also demonstrate the comparison query and aggregation pipeline operators that you can use in the query to pre-filter the data that the semantic search runs on.
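To make the shape of such a query concrete, the following is a minimal sketch of a pipeline with a $vectorSearch stage and an optional pre-filter, written as plain Python data. The helper name build_vector_search_pipeline, the numCandidates and limit values, and the projected fields are illustrative assumptions, not values prescribed by this tutorial. The query vector must come from the same embedding model as the indexed field; the zero vector in the usage lines is only a placeholder.

```python
def build_vector_search_pipeline(query_vector, filter_expr=None,
                                 num_candidates=150, limit=10):
    """Build an aggregation pipeline that starts with a $vectorSearch stage."""
    if len(query_vector) != 1536:
        raise ValueError("query vector must have 1536 dimensions")
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least limit")

    stage = {
        "index": "vector-search-tutorial",
        "path": "plot_embedding",
        "queryVector": list(query_vector),
        "numCandidates": num_candidates,  # illustrative value
        "limit": limit,                   # illustrative value
    }
    # The filter may reference only fields indexed as the filter type
    # (genres and year in this tutorial's index).
    if filter_expr is not None:
        stage["filter"] = filter_expr

    return [
        {"$vectorSearch": stage},
        {"$project": {
            "_id": 0,
            "title": 1,
            "genres": 1,
            "year": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]


# Placeholder vector; a real query needs an embedding of the search text.
placeholder_vector = [0.0] * 1536
pre_filter = {"$and": [{"genres": {"$ne": "Crime"}}, {"year": {"$lte": 2015}}]}
pipeline = build_vector_search_pipeline(placeholder_vector, pre_filter)
```

With a driver such as PyMongo, the resulting list can be passed to the collection's aggregate method. The filter narrows the candidate documents before the vector comparison runs.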