How to Perform Semantic Search Against Data in Your Atlas Cluster

This tutorial describes how to perform an approximate nearest neighbor (ANN) search on a vector in the plot_embedding field in the sample_mflix.embedded_movies collection on your Atlas cluster. It takes you through the following steps:

Create an Atlas Vector Search index on the vector field named plot_embedding in the sample_mflix.embedded_movies collection.
Run Atlas Vector Search queries against the plot_embedding field in the sample_mflix.embedded_movies collection.

Prerequisites

To complete this tutorial, you must have the following:

An Atlas cluster with MongoDB version 6.0.11, or v7.0.2 or later (including RCs).
The sample data loaded into your Atlas cluster.
One of the following applications to run queries on your Atlas cluster:
You can also use Atlas Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.

Create the Atlas Vector Search Index

This section demonstrates how to create an Atlas Vector Search index on the plot_embedding field in the sample_mflix.embedded_movies collection so that you can run vector queries against the field.

Required Access

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the project.

Procedure

In Atlas, go to the Clusters page for your project.

If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.
If it is not already displayed, select your desired project from the Projects menu in the navigation bar.
If the Clusters page is not already displayed, click Database in the sidebar.

Go to the Atlas Search page for your cluster.

Click your cluster's name.
Click the Atlas Search tab.

Define the Atlas Vector Search index.

Click Create Search Index.
Under Atlas Vector Search, select JSON Editor and then click Next.
In the Database and Collection section, find the sample_mflix database, and select the embedded_movies collection.
In the Index Name field, enter vector-search-tutorial.
Replace the default definition with the following index definition and then click Next.

This index definition specifies indexing the following fields in an index of the vectorSearch type:

plot_embedding field as the vector type. The plot_embedding field contains embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using Euclidean distance (the euclidean similarity function).
genres field as the filter type for pre-filtering data by string values in the field.
year field as the filter type for pre-filtering data by numeric values in the field.

{
  "fields": [
    {
      "type": "vector",
      "path": "plot_embedding",
      "numDimensions": 1536,
      "similarity": "euclidean"
    },
    {
      "type": "filter",
      "path": "genres"
    },
    {
      "type": "filter",
      "path": "year"
    }
  ]
}

Review the index definition and then click Create Search Index.

A modal window displays to let you know that your index is building.

Click Close to close the You're All Set! modal window and wait for the index to finish building.

The index should take about one minute to build. While it builds, the Status column reads Initial Sync. When it finishes building, the Status column reads Active.
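The UI steps above can also be approximated from a script. The following is a minimal sketch, not part of the original tutorial, that assumes PyMongo 4.7 or later (for the `type` argument of `SearchIndexModel`) and a connection string that you supply. The helper names `build_index_definition`, `create_index`, and `wait_until_queryable` are hypothetical, and the sketch assumes that each document returned by `list_search_indexes` carries a boolean `queryable` field.

```python
import time
from typing import Any, Callable, Iterable

INDEX_NAME = "vector-search-tutorial"


def build_index_definition() -> dict[str, Any]:
    """Return the same index definition that the JSON Editor step uses."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding",
                "numDimensions": 1536,
                "similarity": "euclidean",
            },
            {"type": "filter", "path": "genres"},
            {"type": "filter", "path": "year"},
        ]
    }


def create_index(collection: Any) -> str:
    """Create the vector search index on a PyMongo collection.

    Assumes PyMongo 4.7+; the import is lazy so the rest of this
    sketch runs without the driver installed.
    """
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=build_index_definition(),
        name=INDEX_NAME,
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)


def wait_until_queryable(
    list_indexes: Callable[[], Iterable[dict[str, Any]]],
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
) -> bool:
    """Poll until an index reports queryable=True or the timeout passes.

    `list_indexes` is any callable that returns index documents, for
    example: lambda: coll.list_search_indexes(INDEX_NAME)
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if any(ix.get("queryable") for ix in list_indexes()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Example wiring (requires a real cluster, so it is left commented out):
# from pymongo import MongoClient
# coll = MongoClient("<connection-string>")["sample_mflix"]["embedded_movies"]
# create_index(coll)
# wait_until_queryable(lambda: coll.list_search_indexes(INDEX_NAME))
```

Passing the listing call in as a callable keeps the polling loop independent of the driver, so it can be exercised without a cluster.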

Run Queries Using the `$vectorSearch` Aggregation Pipeline Stage

Overview

This section demonstrates how to query the indexed vector data in the sample_mflix.embedded_movies collection using the $vectorSearch stage. The sample queries also demonstrate the comparison query operators and aggregation pipeline operators that you can use in the query to pre-filter the data on which the semantic search runs.
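As an illustration of the kind of query this section describes, the following sketch builds a $vectorSearch pipeline that pre-filters on the genres and year fields indexed earlier. It is an assumption-laden example rather than the tutorial's own query: the function name `build_vector_search_pipeline`, the default `numCandidates` and `limit` values, and the projected `title` and `plot` fields are illustrative choices, and the query vector must come from the same embedding model as the indexed data.

```python
from typing import Any, Sequence

INDEX_NAME = "vector-search-tutorial"
NUM_DIMENSIONS = 1536  # must match numDimensions in the index definition


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    genres: Sequence[str],
    min_year: int,
    num_candidates: int = 150,  # illustrative default, not from the tutorial
    limit: int = 10,            # illustrative default, not from the tutorial
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline with a pre-filtered $vectorSearch stage."""
    if len(query_vector) != NUM_DIMENSIONS:
        raise ValueError(f"query vector must have {NUM_DIMENSIONS} dimensions")
    if limit > num_candidates:
        raise ValueError("limit cannot exceed numCandidates")

    return [
        {
            "$vectorSearch": {
                "index": INDEX_NAME,
                "path": "plot_embedding",
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
                # Pre-filter: only fields indexed with type "filter" can appear here.
                "filter": {
                    "$and": [
                        {"genres": {"$in": list(genres)}},
                        {"year": {"$gte": min_year}},
                    ]
                },
            }
        },
        {
            # Return a few readable fields plus the similarity score.
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                "genres": 1,
                "year": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Example wiring (requires a real cluster and a real 1536-dimension embedding):
# results = coll.aggregate(build_vector_search_pipeline(embedding, ["Action"], 2000))
```

The pre-filter narrows the candidate documents before the similarity comparison, which is why both filter paths must be declared in the index definition.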

Procedure
