How to Perform Semantic Search Against Data in Your Atlas Cluster

On this page

  • Prerequisites
  • Create the Atlas Vector Search Index
  • Required Access
  • Procedure
  • Run Queries Using the $vectorSearch Aggregation Pipeline Stage
  • Overview
  • Procedure

This tutorial describes how to perform an approximate nearest neighbor (ANN) search on a vector in the plot_embedding field in the sample_mflix.embedded_movies collection on your Atlas cluster. To demonstrate this, it takes you through the following steps:

  1. Create an Atlas Vector Search index on the vector field named plot_embedding in the sample_mflix.embedded_movies collection.

  2. Run Atlas Vector Search queries against the plot_embedding field in the sample_mflix.embedded_movies collection.

To complete this tutorial, you must have the following:

  • An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs).

  • The sample data loaded into your Atlas cluster.

  • One of the following applications to run queries on your Atlas cluster:

    • mongosh

    • Java

    • MongoDB Node Driver

    • PyMongo

    You can also use Atlas Vector Search with local Atlas deployments that you create with the Atlas CLI. To learn more, see Create a Local Atlas Deployment.

This section demonstrates how to create an Atlas Vector Search index on the plot_embedding field in the sample_mflix.embedded_movies collection so that you can run vector queries against the field.

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the project.

1
  1. If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.

  2. If it is not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. If the Clusters page is not already displayed, click Database in the sidebar.

2
  1. Click your cluster's name.

  2. Click the Atlas Search tab.

3
  1. Click Create Search Index.

  2. Under Atlas Vector Search, select JSON Editor and then click Next.

  3. In the Database and Collection section, find the sample_mflix database, and select the embedded_movies collection.

  4. In the Index Name field, enter vector-search-tutorial.


4
  1. Replace the default definition with the following index definition, and then click Next.

    This index definition specifies indexing the following fields in an index of the vectorSearch type:

    • plot_embedding field as the vector type. The plot_embedding field contains embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using Euclidean distance (euclidean).

    • genres field as the filter type for pre-filtering data by string values in the field.

    • year field as the filter type for pre-filtering data by numeric values in the field.

    {
      "fields": [
        {
          "type": "vector",
          "path": "plot_embedding",
          "numDimensions": 1536,
          "similarity": "euclidean"
        },
        {
          "type": "filter",
          "path": "genres"
        },
        {
          "type": "filter",
          "path": "year"
        }
      ]
    }
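
If you would rather create the index from code than from the Atlas UI, something like the following PyMongo sketch may work. It assumes PyMongo 4.7 or later (where SearchIndexModel accepts a type argument) and a connection string that you supply. The helper names build_index_definition and create_tutorial_index are illustrative only and are not part of any MongoDB API.

```python
def build_index_definition():
    """Return the tutorial's index definition as a Python dict."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": "plot_embedding",
                "numDimensions": 1536,
                "similarity": "euclidean",
            },
            {"type": "filter", "path": "genres"},
            {"type": "filter", "path": "year"},
        ]
    }


def create_tutorial_index(uri):
    """Create the index on sample_mflix.embedded_movies (assumes PyMongo >= 4.7)."""
    # Imported lazily so the definition helper works without PyMongo installed.
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    collection = MongoClient(uri)["sample_mflix"]["embedded_movies"]
    model = SearchIndexModel(
        definition=build_index_definition(),
        name="vector-search-tutorial",
        type="vectorSearch",
    )
    # create_search_index returns the name of the new index.
    return collection.create_search_index(model)
```

Calling create_tutorial_index with your own connection string should return the index name once Atlas accepts the request. The index still needs time to build before it can serve queries.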
5

A modal window displays to let you know that your index is building.

6

The index should take about one minute to build. While it builds, the Status column reads Initial Sync. When it finishes building, the Status column reads Active.
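
When scripting against the cluster, you could poll for build completion instead of watching the Status column. The sketch below assumes a PyMongo Collection (version 4.5 or later, which provides list_search_indexes) and assumes that each returned index document carries a boolean queryable field. The status labels shown in the Atlas UI may not match the values returned programmatically. The function name wait_until_queryable and its timeout parameters are hypothetical.

```python
import time


def wait_until_queryable(collection, index_name, timeout_s=300.0, poll_s=5.0):
    """Poll until the named search index reports queryable, or time out.

    `collection` is expected to be a PyMongo Collection, or any object with a
    compatible list_search_indexes(name) method.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # list_search_indexes(name) yields one document per matching index.
        for index in collection.list_search_indexes(index_name):
            # Assumption: the index document exposes a boolean "queryable" field.
            if index.get("queryable"):
                return index
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"index {index_name!r} not queryable after {timeout_s}s"
            )
        time.sleep(poll_s)
```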




This section demonstrates how to query the indexed vector data in the sample_mflix.embedded_movies collection using the $vectorSearch stage. These sample queries also demonstrate the comparison query and aggregation pipeline operators that you can use in the query to pre-filter the data on which the semantic search runs.
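
As a rough illustration of such a query, the following Python sketch builds an aggregation pipeline with a pre-filtered $vectorSearch stage against the index defined earlier. The filter values (the Action genre and the year 1990), the projected fields, and the helper name build_vector_search_pipeline are assumptions chosen for illustration. The query vector must be an embedding with 1536 dimensions, produced by the same model as the stored embeddings.

```python
def build_vector_search_pipeline(query_vector, limit=10, num_candidates=150):
    """Build an aggregation pipeline with a pre-filtered $vectorSearch stage."""
    if len(query_vector) != 1536:
        raise ValueError("query vector must have 1536 dimensions to match the index")
    if num_candidates < limit:
        raise ValueError("numCandidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": "vector-search-tutorial",
                "path": "plot_embedding",
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
                # Pre-filter on the fields indexed with the filter type.
                # The genre and year values here are illustrative only.
                "filter": {
                    "$and": [
                        {"genres": {"$in": ["Action"]}},
                        {"year": {"$gte": 1990}},
                    ]
                },
            }
        },
        {
            # Return a few readable fields plus the similarity score.
            "$project": {
                "_id": 0,
                "title": 1,
                "plot": 1,
                "genres": 1,
                "year": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a PyMongo collection handle for sample_mflix.embedded_movies, you could pass the returned list to collection.aggregate and iterate over the resulting documents.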
