
How to Create Vector Embeddings

On this page

  • Choose a Method to Create Embeddings
  • Create and Store Embeddings From Data
  • Prerequisites
  • Procedure
  • Create Embeddings for Queries
  • Procedure

Vector embeddings represent your data as points in multi-dimensional space. These embeddings capture meaningful relationships in your data and enable tasks such as semantic search and retrieval. You can store vector embeddings along with your other data in Atlas and use Atlas Vector Search to query your vectorized data.

To perform vector search over your embeddings, do the following:

  1. Choose a method to create vector embeddings.

  2. Create vector embeddings from your data and store them in Atlas.

  3. Create a vector embedding that represents your search query and run the query.

Atlas Vector Search returns documents whose vector embeddings are closest in distance to the embedding that represents your query. This indicates that they are similar in meaning.
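As a toy illustration of "closest in distance," the following sketch compares Euclidean distances between a query vector and two document vectors. The 3-dimensional vectors here are made up for the example; real embedding models output hundreds of dimensions, such as the 768 and 1536 dimensions used later on this page.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional embeddings (real models output hundreds of dimensions)
query = [0.9, 0.1, 0.0]
doc_a = [0.8, 0.2, 0.1]  # near the query vector, so more similar in meaning
doc_b = [0.0, 0.1, 0.9]  # far from the query vector, so less similar

print(euclidean_distance(query, doc_a) < euclidean_distance(query, doc_b))  # True
```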

To create vector embeddings, you must use an embedding model. To connect to an embedding model and create embeddings for Atlas Vector Search, use one of the following methods:

  • Load an open-source model: If you don't have API keys or credits with an embedding service, use an open-source embedding model by loading it locally from your application.

  • Call an embedding service: Most AI providers offer APIs for their proprietary embedding models that you can use to create vector embeddings.

  • Use an integration: You can integrate Atlas Vector Search with open-source frameworks like LangChain and LlamaIndex, services like Amazon Bedrock, and more. These integrations include built-in libraries and tools that help you quickly connect to open-source and proprietary embedding models and generate vector embeddings for Atlas Vector Search.

The following procedure shows how to create vector embeddings and store them in Atlas by using an open-source or proprietary embedding model.


The examples on this page are available for Node.js and Python, each using either an open-source embedding model or a proprietary model from OpenAI. Follow the variant that matches your environment.


To run these examples, you must have the following:

  • An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later. Ensure that your IP address is included in your Atlas project's access list.

  • An environment to run interactive Python notebooks such as Colab.

Complete the following steps to create vector embeddings from custom data and store them in a collection in Atlas. You can also modify the following code to generate embeddings from your own data of any type.

Note

If you want to create embeddings for an existing collection, you must add a new field that contains the embedding and update each document in the collection.
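As an example of that backfill, the following minimal PyMongo sketch embeds a text field for every document that does not yet have an embedding. The helper name `add_embeddings` and the `text` field are assumptions for illustration, and `get_embedding` stands for an embedding function like the ones defined later on this page.

```python
def add_embeddings(collection, get_embedding, text_field="text"):
    """Adds an `embedding` field to every document that is missing one.

    `collection` is a PyMongo collection and `get_embedding` is any function
    that maps a string to a list of floats (illustrative names, not part of
    the driver API). Returns the number of documents updated.
    """
    updated = 0
    # Only fetch documents that still need an embedding
    for doc in collection.find({"embedding": {"$exists": False}}):
        embedding = get_embedding(doc[text_field])
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"embedding": embedding}},
        )
        updated += 1
    return updated
```

After connecting with `pymongo.MongoClient`, you could call this as `add_embeddings(db["embeddings"], get_embedding)`.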

1

In a terminal window, run the following commands to create a new directory named my-embeddings-project and initialize your project:

mkdir my-embeddings-project
cd my-embeddings-project
npm init -y
2

Configure your project to use ES modules by adding "type": "module" to your package.json file and then saving it.

{
"type": "module",
// other fields...
}
3

In a terminal window, run the following command:

npm install mongodb @xenova/transformers
4
  1. Create a file named create-embeddings.js and paste the following code. This code does the following:

    • Creates and exports an asynchronous function named getEmbedding that uses the open-source Xenova/nomic-embed-text-v1 model to generate an embedding for a given input.

    • Generates embeddings from the sample text data and ingests them into the sample_db.embeddings collection in Atlas by using the MongoDB Node.js Driver.

    create-embeddings.js
    import { MongoClient } from 'mongodb';
    import { pipeline } from '@xenova/transformers';

    // Function to generate embeddings for a given data source
    export async function getEmbedding(data) {
        const embedder = await pipeline(
            'feature-extraction',
            'Xenova/nomic-embed-text-v1');
        const response = await embedder(data, { pooling: 'mean', normalize: true });
        return Array.from(response.data);
    }

    // Data to embed
    const data = [
        "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
        "The Lion King: Lion cub and future king Simba searches for his identity",
        "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
    ];

    async function run() {
        // Connect to your Atlas cluster
        const uri = "<connectionString>";
        const client = new MongoClient(uri);
        try {
            await client.connect();
            const db = client.db("sample_db");
            const collection = db.collection("embeddings");
            // Ingest data and embeddings into Atlas
            await Promise.all(data.map(async text => {
                const embedding = await getEmbedding(text);
                // Check if the document already exists before inserting
                const existingDoc = await collection.findOne({ text: text });
                if (!existingDoc) {
                    await collection.insertOne({
                        text: text,
                        embedding: embedding
                    });
                    console.log(embedding);
                }
            }));
        } catch (err) {
            console.log(err.stack);
        } finally {
            await client.close();
        }
    }
    run().catch(console.dir);
  2. Replace <connectionString> with your Atlas cluster's SRV connection string.

    Note

    Your connection string should use the following format:

    mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
  3. Save the file, then run the following command:

    node create-embeddings.js
    [ -0.04323853924870491, -0.008460805751383305, 0.012494648806750774, -0.013014335185289383, ... ]
    [ -0.017400473356246948, 0.04922063276171684, -0.002836339408531785, -0.030395228415727615, ... ]
    [ -0.016950927674770355, 0.013881809078156948, -0.022074559703469276, -0.02838018536567688, ... ]

    Note

    The number of dimensions in the output has been truncated for readability.

    You can also view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

1

In a terminal window, run the following commands to create a new directory named my-embeddings-project and initialize your project:

mkdir my-embeddings-project
cd my-embeddings-project
npm init -y
2

Configure your project to use ES modules by adding "type": "module" to your package.json file and then saving it.

{
"type": "module",
// other fields...
}
3

In a terminal window, run the following command:

npm install mongodb openai
4
  1. Create a file named create-embeddings.js and paste the following code. This code does the following:

    • Creates and exports an asynchronous function named getEmbedding that uses OpenAI's text-embedding-3-small model to generate an embedding for a given input.

    • Generates embeddings from the sample text data and ingests them into the sample_db.embeddings collection in Atlas by using the MongoDB Node.js Driver.

    create-embeddings.js
    import { MongoClient } from 'mongodb';
    import OpenAI from 'openai';

    // Set up the OpenAI client
    const openai = new OpenAI({
        apiKey: "<apiKey>",
    });

    // Function to get the embeddings using the OpenAI API
    export async function getEmbedding(text) {
        const response = await openai.embeddings.create({
            model: "text-embedding-3-small",
            input: text,
            encoding_format: "float",
        });
        return response.data[0].embedding;
    }

    // Data to embed
    const data = [
        "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
        "The Lion King: Lion cub and future king Simba searches for his identity",
        "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
    ];

    async function run() {
        // Connect to your Atlas cluster
        const uri = "<connectionString>";
        const client = new MongoClient(uri);
        try {
            await client.connect();
            const db = client.db("sample_db");
            const collection = db.collection("embeddings");
            // Ingest data and embeddings into Atlas
            await Promise.all(data.map(async text => {
                const embedding = await getEmbedding(text);
                // Check if the document already exists before inserting
                const existingDoc = await collection.findOne({ text: text });
                if (!existingDoc) {
                    await collection.insertOne({
                        text: text,
                        embedding: embedding
                    });
                    console.log(embedding);
                }
            }));
        } catch (err) {
            console.log(err.stack);
        } finally {
            await client.close();
        }
    }
    run().catch(console.dir);
  2. Replace the following placeholder values:

    • <apiKey> with your OpenAI API key.

    • <connectionString> with your Atlas cluster's SRV connection string.

    Note

    Your connection string should use the following format:

    mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
  3. Save the file, then run the following command:

    node create-embeddings.js
    [ 0.031927742, -0.014192767, -0.021851597, 0.045498233, -0.0077904654, ... ]
    [ -0.01664538, 0.013198251, 0.048684783, 0.014485021, -0.018121032, ... ]
    [ 0.030449908, 0.046782598, 0.02126599, 0.025799986, -0.015830345, ... ]

    Note

    The number of dimensions in the output has been truncated for readability.

    You can also view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

1

Create an interactive Python notebook by saving a file with the .ipynb extension, and then run the following command in the notebook to install the dependencies:

pip install --quiet nomic sentence-transformers pymongo
2

Paste and run the following code in your notebook to create a function that generates vector embeddings by using an open-source embedding model from Nomic AI. This code does the following:

  • Loads the nomic-embed-text-v1 embedding model.

  • Creates a function named get_embedding that uses the model to generate an embedding for a given text input.

  • Generates a single embedding for the string foo.

from nomic import embed
from sentence_transformers import SentenceTransformer

# Load the embedding model (https://huggingface.co/nomic-ai/nomic-embed-text-v1)
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Define a function to generate embeddings
def get_embedding(data):
    """Generates vector embeddings for the given data."""
    embedding = model.encode(data)
    return embedding.tolist()

# Generate an embedding
get_embedding("foo")
[-0.029808253049850464, 0.03841473162174225, -0.02561120130121708, -0.06707508116960526, 0.03867151960730553, ... ]
3

Paste and run the following code in your notebook to connect to your Atlas cluster, generate embeddings from custom data, and ingest them into the sample_db.embeddings collection by using the MongoDB PyMongo Driver. Replace <connection-string> with your Atlas cluster's SRV connection string.

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
import pymongo

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<connection-string>")
db = mongo_client["sample_db"]
collection = db["embeddings"]

# Sample data
data = [
    "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
    "The Lion King: Lion cub and future king Simba searches for his identity",
    "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
]

# Ingest data into Atlas
for text in data:
    embedding = get_embedding(text)
    collection.insert_one({ "text": text, "embedding": embedding })

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

1

Create an interactive Python notebook by saving a file with the .ipynb extension, and then run the following command in the notebook to install the dependencies:

pip install --quiet openai pymongo
2

Paste and run the following code in your notebook to create a function that generates vector embeddings by using a proprietary embedding model from OpenAI. Replace <api-key> with your OpenAI API key. This code does the following:

  • Specifies the text-embedding-3-small embedding model.

  • Creates a function named get_embedding that calls the model's API to generate an embedding for a given text input.

  • Generates a single embedding for the string foo.

import os
from openai import OpenAI

# Specify your OpenAI API key and embedding model
os.environ["OPENAI_API_KEY"] = "<api-key>"
model = "text-embedding-3-small"
openai_client = OpenAI()

# Define a function to generate embeddings
def get_embedding(text):
    """Generates vector embeddings for the given text."""
    embedding = openai_client.embeddings.create(input=[text], model=model).data[0].embedding
    return embedding

# Generate an embedding
get_embedding("foo")
[-0.005843308754265308, -0.013111298903822899, -0.014585349708795547, 0.03580040484666824, 0.02671629749238491, ... ]

Tip

See also:

For API details and a list of available models, refer to the OpenAI documentation.

3

Paste and run the following code in your notebook to connect to your Atlas cluster, generate embeddings from custom data, and ingest them into the sample_db.embeddings collection by using the MongoDB PyMongo Driver. Replace <connection-string> with your Atlas cluster's SRV connection string.

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
import pymongo

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<connection-string>")
db = mongo_client["sample_db"]
collection = db["embeddings"]

# Sample data
data = [
    "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
    "The Lion King: Lion cub and future king Simba searches for his identity",
    "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
]

# Ingest data into Atlas
for text in data:
    embedding = get_embedding(text)
    collection.insert_one({ "text": text, "embedding": embedding })

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

The following procedure shows how to create an embedding for Atlas Vector Search queries by using an open-source or proprietary embedding model.

Once you've created embeddings from the sample data, complete the following steps to create an Atlas Vector Search index on your embeddings and run vector search queries against them.

1

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Complete the following steps to create an index on the sample_db.embeddings collection that specifies the embedding field as the vector type, 768 vector dimensions, and the similarity measure as euclidean.

  1. Create a file named create-index.js and paste the following code.

    create-index.js
    import { MongoClient } from 'mongodb';

    // Connect to your Atlas deployment
    const uri = "<connectionString>";
    const client = new MongoClient(uri);

    async function run() {
        try {
            const database = client.db("sample_db");
            const collection = database.collection("embeddings");

            // Define your Atlas Vector Search index
            const index = {
                name: "vector_index",
                type: "vectorSearch",
                definition: {
                    "fields": [
                        {
                            "type": "vector",
                            "numDimensions": 768,
                            "path": "embedding",
                            "similarity": "euclidean"
                        }
                    ]
                }
            }

            // Run the helper method
            const result = await collection.createSearchIndex(index);
            console.log(result);
        } finally {
            await client.close();
        }
    }
    run().catch(console.dir);
  2. Replace <connectionString> with your Atlas cluster's SRV connection string.

  3. Save the file, then run the following command:

    node create-index.js

To learn more, see Create an Atlas Vector Search Index.

2
  1. Create a file named vector-query.js and paste the following code.

    To run a vector search query, generate a query vector to pass into your aggregation pipeline.

    For example, the following code does the following:

    • Creates an embedding for the string ocean tragedy by calling the embedding function that you defined in the previous example.

    • Passes the embedding into the queryVector field in your aggregation pipeline.

    • Runs the vector search query. It returns semantically similar documents in order of relevance, along with their vector search scores.

    vector-query.js
    import { MongoClient } from 'mongodb';
    import { getEmbedding } from './create-embeddings.js';

    // MongoDB connection URI and options
    const uri = "<connectionString>";
    const client = new MongoClient(uri);

    async function run() {
        try {
            // Connect to the MongoDB client
            await client.connect();

            // Specify the database and collection
            const database = client.db("sample_db");
            const collection = database.collection("embeddings");

            // Generate embedding for the search query
            const queryEmbedding = await getEmbedding("ocean tragedy");

            // Define the sample vector search pipeline
            const pipeline = [
                {
                    $vectorSearch: {
                        index: "vector_index",
                        queryVector: queryEmbedding,
                        path: "embedding",
                        exact: true,
                        limit: 5
                    }
                },
                {
                    $project: {
                        _id: 0,
                        text: 1,
                        score: {
                            $meta: "vectorSearchScore"
                        }
                    }
                }
            ];

            // Run the pipeline
            const result = collection.aggregate(pipeline);

            // Print results
            for await (const doc of result) {
                console.dir(JSON.stringify(doc));
            }
        } finally {
            await client.close();
        }
    }
    run().catch(console.dir);
  2. Replace <connectionString> with your Atlas cluster's SRV connection string.

  3. Save the file, then run the following command:

    node vector-query.js
    '{"text":"Titanic: The story of the 1912 sinking of the largest luxury liner ever built","score":0.5103757977485657}'
    '{"text":"Avatar: A marine is dispatched to the moon Pandora on a unique mission","score":0.4616812467575073}'
    '{"text":"The Lion King: Lion cub and future king Simba searches for his identity","score":0.4115804433822632}'
1

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Complete the following steps to create an index on the sample_db.embeddings collection that specifies the embedding field as the vector type, 1536 vector dimensions, and the similarity measure as euclidean.

  1. Create a file named create-index.js and paste the following code.

    create-index.js
    import { MongoClient } from 'mongodb';

    // Connect to your Atlas deployment
    const uri = "<connectionString>";
    const client = new MongoClient(uri);

    async function run() {
        try {
            const database = client.db("sample_db");
            const collection = database.collection("embeddings");

            // Define your Atlas Vector Search index
            const index = {
                name: "vector_index",
                type: "vectorSearch",
                definition: {
                    "fields": [
                        {
                            "type": "vector",
                            "numDimensions": 1536,
                            "path": "embedding",
                            "similarity": "euclidean"
                        }
                    ]
                }
            }

            // Run the helper method
            const result = await collection.createSearchIndex(index);
            console.log(result);
        } finally {
            await client.close();
        }
    }
    run().catch(console.dir);
  2. Replace <connectionString> with your Atlas cluster's SRV connection string.

  3. Save the file, then run the following command:

    node create-index.js

To learn more, see Create an Atlas Vector Search Index.

2
  1. Create a file named vector-query.js and paste the following code.

    To run a vector search query, generate a query vector to pass into your aggregation pipeline.

    For example, the following code does the following:

    • Creates an embedding for the string ocean tragedy by calling the embedding function that you defined in the previous example.

    • Passes the embedding into the queryVector field in your aggregation pipeline.

    • Runs the vector search query. It returns semantically similar documents in order of relevance, along with their vector search scores.

    vector-query.js
    import { MongoClient } from 'mongodb';
    import { getEmbedding } from './create-embeddings.js';

    // MongoDB connection URI and options
    const uri = "<connectionString>";
    const client = new MongoClient(uri);

    async function run() {
        try {
            // Connect to the MongoDB client
            await client.connect();

            // Specify the database and collection
            const database = client.db("sample_db");
            const collection = database.collection("embeddings");

            // Generate embedding for the search query
            const queryEmbedding = await getEmbedding("ocean tragedy");

            // Define the sample vector search pipeline
            const pipeline = [
                {
                    $vectorSearch: {
                        index: "vector_index",
                        queryVector: queryEmbedding,
                        path: "embedding",
                        exact: true,
                        limit: 5
                    }
                },
                {
                    $project: {
                        _id: 0,
                        text: 1,
                        score: {
                            $meta: "vectorSearchScore"
                        }
                    }
                }
            ];

            // Run the pipeline
            const result = collection.aggregate(pipeline);

            // Print results
            for await (const doc of result) {
                console.dir(JSON.stringify(doc));
            }
        } finally {
            await client.close();
        }
    }
    run().catch(console.dir);
  2. Replace <connectionString> with your Atlas cluster's SRV connection string.

  3. Save the file, then run the following command:

    node vector-query.js
    '{"text":"Titanic: The story of the 1912 sinking of the largest luxury liner ever built","score":0.4551968574523926}'
    '{"text":"Avatar: A marine is dispatched to the moon Pandora on a unique mission","score":0.4050074517726898}'
    '{"text":"The Lion King: Lion cub and future king Simba searches for his identity","score":0.3594386577606201}'
1

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Run the following code to create an index on the sample_db.embeddings collection that specifies the embedding field as the vector type, 768 vector dimensions, and the similarity measure as euclidean.

from pymongo.operations import SearchIndexModel

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition = {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 768,
                "similarity": "euclidean"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=search_index_model)

To learn more, see Create an Atlas Vector Search Index.
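Atlas builds search indexes asynchronously, so a query issued immediately after create_search_index may return no results. As a hedged sketch under that assumption, the following helper polls PyMongo's list_search_indexes until the index reports itself queryable; the function name, timeout, and interval are illustrative choices, not part of the driver API.

```python
import time

def wait_for_index(collection, index_name, timeout=120, interval=5):
    """Polls until the named search index reports queryable, or times out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # list_search_indexes returns status documents for matching indexes;
        # Atlas reports readiness in the "queryable" field
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable"):
            return True
        time.sleep(interval)
    return False
```

You could call this as `wait_for_index(collection, "vector_index")` before running the query in the next step.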

2

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, the following code does the following:

  • Creates an embedding for the string ocean tragedy by calling the embedding function that you defined in the previous example.

  • Passes the embedding into the queryVector field in your aggregation pipeline.

  • Runs the vector search query. It returns semantically similar documents in order of relevance, along with their vector search scores.

# Generate embedding for the search query
query_embedding = get_embedding("ocean tragedy")

# Sample vector search pipeline
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
        }
    },
    {
        "$project": {
            "_id": 0,
            "text": 1,
            "score": {
                "$meta": "vectorSearchScore"
            }
        }
    }
]

# Execute the search
results = collection.aggregate(pipeline)

# Print results
for i in results:
    print(i)
{'text': 'Titanic: The story of the 1912 sinking of the largest luxury liner ever built','score': 0.5103757977485657}
{'text': 'Avatar: A marine is dispatched to the moon Pandora on a unique mission','score': 0.4616812467575073}
{'text': 'The Lion King: Lion cub and future king Simba searches for his identity','score': 0.4115804433822632}
1

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Run the following code to create an index on the sample_db.embeddings collection that specifies the embedding field as the vector type, 1536 vector dimensions, and the similarity measure as euclidean.

from pymongo.operations import SearchIndexModel

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition = {
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "euclidean"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=search_index_model)

To learn more, see Create an Atlas Vector Search Index.

2

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, the following code does the following:

  • Creates an embedding for the string ocean tragedy by calling the embedding function that you defined in the previous example.

  • Passes the embedding into the queryVector field in your aggregation pipeline.

  • Runs the vector search query. It returns semantically similar documents in order of relevance, along with their vector search scores.

# Generate embedding for the search query
query_embedding = get_embedding("ocean tragedy")

# Sample vector search pipeline
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
        }
    },
    {
        "$project": {
            "_id": 0,
            "text": 1,
            "score": {
                "$meta": "vectorSearchScore"
            }
        }
    }
]

# Execute the search
results = collection.aggregate(pipeline)

# Print results
for i in results:
    print(i)
{'text': 'Titanic: The story of the 1912 sinking of the largest luxury liner ever built','score': 0.4551968574523926}
{'text': 'Avatar: A marine is dispatched to the moon Pandora on a unique mission','score': 0.4050074517726898}
{'text': 'The Lion King: Lion cub and future king Simba searches for his identity','score': 0.3594386577606201}

Tip

You can also create embeddings by calling the API endpoint directly. To learn more, see OpenAI API Reference.
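As a hedged sketch of that direct call using only the Python standard library: the endpoint URL and JSON payload shape follow the OpenAI embeddings API, while `build_embedding_request` is an illustrative helper name, not part of any SDK.

```python
import json
import urllib.request

def build_embedding_request(text, model="text-embedding-3-small", api_key="<api-key>"):
    """Builds an HTTP request for the OpenAI embeddings endpoint."""
    body = json.dumps({"input": text, "model": model}).encode("utf-8")
    return urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# To send the request (requires a valid API key):
# req = build_embedding_request("foo", api_key="<your-api-key>")
# with urllib.request.urlopen(req) as resp:
#     embedding = json.load(resp)["data"][0]["embedding"]
```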

To learn more about running vector search queries, see Run Vector Search Queries.
