Get Started with the LlamaIndex Integration

You can integrate Atlas Vector Search with LlamaIndex to implement retrieval-augmented generation (RAG) in your LLM application. This tutorial demonstrates how to start using Atlas Vector Search with LlamaIndex to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:

Set up the environment.
Store custom data on Atlas.
Create an Atlas Vector Search index on your data.
Run the following vector search queries:
- Semantic search.
- Semantic search with metadata pre-filtering.
Implement RAG by using Atlas Vector Search to answer questions on your data.

Work with a runnable version of this tutorial as a Python notebook.

Background

LlamaIndex is an open-source framework designed to simplify how you connect custom data sources to LLMs. It provides several tools such as data connectors, indexes, and query engines to help you load and prepare vector embeddings for RAG applications.

By integrating Atlas Vector Search with LlamaIndex, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.

Prerequisites

To complete this tutorial, you must have the following:

An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks such as Colab.

Set Up the Environment

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook allows you to run Python code snippets individually, and you'll use it to run the code in this tutorial.

To set up your notebook environment:

Install and import dependencies.

Run the following command:

pip install --quiet --upgrade llama-index llama-index-vector-stores-mongodb llama-index-embeddings-openai pymongo

Then, run the following code to import the required packages:

import os, pymongo, pprint
from pymongo.operations import SearchIndexModel
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, StorageContext
from llama_index.core.settings import Settings
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.vector_stores import MetadataFilter, MetadataFilters, ExactMatchFilter, FilterOperator
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

Define environment variables.

Run the following code, replacing the placeholders with the following values:

Your OpenAI API Key.
Your Atlas cluster's SRV connection string.

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"
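If you prefer not to paste secrets directly into a notebook cell, one option is to read them from the environment and prompt only when they are missing. The following is a minimal sketch of that idea; the helper name get_secret and the ATLAS_CONNECTION_STRING environment variable name are hypothetical and not part of this tutorial's original code.

```python
import os
from getpass import getpass


def get_secret(name, prompt=None):
    """Return environment variable `name`, prompting for it if it is unset.

    Hypothetical helper for illustration; not part of LlamaIndex or PyMongo.
    """
    value = os.environ.get(name)
    if not value:
        # getpass hides the typed value so it does not appear in notebook output
        value = getpass(prompt or f"Enter {name}: ")
        os.environ[name] = value
    return value


# Example usage (assumed variable names):
# get_secret("OPENAI_API_KEY")
# ATLAS_CONNECTION_STRING = get_secret("ATLAS_CONNECTION_STRING")
```

With this approach, the rest of the tutorial code can stay unchanged because it only relies on os.environ["OPENAI_API_KEY"] and the ATLAS_CONNECTION_STRING variable being set.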

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
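If your database username or password contains special characters such as @, :, or /, they must be percent-encoded before you place them in the connection string. The following sketch assembles a string in the format shown above; the function name build_srv_uri and the example host are hypothetical.

```python
from urllib.parse import quote_plus


def build_srv_uri(username, password, host):
    """Build an SRV connection string, percent-encoding the credentials.

    `host` stands for the <clusterName>.<hostname>.mongodb.net part of the format.
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"


# Example with a made-up host and credentials:
# build_srv_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net")
```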

Configure LlamaIndex settings.

Run the following code to configure settings that are specific to LlamaIndex. These settings specify the following:

OpenAI as the LLM used by your application to answer questions on your data.
text-embedding-ada-002 as the embedding model used by your application to generate vector embeddings from your data.
Chunk size and overlap to customize how LlamaIndex partitions your data for storage.

Settings.llm = OpenAI()
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Settings.chunk_size = 100
Settings.chunk_overlap = 10
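To build intuition for the chunk size and overlap settings, the following simplified sketch splits a list of tokens into overlapping windows. It is an illustration only: it assumes a plain sliding window over pre-split tokens, whereas LlamaIndex's own splitter also accounts for its tokenizer and for sentence boundaries, so real chunk boundaries will differ.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` tokens.

    Simplified illustration; not LlamaIndex's actual splitting algorithm.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input
        if start + chunk_size >= len(tokens):
            break
    return chunks


# With the tutorial's values (100 and 10), each chunk starts 90 tokens after the previous one.
example_chunks = chunk_tokens(list(range(250)), chunk_size=100, chunk_overlap=10)
```

Smaller chunks produce more, finer-grained embeddings, while the overlap keeps text that straddles a boundary available in both neighboring chunks.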

Use Atlas as a Vector Store

Next, load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.

Load the sample data.

For this tutorial, you use a publicly accessible PDF document titled MongoDB Atlas Best Practices as the data source for your vector store. This document describes various recommendations and core concepts for managing your Atlas deployments.

To load the sample data, run the following code snippet. It does the following:

Creates a new directory called data.
Retrieves the PDF from the specified URL and saves it as a file in the directory.
Uses the SimpleDirectoryReader data connector to extract raw text and metadata from the file. It also formats the data into documents.

# Load the sample data
!mkdir -p 'data/'
!wget 'https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP' -O 'data/atlas_best_practices.pdf'
sample_data = SimpleDirectoryReader(input_files=["./data/atlas_best_practices.pdf"]).load_data()
# Print the first document
sample_data[0]

Document(id_='e9893be3-e1a3-4249-9355-e4f42539f508', embedding=None, metadata={'page_label': '1', 'file_name': 'atlas_best_practices.pdf',
'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-20',
'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-20'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size',
'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date',
'last_modified_date', 'last_accessed_date'], relationships={}, text='Mong oDB Atlas Best P racticesJanuary 20 19A MongoD B White P aper\n',
start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')
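The !mkdir and !wget lines above rely on notebook shell escapes. If you run the code outside a notebook, or wget is not available, a standard-library equivalent might look like the following sketch. The helper name download_file is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, destination):
    """Download `url` to `destination`, creating parent directories as needed."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Stream the response body to disk instead of loading it all into memory
        shutil.copyfileobj(response, out)
    return destination


# Example usage with the tutorial's URL and path:
# download_file(
#     "https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP",
#     "data/atlas_best_practices.pdf",
# )
```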

Instantiate the vector store.

Run the following code to create a vector store named atlas_vector_store by instantiating the MongoDBAtlasVectorSearch class with the following:

A connection to your Atlas cluster.
llamaindex_db.test as the Atlas database and collection used to store the documents.
vector_index as the index to use for querying the vector store.

Then, you save the vector store to a storage context, which is a LlamaIndex container object used to prepare your data for storage.

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient(ATLAS_CONNECTION_STRING)
# Instantiate the vector store
atlas_vector_store = MongoDBAtlasVectorSearch(
    mongo_client,
    db_name = "llamaindex_db",
    collection_name = "test",
    vector_index_name = "vector_index"
)
vector_store_context = StorageContext.from_defaults(vector_store=atlas_vector_store)

Store your data as vector embeddings.

Once you've loaded your data and instantiated Atlas as a vector store, generate vector embeddings from your data and store them in Atlas. To do this, you must build a vector store index. This type of index is a LlamaIndex data structure that splits, embeds, and then stores your data in the vector store.

The following code uses the VectorStoreIndex.from_documents method to build the vector store index on your sample data. It turns your sample data into vector embeddings and stores these embeddings as documents in the llamaindex_db.test collection in your Atlas cluster, as specified by the vector store's storage context.

Note

This method uses the embedding model and chunk settings that you configured when you set up your environment.

vector_store_index = VectorStoreIndex.from_documents(
   sample_data, storage_context=vector_store_context, show_progress=True
)

Tip

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the llamaindex_db.test collection in your cluster.
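You can also check the stored data from your notebook. The following sketch counts the documents in a collection and reports the length of one stored embedding. The helper name summarize_vector_store is hypothetical, and it assumes the embeddings live in a field named embedding, which is the field that this tutorial indexes.

```python
def summarize_vector_store(collection, embedding_field="embedding"):
    """Return the document count and embedding length for a collection.

    `collection` is expected to behave like a PyMongo Collection
    (only count_documents and find_one are used).
    """
    total = collection.count_documents({})
    sample = collection.find_one({embedding_field: {"$exists": True}})
    dimensions = len(sample[embedding_field]) if sample else 0
    return {"documents": total, "dimensions": dimensions}


# Example usage with the objects from this tutorial:
# summarize_vector_store(mongo_client["llamaindex_db"]["test"])
```

With text-embedding-ada-002, the reported dimensions should match the 1536 dimensions that the vector search index expects.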

Create the Atlas Vector Search Index

Note

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

To enable vector search queries on your vector store, create an Atlas Vector Search index on the llamaindex_db.test collection.

In your notebook, run the following code to create an index of the vectorSearch type that indexes the following fields:

embedding field as the vector type. The embedding field contains the embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using cosine.
metadata.page_label field as the filter type for pre-filtering data by the page number in the PDF.

# Specify the collection for which to create the index
collection = mongo_client["llamaindex_db"]["test"]
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
  definition={
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "numDimensions": 1536,
        "similarity": "cosine"
      },
      {
        "type": "filter",
        "path": "metadata.page_label"
      }
    ]
  },
  name="vector_index",
  type="vectorSearch"
)
collection.create_search_index(model=search_index_model)

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.
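Rather than waiting a fixed amount of time, you can poll until the index reports that it is ready. The following sketch assumes that each document returned by PyMongo's list_search_indexes method includes a boolean queryable field; the helper name wait_until_queryable and its timeout values are hypothetical.

```python
import time


def wait_until_queryable(collection, index_name, poll_seconds=5, timeout_seconds=300):
    """Poll the search index until it is queryable or the timeout elapses.

    Returns True if the index became queryable, False on timeout.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        indexes = list(collection.list_search_indexes(index_name))
        # Assumption: the index description carries a boolean "queryable" field
        if indexes and indexes[0].get("queryable") is True:
            return True
        time.sleep(poll_seconds)
    return False


# Example usage with the tutorial's collection and index name:
# wait_until_queryable(collection, "vector_index")
```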

Run Vector Search Queries

Once Atlas builds your index, return to your notebook and run vector search queries on your data. The following examples demonstrate different queries that you can run on your vectorized data.

This example performs a basic semantic search for the string MongoDB Atlas security and returns a list of documents ranked by relevance score. It also specifies the following:

Atlas Vector Search as a retriever to perform semantic search.
The similarity_top_k parameter to return only the three most relevant documents.

retriever = vector_store_index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("MongoDB Atlas security")
for node in nodes:
    print(node)

Node ID: 8a743e7c-4d28-4f7c-9c64-1033523a767d
Text: MongoD B Atlas provides: •Security f eatures to protect access
to your data •Built in replication for always-on availability ,
tolerating complete data center failure •Backups and point in time
recovery to protect against data corruption •Fine-grained monitoring
to let you know when to scale.
Score:  0.935
Node ID: 5904c51b-ac96-4a2f-818e-35c85af4b624
Text: MongoD B Atlas f eatures e xtensive capabilities to def end,
detect, and control access to MongoD B, off ering among the most
complete security controls of any modern database: •User Rights
Management.User Rights Management. Control access to sensitive data
using industry standard mec hanisms for authentication and
authorization at the database ...
Score:  0.932
Node ID: cb71a615-2f69-47b3-87e7-3373ff476fd6
Text: Protect data in motion over the network and at rest in
persistent storage To ensure a secure system right out of the b ox,
authentication and I P Address whitelisting are automatically enabled.
Review the security section of the MongoD B Atlas documentation to
learn more ab out eac h of the security features discussed below .
Score:  0.930
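The Score values in the output come from the cosine similarity function that you specified in the index definition. The following sketch computes cosine similarity for two small vectors. The normalized_score function reflects an assumption, based on the Atlas Vector Search scoring documentation, that cosine scores are rescaled to the range 0 to 1 as (1 + cosine) / 2; verify this against the current documentation before relying on it.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_score(a, b):
    """Rescale cosine similarity from [-1, 1] to [0, 1] (assumed Atlas convention)."""
    return (1 + cosine_similarity(a, b)) / 2


# Identical directions score highest; orthogonal vectors land in the middle of the range.
same_direction = normalized_score([1.0, 0.0], [2.0, 0.0])
orthogonal = normalized_score([1.0, 0.0], [0.0, 1.0])
```

Real embeddings from text-embedding-ada-002 have 1536 dimensions, but the calculation is the same.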

You can also pre-filter your data by using a match expression that compares the indexed field with boolean, number, or string values. You must index any metadata fields that you want to filter by as the filter type. To learn more, see How to Index Fields for Vector Search.

Note

You specified the metadata.page_label field as a filter when you created the index for this tutorial.

This example performs a semantic search for the string MongoDB Atlas security and returns a list of documents ranked by relevance score. It also specifies the following:

Atlas Vector Search as a retriever to perform semantic search.
The similarity_top_k parameter to return only the three most relevant documents.
A filter on the metadata.page_label field so that Atlas Vector Search searches for documents appearing on page 17 only.

# Specify metadata filters
metadata_filters = MetadataFilters(
   filters=[ExactMatchFilter(key="metadata.page_label", value="17")]
)
retriever = vector_store_index.as_retriever(similarity_top_k=3, filters=metadata_filters)
nodes = retriever.retrieve("MongoDB Atlas security")
for node in nodes:
    print(node)

Node ID: bd82d311-e70b-4d00-aab9-56b84ad16e3d
Text: Integrating MongoD B with External Monitoring S olutions The
MongoD B Atlas AP I provides integration with e xternal management
frameworks through programmatic access to automation f eatures and
alerts. APM Integration Many operations teams use Application P
erformance Monitoring (AP M) platforms to gain global oversight of 15
Score:  0.911
Node ID: c24f0bdd-d84e-4214-aceb-aa2cbd362819
Text: If the MongoD B cluster e xperiences a failure, the most
recentbackup is only moments behind, minimizing e xposure to data
loss. In additional, MongoD B Atlas includes queryable bac kups, which
allows you to perform queries against e xisting snapshots to more
easily restore data at the document/ object level. Queryable bac kups
allow you to acco...
Score:  0.911
Node ID: 642f08a3-f9b7-427b-81ce-00c1574eea01
Text: In the vast majority of cases, MongoD B Atlas bac kups delivers
the simplest, saf est, and most efficient bac kup solution. mongodump
is useful when data needs to be exported to another system, when a
local bac kup is needed, or when just a subset of the data needs to be
backed up.
Score:  0.909
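Conceptually, pre-filtering narrows the set of candidate documents before similarity ranking, which is why the results above all come from page 17 and score lower than the unfiltered results. The following in-memory sketch illustrates that order of operations. It is not how Atlas implements vector search, and the function names and sample documents are hypothetical.

```python
import math


def cosine(a, b):
    """Return the cosine similarity of vectors `a` and `b`."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_top_k(docs, query_vector, page_label, k):
    """Keep only documents on `page_label`, then return the `k` most similar."""
    # Step 1: pre-filter on metadata (page labels are stored as strings)
    candidates = [d for d in docs if d["metadata"]["page_label"] == page_label]
    # Step 2: rank the remaining candidates by similarity to the query
    ranked = sorted(candidates, key=lambda d: cosine(d["embedding"], query_vector), reverse=True)
    return ranked[:k]


# Made-up documents with two-dimensional embeddings for illustration
docs = [
    {"id": "a", "metadata": {"page_label": "17"}, "embedding": [1.0, 0.0]},
    {"id": "b", "metadata": {"page_label": "3"}, "embedding": [1.0, 0.0]},
    {"id": "c", "metadata": {"page_label": "17"}, "embedding": [0.0, 1.0]},
]
top = filtered_top_k(docs, query_vector=[1.0, 0.0], page_label="17", k=3)
```

Document b is excluded even though it matches the query as closely as document a, because it does not pass the page filter.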

Answer Questions on Your Data

This section demonstrates how to implement RAG in your application with Atlas Vector Search and LlamaIndex. Now that you've learned how to run vector search queries to retrieve semantically similar documents, run the following code, which uses Atlas Vector Search to retrieve documents and a LlamaIndex query engine to answer questions based on those documents.

This example does the following:

Instantiates Atlas Vector Search as a vector index retriever, a specific type of retriever for vector stores. It includes the similarity_top_k parameter so that Atlas Vector Search retrieves only the 5 most relevant documents.

Instantiates the RetrieverQueryEngine query engine to answer questions on your data. When prompted, the query engine performs the following actions:
- Uses Atlas Vector Search as a retriever to query for semantically similar documents based on the prompt.
- Calls the LLM that you specified when you set up your environment to generate a context-aware response based on the retrieved documents.
Prompts the LLM with a sample query about Atlas security recommendations.
Returns the LLM's response and the documents used as context. The generated response might vary.

# Instantiate Atlas Vector Search as a retriever
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, similarity_top_k=5)
# Pass the retriever into the query engine
query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)
# Prompt the LLM
response = query_engine.query('How can I secure my MongoDB Atlas cluster?')
print(response)
print("\nSource documents: ")
pprint.pprint(response.source_nodes)

You can secure your MongoDB Atlas cluster by utilizing security features
such as authentication, IP address whitelisting, encryption for data in
motion and at rest, user rights management, and encryption. Additionally,
you can set up global clusters on various cloud platforms with just a
few clicks in the MongoDB Atlas UI to ensure data is written to and
read from different regions.
Source documents:
[NodeWithScore(node=TextNode(id_='56884a56-6bcb-4890-9bdc-7d8eb9980b42', embedding=None, metadata={'page_label': '3', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='79ee3a70-7d3d-4dda-b2b4-8da9299ac639', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '3', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='4acc6a58693d749a7f3ddd92063755de00ab9bc8c11be03fd05814bc9c3d2e47'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='9c4f4242-e8c0-493d-b32d-21b900138210', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '3', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='6d12532c110420f9131f63bc1f676796103ea2b8078dfdab3809eaff9c4bde21'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='6554d774-108c-4602-8ce8-5aca08802b5a', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='ce37b9f7382f86f97316d5dd346f645175e4a392afabb11d6a13c2dce81395e5')}, text='MongoD B\nAtlas provides:\n•Security f eatures to protect access to your data\n•Built in replication for always-on availability , tolerating\ncomplete data center failure\n•Backups 
and point in time recovery to protect against\ndata corruption\n•Fine-grained monitoring to let you know when to scale.', start_char_idx=386, end_char_idx=679, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9282928109169006),
 NodeWithScore(node=TextNode(id_='5ac63468-529e-4f74-a263-2dc15183f793', embedding=None, metadata={'page_label': '13', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='ae95f83a-15f8-46bd-9603-ed14792b2f18', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '13', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='07a7475af2413b7ad4a3010191462eca9d1691e29d8194389de7a7333ed2d67b'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='6d77733c-8532-43a9-a38d-c1da51a5a51b', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '13', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='19ac3815d50ad3ba71f5119f9ebacc1c84742b7a215e014be2dbf46cf6f38cb6'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='99d8cf63-fecf-452b-aa2a-a5f6eec2933d', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='86b4419256e9d788383ea6a8cd30d4f37461f9f23e41c1e33ca9cd268dc12884')}, text='You can set up global clusters — available on Amazon W eb\nServices, Microsoft Azure, and Google Cloud Platform —\nwith just a f ew clic ks in the MongoD B Atlas U I. 
MongoD B\nAtlas takes care of the deployment and management of\ninfrastructure and database resources required to ensure\nthat data is written to and read from diff erent regions.', start_char_idx=498, end_char_idx=839, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9278459548950195),
 NodeWithScore(node=TextNode(id_='71589aef-f5e3-43de-b711-9b9e6e1c9f42', embedding=None, metadata={'page_label': '18', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='fdfddc80-aa07-4411-8b5d-f8e02c53551e', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '18', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8289ead3efad9fc0ffb10c1051f14a8a6357692c1ab8cc34841116591a3f4f01'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='ce3ad309-f8b0-4211-b4eb-db82afb18b8e', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '18', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8ddc31be6d74789b9a6fd9451bccb1d258bfc27cb60d443527eaad9de0d742ec'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='053bee76-40c8-42c7-b19c-3ec97a2eefab', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='5393211ed6e59c3ee8e1b2fc9e2529f403ee7241ee477da7c20242440a203976')}, text='Protect data in motion over the network\nand at rest in persistent storage\nTo ensure a secure system right out of the b ox,\nauthentication and I P Address whitelisting 
are\nautomatically enabled.\nReview the security section of the MongoD B Atlas\ndocumentation to learn more ab out eac h of the security\nfeatures discussed below .', start_char_idx=1852, end_char_idx=2179, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9274715781211853),
 NodeWithScore(node=TextNode(id_='c2f91ce0-f310-43a4-b473-e8feb8b2dcca', embedding=None, metadata={'page_label': '11', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='8be9cdd6-0d45-4e03-994c-d103aac018a4', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '11', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='7dcc304caa6d650f0d8a1709dfbdeb8bd5e96bd62ea37e09d44c61eff1ec3a82'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='b2952038-2966-4eb8-a590-38a47bf2d2ff', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '11', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='73dd5fb0c39eff5917f7ef8ebf2baed63463d720c147133bd1a030c71c0cfd22'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='3d175c9d-f332-44fd-ace6-17c676683e8e', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='2b982087b4e8a9600ae02c1dc31be7e4ab9b10d27d923654bd3de8e3fd134fae')}, text='Eac h node must be configured\nwith sufficient storage for the full data set, or for the subset\nto be stored in a single shard. 
T he storage speed and size\ncan be set when pic king the MongoD B Atlas instance\nduring cluster creation or reconfiguration.\nData volumes for customers deploying on A WS, Azure, and\nGCP are always encrypted.', start_char_idx=299, end_char_idx=633, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9221477508544922),
 NodeWithScore(node=TextNode(id_='ce3ad309-f8b0-4211-b4eb-db82afb18b8e', embedding=None, metadata={'page_label': '18', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='fdfddc80-aa07-4411-8b5d-f8e02c53551e', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '18', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8289ead3efad9fc0ffb10c1051f14a8a6357692c1ab8cc34841116591a3f4f01'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='d84004f0-4170-48c4-b9f7-69b76db64652', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '18', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='13f060ab7a04314bd0b814dd83f9334e1014c43be94f4913bd7387d0f0521a66'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='71589aef-f5e3-43de-b711-9b9e6e1c9f42', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='23826d53a8be4492a2e267e08e3481b309ef43c249148758610e5cc17354467f')}, text='MongoD B Atlas f eatures e xtensive capabilities to def end,\ndetect, and control access to MongoD B, off ering among\nthe most complete security controls of any 
modern\ndatabase:\n•User Rights Management.User Rights Management. Control access to sensitive\ndata using industry standard mec hanisms for\nauthentication and authorization at the database level•Encryption.Encryption.', start_char_idx=1476, end_char_idx=1851, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9206620454788208)]
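The retrieve-then-generate flow that the query engine performs can be sketched in a few lines. The following is a simplified illustration with stand-in functions, not the RetrieverQueryEngine implementation: the function name answer_with_context, the retrieve and generate callables, and the prompt wording are all hypothetical.

```python
def answer_with_context(question, retrieve, generate, top_k=5):
    """Retrieve passages for `question`, then ask an LLM to answer using them.

    `retrieve` maps a question to a ranked list of text passages.
    `generate` maps a prompt string to the LLM's response text.
    """
    # Keep only the most relevant passages, mirroring similarity_top_k
    passages = retrieve(question)[:top_k]
    context = "\n\n".join(passages)
    # Hypothetical prompt wording; LlamaIndex uses its own templates
    prompt = (
        "Context information is below.\n"
        f"{context}\n"
        "Using only this context, answer the query.\n"
        f"Query: {question}\n"
        "Answer: "
    )
    return generate(prompt), passages


# Example usage with stand-ins (replace with a real retriever and LLM call):
# answer, sources = answer_with_context(
#     "How can I secure my MongoDB Atlas cluster?", my_retrieve, my_generate
# )
```

Because the LLM sees only the retrieved passages, the quality of the answer depends on which documents the retriever returns.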

This example does the following:

Defines a metadata filter on the metadata.page_label field so that Atlas Vector Search searches for documents appearing on page 17 only.
Instantiates Atlas Vector Search as a vector index retriever, a specific type of retriever for vector stores. It includes the metadata filters that you defined and the similarity_top_k parameter so that Atlas Vector Search retrieves only the 5 most relevant documents from page 17.

Instantiates the RetrieverQueryEngine query engine to answer questions on your data. When prompted, the query engine performs the following actions:
- Uses Atlas Vector Search as a retriever to query for semantically similar documents based on the prompt.
- Calls the LLM that you specified when you set up your environment to generate a context-aware response based on the retrieved documents.
Prompts the LLM with a sample query about Atlas security recommendations.
Returns the LLM's response and the documents used as context. The generated response might vary.

# Specify metadata filters
metadata_filters = MetadataFilters(
   filters=[ExactMatchFilter(key="metadata.page_label", value="17")]
)
# Instantiate Atlas Vector Search as a retriever
vector_store_retriever = VectorIndexRetriever(index=vector_store_index, filters=metadata_filters, similarity_top_k=5)
# Pass the retriever into the query engine
query_engine = RetrieverQueryEngine(retriever=vector_store_retriever)
# Prompt the LLM
response = query_engine.query('How can I secure my MongoDB Atlas cluster?')
print(response)
print("\nSource documents: ")
pprint.pprint(response.source_nodes)

Regular backups are essential for securing your MongoDB Atlas cluster.
By ensuring that backups are maintained continuously and are just a few
seconds behind the operational system, you can minimize exposure to data
loss in case of a failure. Additionally, utilizing queryable backups allows
you to easily restore data at the document/object level. Integrating external
monitoring solutions through the MongoDB Atlas API can also enhance security
by providing access to automation features and alerts.
Source documents:
[NodeWithScore(node=TextNode(id_='72afbd12-441c-4390-843d-cc11609a7855', embedding=None, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='45d87295-3d74-41bb-812f-789b72b4f8ba', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8e56ef0d706096509e6793e2406c4f5fd0bd020c077a0e7713dd5f3b595f7915'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='552250ae-a55b-4d6d-b326-6d736e5423c8', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='19f3143232ce10c30ee4d9f44012bf3b672ecba3240742d00c921149d9c73016'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='936e940e-2063-4649-8a9a-20090a87aa0a', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='4751bacb2f79e8e61b00828e28cee72a221c5b33bbbec942d431220b2446e507')}, text='If the\nMongoD B cluster e xperiences a failure, the most recentbackup is only moments behind, minimizing e xposure to\ndata loss.\nIn additional, MongoD B Atlas includes 
queryable bac kups,\nwhich allows you to perform queries against e xisting\nsnapshots to more easily restore data at the document/\nobject level. Queryable bac kups allow you to accomplish\nthe following with less', start_char_idx=1987, end_char_idx=2364, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.913266658782959),
 NodeWithScore(node=TextNode(id_='552250ae-a55b-4d6d-b326-6d736e5423c8', embedding=None, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='45d87295-3d74-41bb-812f-789b72b4f8ba', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8e56ef0d706096509e6793e2406c4f5fd0bd020c077a0e7713dd5f3b595f7915'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='a72f111d-1bb9-4173-a713-8bfce8cd2ad5', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='5da4ac9abb19e20a0b14481751a7d4a80f46f8968f804f1d3f4f04fb351886a3'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='72afbd12-441c-4390-843d-cc11609a7855', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='5c99659b2505c1de0600c65fc65cc19c97321a3b9607107d0cac342c5ec9887a')}, text='T aking regular bac kups off ers\nother advantages, as well. 
T he bac kups can be used to\nseed new environments for development, staging, or QA\nwithout impacting production systems.\nMongoD B Atlas bac kups are maintained continuously , just\na few seconds behind the operational system.', start_char_idx=1702, end_char_idx=1986, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9097342491149902),
 NodeWithScore(node=TextNode(id_='70fc2c34-1338-4f29-8fc6-7b8551ea2c39', embedding=None, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='45d87295-3d74-41bb-812f-789b72b4f8ba', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8e56ef0d706096509e6793e2406c4f5fd0bd020c077a0e7713dd5f3b595f7915'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='834d6586-9bee-4dd8-bf94-2306f1c21f8a', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='86fda9a7b7edce18f333bcbe91c28a9bdb0469957545b6e8cc7fc8e22228c820'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='be001832-41ee-46d2-bd29-4c8650129598', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='e0b09755cc3fad7edc84d2ad9e4b44c098e137c3efea14dd680e55b72c80ffe4')}, text='In the vast majority of cases, MongoD B Atlas bac kups\ndelivers the simplest, saf est, and most efficient bac kup\nsolution. 
mongodump is useful when data needs to be\nexported to another system, when a local bac kup is\nneeded, or when just a subset of the data needs to be\nbacked up.', start_char_idx=3104, end_char_idx=3386, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9047020673751831),
 NodeWithScore(node=TextNode(id_='be001832-41ee-46d2-bd29-4c8650129598', embedding=None, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='45d87295-3d74-41bb-812f-789b72b4f8ba', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8e56ef0d706096509e6793e2406c4f5fd0bd020c077a0e7713dd5f3b595f7915'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='70fc2c34-1338-4f29-8fc6-7b8551ea2c39', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='53fad6c5333cc41a5246f204a317696c4cb97420363910170f3ae25ef253c1da'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='e3ed474b-1ada-4e15-9f48-db37535bbdd6', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='4eb5d83c88741d63c76679251b2402ff084d33ffd9619f3dd74e5fc0dffc87e2')}, text='Integrating MongoD B with External\nMonitoring S olutions\nThe MongoD B Atlas AP I provides integration with e xternal\nmanagement frameworks through programmatic access 
to\nautomation f eatures and alerts.\nAPM Integration\nMany operations teams use Application P erformance\nMonitoring (AP M) platforms to gain global oversight of\n15', start_char_idx=3387, end_char_idx=3715, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9037604331970215),
 NodeWithScore(node=TextNode(id_='fd4d3ed9-a0d2-4663-9e0b-aee2faea2b4f', embedding=None, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='45d87295-3d74-41bb-812f-789b72b4f8ba', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '17', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='8e56ef0d706096509e6793e2406c4f5fd0bd020c077a0e7713dd5f3b595f7915'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='a53c9dbc-25ec-49cf-bd3c-04c2758dd681', node_type=<ObjectType.TEXT: '1'>, metadata={'page_label': '16', 'file_name': 'atlas_best_practices.pdf', 'file_path': 'data/atlas_best_practices.pdf', 'file_type': 'application/pdf', 'file_size': 512653, 'creation_date': '2024-02-21', 'last_modified_date': '2020-10-27', 'last_accessed_date': '2024-02-21'}, hash='ce8e610852c742743e0674dd6fc05126cc18138fa224e28fc0cc72c0319d087a'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='a07617d5-8090-47b4-92f8-f3bbe38cff54', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='9ad371a88420c2c0ace630858035b13b82f589042b0de31afc364bbe89d0d9ce')}, text='example, a poorly selected shard key can result in uneven\ndata distribution. 
In this case, most if not all of the queries\nwill be directed to the single mongodthat is managing the\ndata. F urthermore, MongoD B may be attempting to\nredistribute the documents to ac hieve a more ideal balance\nacross the servers.', start_char_idx=0, end_char_idx=309, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.9037080407142639)]
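
The raw list of source nodes above is hard to scan. The following is a minimal sketch of one way to condense it, assuming each returned item exposes `score`, `node.text`, and `node.metadata` as the printed output suggests. The `summarize_nodes` helper and the stand-in sample objects are hypothetical and are not part of LlamaIndex.

```python
from types import SimpleNamespace


def summarize_nodes(nodes, max_chars=80):
    """Return one compact line per retrieved node: score, page label, text preview."""
    lines = []
    for item in nodes:
        # Collapse newlines and repeated whitespace, then truncate the preview.
        preview = " ".join(item.node.text.split())[:max_chars]
        # Fall back to "?" when the node has no page label in its metadata.
        page = item.node.metadata.get("page_label", "?")
        lines.append(f"{item.score:.4f}  p.{page}  {preview}")
    return lines


# Hypothetical stand-in that mimics only the attributes used above.
# In the tutorial, you would pass the list of source nodes instead.
sample = [
    SimpleNamespace(
        score=0.9097342491149902,
        node=SimpleNamespace(
            text="T aking regular bac kups off ers\nother advantages, as well.",
            metadata={"page_label": "17"},
        ),
    ),
]

for line in summarize_nodes(sample):
    print(line)
```

Each line shows the relevance score, the page label, and the first characters of the node text, which makes it easier to compare the documents that informed the answer.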

Next Steps

To explore LlamaIndex's full library of tools for RAG applications, which includes data connectors, indexes, and query engines, see LlamaHub.

To extend the application in this tutorial to support back-and-forth conversations, see Chat Engine.

MongoDB also provides the following developer resources: