Retrieval-Augmented Generation (RAG) with Atlas Vector Search
Retrieval-augmented generation (RAG) is an architecture used to augment large language models (LLMs) with additional data so that they can generate more accurate responses. You can implement RAG in your generative AI applications by combining an LLM with a retrieval system powered by Atlas Vector Search.
Why use RAG?
When working with LLMs, you might encounter the following limitations:
Stale data: LLMs are trained on a static dataset up to a certain point in time. This means that they have a limited knowledge base and might use outdated data.
No access to local data: LLMs don't have access to local or personalized data. Therefore, they can lack knowledge about specific domains.
Hallucinations: When training data is incomplete or outdated, LLMs can generate inaccurate information.
You can address these limitations by taking the following steps to implement RAG:
Ingestion: Store your custom data as vector embeddings in a vector database, such as MongoDB Atlas. This allows you to create a knowledge base of up-to-date and personalized data.
Retrieval: Retrieve semantically similar documents from the database based on the user's question by using a search solution, such as Atlas Vector Search. These documents augment the LLM with additional, relevant data.
Generation: Prompt the LLM. The LLM uses the retrieved documents as context to generate a more accurate and relevant response, reducing hallucinations.
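The three steps can be illustrated end to end with a toy, in-memory sketch. This is not Atlas Vector Search. The hypothetical `embed` function uses word counts instead of a real embedding model, a Python list stands in for the vector database, and the "generation" step only builds the prompt that you would send to an LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Ingestion: store each text together with its vector.
knowledge_base = [
    "Atlas Vector Search runs semantic search over embeddings stored in MongoDB.",
    "The quarterly report lists revenue and customer growth.",
    "Chunking splits long documents into smaller passages.",
]
store = [{"text": t, "embedding": embed(t)} for t in knowledge_base]


# 2. Retrieval: rank stored documents by similarity to the question.
def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(store, key=lambda d: cosine(q, d["embedding"]), reverse=True)
    return [d["text"] for d in ranked[:k]]


# 3. Generation: place the retrieved context in the prompt for the LLM.
def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer the question based on the context.\nContext:\n{context}\nQuestion: {question}"


print(retrieve("How does vector search work?", k=1))
```

A real implementation replaces `embed` with an embedding model, `store` with an Atlas collection, and `retrieve` with an Atlas Vector Search query, as described in the following sections.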
Because RAG enables tasks such as question answering and text generation, it's an effective architecture for building AI chatbots that provide personalized, domain-specific responses. To create production-ready chatbots, you must configure a server to route requests and build a user interface on top of your RAG implementation.
RAG with Atlas Vector Search
To implement RAG with Atlas Vector Search, you ingest data into Atlas, retrieve documents with Atlas Vector Search, and generate responses using an LLM. This section describes the components of a basic, or naive, RAG implementation with Atlas Vector Search. For step-by-step instructions, see Get Started.
Ingestion
Data ingestion for RAG involves processing your custom data and storing it in a vector database to prepare it for retrieval. To create a basic ingestion pipeline with Atlas as the vector database, do the following:
Load your data. You can use tools like document loaders or data connectors to load data from different data formats and locations.
Process, or chunk, your data. Chunking splits your data into smaller parts so that each embedding represents a focused passage, which improves retrieval performance.
Convert your data into vector embeddings by using an embedding model. To learn more, see How to Create Vector Embeddings.
Store these embeddings in Atlas. You store embeddings as a field alongside other data in your collection.
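To make chunk size and chunk overlap concrete, here is a minimal fixed-size character splitter, followed by the shape of the documents you might store. It is a simplification: LangChain's `RecursiveCharacterTextSplitter`, used in Get Started, prefers to split on separators such as paragraphs and newlines, so its chunks differ. The `embed` argument is a placeholder for your embedding model, `fake_embed` is hypothetical, and the `text` and `source` field names are assumptions; `embedding` matches the index `path` used later on this page.

```python
from typing import Callable


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def to_documents(chunks: list[str], embed: Callable[[str], list[float]], source: str) -> list[dict]:
    """Build one MongoDB document per chunk, storing the embedding alongside the text and metadata."""
    return [{"text": c, "embedding": embed(c), "source": source} for c in chunks]


def fake_embed(s: str) -> list[float]:
    """Hypothetical 2-dimensional 'embedding' used only for illustration."""
    return [float(len(s)), float(s.count(" "))]


docs = to_documents(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1), fake_embed, "example.txt")
print(docs)
```

With PyMongo, you could then store these documents by calling `collection.insert_many(docs)`.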
Retrieval
Building a retrieval system involves searching for and returning the most relevant documents from your vector database to augment the LLM with. To retrieve relevant documents with Atlas Vector Search, you convert the user's question into a vector embedding and run a vector search query against your data in Atlas to find the documents with the most similar embeddings.
To perform basic retrieval with Atlas Vector Search, do the following:
Define an Atlas Vector Search index on the collection that contains your vector embeddings.
Choose one of the following methods to retrieve documents based on the user's question:
Use an Atlas Vector Search integration with a popular framework or service. These integrations include built-in libraries and tools that enable you to easily build retrieval systems with Atlas Vector Search.
Build your own retrieval system. You can define your own functions and pipelines to run Atlas Vector Search queries specific to your use case.
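If you build your own retrieval system, its core is an aggregation pipeline with a `$vectorSearch` stage. The sketch below only builds the pipeline. The index name `vector_index`, the `embedding` path, the projected `text` field, and the `limit` and `numCandidates` defaults are assumptions to adjust for your collection. The query vector must come from the same embedding model that you used at ingestion.

```python
def vector_search_pipeline(query_vector: list[float], index: str = "vector_index",
                           path: str = "embedding", limit: int = 4,
                           num_candidates: int = 100) -> list[dict]:
    """Build an aggregation pipeline that returns the `limit` most similar documents."""
    if num_candidates < limit:
        raise ValueError("numCandidates must be greater than or equal to limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates considered by the approximate search
                "limit": limit,                   # documents returned
            }
        },
        # Keep only the text and the similarity score assigned by Atlas Vector Search.
        {"$project": {"_id": 0, "text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = vector_search_pipeline([0.1, 0.2, 0.3], limit=2, num_candidates=20)
print(pipeline)
```

With a PyMongo collection and a LangChain embedding model, you might run it as `collection.aggregate(vector_search_pipeline(model.embed_query(question)))`.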
Generation
To generate responses, combine your retrieval system with an LLM. After you perform a vector search to retrieve relevant documents, you provide the user's question along with the relevant documents as context to the LLM so that it can generate a more accurate response.
Choose one of the following methods to connect to an LLM:
Use an Atlas Vector Search integration with a popular framework or service. These integrations include built-in libraries and tools to help you connect to LLMs with minimal set-up.
Call the LLM's API. Most AI providers offer APIs to their generative models that you can use to generate responses.
Load an open-source LLM. If you don't have API keys or credits, you can use an open-source LLM by loading it locally from your application.
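Whichever method you choose, the generation step comes down to building a prompt that contains the retrieved documents and the question. The following provider-agnostic sketch shows one way to do this. The template wording is an assumption, and `max_chars` is a rough character-based stand-in for the model's token limit. You would pass the returned string to your LLM's API or to a locally loaded model.

```python
def build_rag_prompt(question: str, documents: list[str], max_chars: int = 4000) -> str:
    """Assemble a prompt from retrieved documents, most relevant first, within a character budget."""
    context_parts, used = [], 0
    for i, doc in enumerate(documents, start=1):
        entry = f"[{i}] {doc.strip()}"
        if used + len(entry) > max_chars:
            break  # stop before exceeding the approximate context budget
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the following question based only on the given context. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\n"
    )


print(build_rag_prompt("What is MAAP?", ["MAAP is a program.", "Other text."]))
```

Numbering the documents makes it easy to ask the LLM to cite which retrieved passage supports its answer.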
Get Started
The following example demonstrates a basic RAG implementation with Atlas Vector Search by using the MongoDB LangChain integration and Hugging Face to easily load and access embedding and generative models.
Prerequisites
To complete this example, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later. To learn more, see Create a Cluster.
A Hugging Face Access Token with read access.
An environment to run interactive Python notebooks such as Colab.
Procedure
Set up the environment.
Create an interactive Python notebook by saving a file with the .ipynb extension, and then run the following code in the notebook to install the dependencies:
pip install --quiet pymongo langchain langchain_community langchain_mongodb langchain_huggingface pypdf sentence_transformers
Ingest data into Atlas.
In this section, you ingest sample data into Atlas that LLMs don't have access to. The following code uses the LangChain integration and PyMongo driver to do the following:
Load a PDF that contains MongoDB's latest earnings report.
Split the data into chunks, specifying the chunk size (number of characters) and chunk overlap (number of overlapping characters between consecutive chunks).
Load the nomic-embed-text-v1 embedding model from Hugging Face's model hub.
Create vector embeddings from the data and store these embeddings in the rag_db.test collection in your Atlas cluster.
Paste and run the following code in your notebook, replacing <connection-string> with your Atlas connection string:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch
from pymongo import MongoClient

# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/12236/pdf")
data = loader.load()

# Split the data into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
docs = text_splitter.split_documents(data)

# Load the embedding model (https://huggingface.co/nomic-ai/nomic-embed-text-v1)
model = HuggingFaceEmbeddings(
    model_name="nomic-ai/nomic-embed-text-v1",
    model_kwargs={"trust_remote_code": True}
)

# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["rag_db"]["test"]

# Store the data as vector embeddings in Atlas
vector_store = MongoDBAtlasVectorSearch.from_documents(
    documents=docs,
    embedding=model,
    collection=collection,
    index_name="vector_index"
)
Tip
After running the code, you can view your vector embeddings in the Atlas UI by navigating to the rag_db.test collection in your cluster.
Use Atlas Vector Search to retrieve documents.
In this section, you set up Atlas Vector Search to retrieve documents from your vector database. Complete the following steps:
Create an Atlas Vector Search index on your vector embeddings.
For free and shared clusters, follow the steps to create an index through the Atlas UI. Name the index vector_index and use the following index definition:
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 768,
      "similarity": "cosine"
    }
  ]
}
For dedicated clusters, you can create the index directly from your application by using the PyMongo driver. Paste and run the following code in your notebook:
from pymongo.operations import SearchIndexModel

# Create your index model, then create the search index
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "numDimensions": 768,
                "path": "embedding",
                "similarity": "cosine"
            }
        ]
    },
    name="vector_index",
    type="vectorSearch"
)
collection.create_search_index(model=search_index_model)
Configure Atlas Vector Search as a retriever.
In your notebook, run the following code to set up your retrieval system and run a sample semantic search query by using the LangChain integration:
# Instantiate Atlas Vector Search as a retriever
retriever = vector_store.as_retriever(
    search_type="similarity"
)

# Run a sample query; results are returned in order of relevance
retriever.invoke("AI technology")
[Document(metadata={'_id': '66a910ba7f78f7ec6760ceba', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 0}, page_content="more of our customers. We also see a tremendous opportunity to win more legacy workloads, as AI has now become a catalyst to modernize these\napplications. MongoDB's document-based architecture is particularly well-suited for the variety and scale of data required by AI-powered applications."),
 Document(metadata={'_id': '66a910ba7f78f7ec6760ced6', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 1}, page_content='artificial intelligence, in our offerings or partnerships; the growth and expansion of the market for database products and our ability to penetrate that\nmarket; our ability to integrate acquired businesses and technologies successfully or achieve the expected benefits of such acquisitions; our ability to'),
 Document(metadata={'_id': '66a910ba7f78f7ec6760cec3', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 0}, page_content='MongoDB continues to expand its AI ecosystem with the announcement of the MongoDB AI Applications Program (MAAP),'),
 Document(metadata={'_id': '66a910ba7f78f7ec6760cec4', 'source': 'https://investors.mongodb.com/node/12236/pdf', 'page': 1}, page_content='which provides customers with reference architectures, pre-built partner integrations, and professional services to help\nthem quickly build AI-powered applications. Accenture will establish a center of excellence focused on MongoDB projects,\nand is the first global systems integrator to join MAAP.')]
Generate responses with the LLM.
In this section, you generate responses by prompting an LLM to use the retrieved documents as context. The following code uses LangChain to do the following:
Access the Mistral 7B Instruct model from Hugging Face's model hub.
Use a prompt template and chain to include the user's question and the retrieved documents in the prompt sent to the LLM.
Prompt the LLM about MongoDB's latest AI announcements.
Paste and run the following code in your notebook, replacing <token> with your Hugging Face access token. The generated response might vary.
from langchain_huggingface import HuggingFaceEndpoint
from langchain.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
import os

# Authenticate to your Hugging Face account
os.environ["HF_TOKEN"] = "<token>"

# Access the LLM (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
llm = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.2")

# Create prompt and RAG workflow
prompt = PromptTemplate.from_template("""
   Answer the following question based on the given context.

   Question: {question}
   Context: {context}
""")

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Prompt the LLM
question = "In a few sentences, what are MongoDB's latest AI announcements?"
answer = rag_chain.invoke(question)
print(answer)
Answer: MongoDB recently announced the MongoDB AI Applications Program (MAAP) as part of their efforts to expand their AI ecosystem. The document-based architecture of MongoDB is particularly well-suited for AI-powered applications, offering an opportunity to win more legacy workloads. These announcements were made at MongoDB.local NYC.
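In the chain above, the retriever's list of `Document` objects is inserted into `{context}` as its string form, so the prompt also contains metadata such as `_id` and `source`. A common refinement is to pass only the page text. The helper below is a sketch that works on any object with a `page_content` attribute; `FakeDoc` is a hypothetical stand-in for LangChain's `Document` so that the example runs without a database.

```python
from dataclasses import dataclass


def format_docs(docs) -> str:
    """Join the text of retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


@dataclass
class FakeDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str


print(format_docs([FakeDoc("First chunk."), FakeDoc("Second chunk.")]))
```

In the chain, you could then write `"context": retriever | format_docs`; LangChain wraps a plain function in a runnable when you pipe a runnable into it.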
Next Steps
For more detailed RAG tutorials, use the following resources:
To learn how to implement RAG with popular LLM frameworks and AI services, see Integrate Vector Search with AI Technologies.
To learn how to implement RAG by using a local Atlas deployment and local models, see Build a Local RAG Implementation with Atlas Vector Search.
For use-case based tutorials and interactive Python notebooks, see Generative AI Use Cases Repository.
To start building production-ready chatbots with Atlas Vector Search, you can use the MongoDB Chatbot Framework. This framework provides a set of libraries that enable you to quickly build AI chatbot applications.
Fine-Tuning
To optimize and fine-tune your RAG applications, you can experiment with different embedding models, chunking strategies, and LLMs. To learn more, see the following resources:
How to Choose the Right Embedding Model for Your LLM Application
How to Choose the Right Chunking Strategy for Your LLM Application
Additionally, Atlas Vector Search supports advanced retrieval systems. Because you can seamlessly index vector data along with your other data in Atlas, you can fine-tune your retrieval results by pre-filtering on other fields in your collection or performing hybrid search to combine semantic search with full-text search results.
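As a sketch of pre-filtering: the field you filter on must be indexed with type `filter` in the same Atlas Vector Search index, and the `$vectorSearch` stage then accepts a `filter` document. The `page` field mirrors the metadata in the Get Started output, but whether your documents store it as a top-level field is an assumption to verify in your collection. The helper names and the 25x candidate oversampling are also assumptions.

```python
def vector_index_definition(num_dimensions: int, filter_paths: tuple[str, ...] = (),
                            path: str = "embedding", similarity: str = "cosine") -> dict:
    """Atlas Vector Search index definition: one vector field plus optional filter fields."""
    fields = [{"type": "vector", "path": path,
               "numDimensions": num_dimensions, "similarity": similarity}]
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


def prefiltered_vector_search(query_vector: list[float], filter_doc: dict, limit: int = 4,
                              index: str = "vector_index", path: str = "embedding") -> list[dict]:
    """Pipeline whose $vectorSearch stage only considers documents that match filter_doc."""
    return [{
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": query_vector,
            "numCandidates": limit * 25,  # assumption: oversample candidates for better recall
            "limit": limit,
            "filter": filter_doc,  # MQL operators such as $eq, $in, $gte on indexed filter fields
        }
    }]


definition = vector_index_definition(768, filter_paths=("page",))
pipeline = prefiltered_vector_search([0.0] * 768, {"page": {"$eq": 0}})
print(definition)
```

You could pass `definition` to `SearchIndexModel(definition=definition, name="vector_index", type="vectorSearch")` when creating the index, and run `pipeline` with `collection.aggregate(pipeline)`.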