How to Build a RAG System Using Claude 3 Opus And MongoDB
Anthropic, a provider of large language models (LLMs), recently introduced three state-of-the-art models classified under the Claude 3 model family. This tutorial utilises one of the Claude 3 models within a retrieval-augmented generation (RAG) system powered by the MongoDB vector database. Before diving into the implementation of the retrieval-augmented generation system, here's an overview of the latest Anthropic release:
Introduction of the Claude 3 model family:
- Models: The family comprises Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, each designed to cater to different needs and applications.
- Benchmarks: The Claude 3 models have established new standards in AI cognition, excelling in complex tasks, comprehension, and reasoning.
Capabilities and features:
- Multilingual and multimodal support: Claude 3 models can generate code and text in languages other than English. The models are also multimodal, with the ability to understand images.
- Long context window: The Claude 3 models initially offer a 200K token context window, with the ability to extend up to one million tokens for specific use cases.
- Near-perfect recall: The models demonstrate exceptional recall capabilities when analyzing extensive amounts of text.
Design considerations:
- Balanced attributes: The development of the Claude 3 models was guided by three main factors: speed, intelligence, and cost-effectiveness. This gives consumers a variety of models to leverage for different use cases, trading off one factor for an increase in another.
That’s a quick update on the latest Anthropic release. Although the Claude 3 model has a large context window, a substantial cost is still associated with every call that reaches the upper thresholds of the context window provided. RAG is a design pattern that leverages a knowledge source to provide additional information to LLMs by semantically matching the query input with data points within the knowledge store.
This tutorial implements a chatbot prompted to take on the role of a venture capital tech analyst. The chatbot is a naive RAG system with a collection of tech news articles acting as its knowledge source.
What to expect from this tutorial:
- Gain insights into constructing a retrieval-augmented generation system by integrating Claude 3 models with MongoDB to enhance query response accuracy.
- Follow a comprehensive tutorial on setting up your development environment, from installing necessary libraries to configuring a MongoDB database.
- Learn efficient data handling methods, including creating vector search indexes and preparing data for ingestion and query processing.
- Understand how to employ Claude 3 models within the RAG system for generating precise responses based on contextual information retrieved from the database.
All implementation code presented in this tutorial is located in this GitHub repository.
This section covers the steps taken to prepare the development environment, and to source and clean the data utilised as the knowledge base for the venture capital tech analyst chatbot.
Set environment variables:
```python
import os

os.environ["ANTHROPIC_API_KEY"] = ""
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY")

os.environ["VOYAGE_API_KEY"] = ""
VOYAGE_API_KEY = os.environ.get("VOYAGE_API_KEY")

os.environ["HF_TOKEN"] = ""
```
The following code installs all the required libraries:
```
!pip install --quiet pymongo datasets pandas anthropic voyageai
```
Below are brief explanations of the tools and libraries utilised within the implementation code:
- anthropic: This is the official Python library for Anthropic that enables access to state-of-the-art language models. This library provides access to the Claude 3 family models, which can understand text and images.
- datasets: This library is part of the Hugging Face ecosystem. By installing datasets, we gain access to several pre-processed and ready-to-use datasets, which are essential for training and fine-tuning machine learning models or benchmarking their performance.
- pandas: This data science library provides robust data structures and methods for data manipulation, processing, and analysis.
- voyageai: This is the official Python client library for accessing Voyage's embedding models.
- pymongo: PyMongo is a Python toolkit for MongoDB. It enables interactions with a MongoDB database.
Tools like Pyenv and Conda can create isolated development environments to separate package versions and dependencies across your projects. In these environments, you can install specific versions of libraries, ensuring that each project operates with its own set of dependencies without interference. The implementation code presented in this tutorial is best executed within a Colab or notebook environment.
After importing the necessary libraries, the subsequent steps in this section involve loading the dataset that serves as the foundational knowledge base for the RAG system and chatbot. This dataset contains a curated collection of tech news articles from HackerNoon, supplemented with an additional column of embeddings. These embeddings were created by processing the descriptions of each article in the dataset and were generated using OpenAI's embedding model. They will be removed and replaced with embeddings from VoyageAI's voyage-large-2 model.

The tech-news-embedding dataset contains more than one million data points, mirroring the scale of data typically encountered in a production setting. However, only 500 data points are utilized for this particular application, but feel free to increase the number of data points.

```python
from datasets import load_dataset
import pandas as pd

# Make sure you have a Hugging Face token (HF_TOKEN) in your development environment before running the code below
# How to get a token: https://huggingface.co/docs/hub/en/security-tokens

# https://huggingface.co/datasets/MongoDB/tech-news-embeddings
dataset = load_dataset("MongoDB/tech-news-embeddings", split="train", streaming=True)
combined_df = dataset.take(500)

# Convert the dataset to a pandas dataframe
combined_df = pd.DataFrame(combined_df)
```
The code snippet above executes the following steps:
- Imports the necessary libraries: 'datasets' for loading the dataset and 'pandas' for data manipulation.
- Loads the "MongoDB/tech-news-embeddings" dataset from Hugging Face, using the 'train' split and enabling streaming mode.
- Takes the first 500 samples from the streamed dataset using the 'take' method.
- Converts the selected samples into a pandas DataFrame for easier manipulation and analysis.
This process effectively retrieves a subset of the tech news embeddings dataset and prepares it for further processing or analysis using pandas. One thing to note is that the streaming option allows for the efficient handling of large datasets by loading data in chunks rather than all at once, which is particularly useful when working with extensive datasets and environments with limited computing resources.
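If you later want more than the first 500 records, the same streaming interface can pull additional slices without materialising the full dataset in memory. The snippet below is a minimal, optional sketch; the slice sizes are arbitrary and not part of the original steps.

```python
# Stream the next 1,000 records after the first 500; nothing beyond this slice
# is downloaded or held in memory at once.
extra_records = dataset.skip(500).take(1000)
extra_df = pd.DataFrame(extra_records)
print(len(extra_df))
```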
As a final phase in data preparation, the code snippet below shows the step to remove the _id column from the combined dataset, as it is unnecessary for subsequent steps in this tutorial. Additionally, the existing embeddings are converted to plain Python lists; they will be replaced entirely in the next step, when new embeddings are generated with the VoyageAI embedding model.

```python
# Remove the _id column from the initial dataset
combined_df = combined_df.drop(columns=['_id'])

# Convert each numpy array in the 'embedding' column to a normal Python list
combined_df['embedding'] = combined_df['embedding'].apply(lambda x: x.tolist())
```
After preparing our initial dataset, the next crucial step is to generate embeddings for our text data. These embeddings will allow us to perform vector searches enabled via MongoDB later.
```python
import voyageai

vo = voyageai.Client(api_key=VOYAGE_API_KEY)

def get_embedding(text: str) -> list[float]:
    if not text.strip():
        print("Attempted to get embedding for empty text.")
        return []

    embedding = vo.embed(text, model="voyage-large-2", input_type="document")

    return embedding.embeddings[0]

combined_df["embedding"] = combined_df["description"].apply(get_embedding)

combined_df.head()
```
The code snippet above performs several important operations:
- We import the VoyageAI library and initialize a client with our API key.
- We define a get_embedding function that:
- Checks for empty input text
- Uses the VoyageAI client to generate an embedding using the "voyage-large-2" model
- Returns the generated embedding
- We apply this function to the "description" column of our DataFrame, creating a new "embedding" column.
- Finally, we display the first few rows of our updated DataFrame to verify the new column.
This process is vital for preparing our data for vector search capabilities. By generating high-quality embeddings for each description, we're capturing the semantic meaning of our text data in a format optimized for vector operations. This will enable more accurate and efficient similarity searches in our vector database.
Remember, the choice of embedding model can significantly impact the quality of your search results. The VoyageAI model used here is known for its advanced NLP capabilities, which should provide robust embeddings for our use case.
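As an optional sanity check (not part of the original tutorial steps), you can compare two of the generated embeddings directly; semantically related descriptions should produce a higher cosine similarity than unrelated ones. A minimal sketch using numpy:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Compare the embeddings of the first two article descriptions
print(cosine_similarity(combined_df["embedding"][0], combined_df["embedding"][1]))
```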
In the next section, we'll look at how to use these embeddings to perform vector searches in MongoDB.
An approach to composing an AI stack focused on handling large data volumes and reducing data silos is to utilise the same database provider for your operational and vector data. MongoDB acts as both an operational and a vector database. It offers a database solution that efficiently stores, queries, and retrieves vector embeddings.
To create a new MongoDB database, set up a database cluster:
- Select the “Database” option on the left-hand pane, which will navigate to the Database Deployment page with a deployment specification of any existing cluster. Create a new database cluster by clicking on the +Create button.
- For assistance with database cluster setup and obtaining the unique resource identifier (URI), refer to our guide for setting up a MongoDB cluster and getting your connection string.
Note: Don’t forget to whitelist the IP for the Python host or 0.0.0.0/0 for any IP when creating proof of concepts.
At this point, you have created a database cluster, obtained a connection string to the database, and placed a reference to the connection string within the development environment.
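This tutorial reads the connection string from a MONGO_URI environment variable later on, so "placing a reference in the development environment" means setting it the same way the API keys were set earlier (the empty string below is a placeholder for your own connection string):

```python
os.environ["MONGO_URI"] = ""
MONGO_URI = os.environ.get("MONGO_URI")
```

With the connection string in place, the next step is to create a database and collection through the MongoDB Atlas user interface.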
Once you have created a cluster, navigate to the cluster page and create a database and collection within the MongoDB Atlas cluster by clicking + Create Database. The database will be named knowledge, and the collection will be named research_papers.

By this point, you have created a cluster, database, and collection.
The steps in this section are crucial to ensure that a vector search can be conducted using the queries entered into the chatbot and searched against the records within the research_papers collection. This step's objective is to create a vector search index. To achieve this, refer to the official vector search index creation guide.
In the creation of a vector search index using the JSON editor on MongoDB Atlas, ensure your vector search index is named vector_index and the vector search index definition is as follows:
1 { 2 "fields": [{ 3 "numDimensions": 256, 4 "path": "embedding", 5 "similarity": "cosine", 6 "type": "vector" 7 }] 8 }
To ingest data into the MongoDB database created in the previous steps, the following operations have to be carried out:
- Connect to the database and collection.
- Clear out the collection of any existing records.
- Convert the Pandas DataFrame of the dataset into dictionaries before ingestion.
- Ingest dictionaries into MongoDB using a batch operation.
```python
import pymongo

def get_mongo_client(mongo_uri):
    """Establish and validate connection to the MongoDB."""

    client = pymongo.MongoClient(mongo_uri, appname="devrel.showcase.anthropic_rag.python")

    # Validate the connection
    ping_result = client.admin.command('ping')
    if ping_result.get('ok') == 1.0:
        # Connection successful
        print("Connection to MongoDB successful")
        return client
    else:
        print("Connection to MongoDB failed")
        return None

mongo_uri = os.environ["MONGO_URI"]

if not mongo_uri:
    print("MONGO_URI not set in environment variables")

mongo_client = get_mongo_client(mongo_uri)

DB_NAME = "knowledge"
COLLECTION_NAME = "research_papers"

db = mongo_client.get_database(DB_NAME)
collection = db.get_collection(COLLECTION_NAME)
```
The code snippet above uses PyMongo to create a MongoDB client object, representing the connection to the cluster and enabling access to its databases and collections. The variables DB_NAME and COLLECTION_NAME are given the names set for the database and collection in the previous step. If you’ve chosen different database and collection names, ensure they are reflected in the implementation code.

The code snippet below guarantees that the current database collection is empty by executing the delete_many() operation on the collection.

```python
# To ensure we are working with a fresh collection
# delete any existing records in the collection
collection.delete_many({})
```
Ingesting data into a MongoDB collection from a pandas DataFrame is a straightforward process that can be efficiently accomplished by converting the DataFrame into dictionaries and then utilising the insert_many method on the collection to pass the converted dataset records.

```python
# Data Ingestion
combined_df_json = combined_df.to_dict(orient='records')
collection.insert_many(combined_df_json)
```
The data ingestion process should take less than a minute. When it is completed, the IDs of the ingested documents are returned.
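As a quick, optional sanity check (not part of the original steps), you can confirm that the number of documents in the collection matches the number of rows in the DataFrame:

```python
doc_count = collection.count_documents({})
print(f"Documents in collection: {doc_count}")  # Expected: 500
```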
This section showcases the creation of a vector search custom function that accepts a user query, which corresponds to entries to the chatbot. The function also takes a second parameter, collection, which points to the database collection containing records against which the vector search operation should be conducted.

The vector_search function produces a vector search result derived from a series of operations outlined in a MongoDB aggregation pipeline. This pipeline includes the $vectorSearch and $project stages and performs queries based on the vector embeddings of user queries. It then formats the results, omitting any record attributes unnecessary for subsequent processes.

```python
def vector_search(user_query, collection):
    """
    Perform a vector search in the MongoDB collection based on the user query.

    Args:
        user_query (str): The user's query string.
        collection (MongoCollection): The MongoDB collection to search.

    Returns:
        list: A list of matching documents.
    """

    # Generate embedding for the user query
    query_embedding = get_embedding(user_query)

    if query_embedding is None:
        return "Invalid query or embedding generation failed."

    # Define the vector search pipeline
    pipeline = [
        {
            "$vectorSearch": {
                "index": "vector_index",
                "queryVector": query_embedding,
                "path": "embedding",
                "numCandidates": 150,  # Number of candidate matches to consider
                "limit": 5  # Return top 5 matches
            }
        },
        {
            "$project": {
                "_id": 0,  # Exclude the _id field
                "embedding": 0,  # Exclude the embedding field
                "score": {
                    "$meta": "vectorSearchScore"  # Include the search score
                }
            }
        }
    ]

    # Execute the search
    results = collection.aggregate(pipeline)
    return list(results)
```
The code snippet above conducts the following operations to allow semantic search for tech news articles:

- Define the vector_search function that takes a user's query string and a MongoDB collection as inputs and returns a list of documents that match the query based on vector similarity search.
- Generate an embedding for the user's query by calling the previously defined function, get_embedding, which converts the query string into a vector representation.
- Construct a pipeline for MongoDB's aggregate function, incorporating two main stages: $vectorSearch and $project.
- The $vectorSearch stage performs the actual vector search. The index field specifies the vector index to utilise for the vector search, and this should correspond to the name entered in the vector search index definition in previous steps. The queryVector field takes the embedding representation of the user query. The path field corresponds to the document field containing the embeddings. The numCandidates field specifies the number of candidate documents to consider, and limit specifies the number of results to return.
- The $project stage formats the results to exclude the _id and embedding fields.
- The aggregate method executes the defined pipeline to obtain the vector search results. The final operation converts the returned cursor from the database into a list.
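Before wiring the function into the chatbot, you can exercise it directly. The query below is an arbitrary example and not part of the original steps:

```python
sample_results = vector_search("AI startups building developer tools", collection)
for doc in sample_results:
    print(doc.get("title", "N/A"), "-", round(doc.get("score", 0), 4))
```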
The final section of the tutorial outlines the sequence of operations performed as follows:
- Accept a user query in the form of a string.
- Utilize the VoyageAI embedding model to generate embeddings for the user query.
- Load the Anthropic Claude 3 model (specifically, the ‘claude-3-opus-20240229’ model) to serve as the base model, which is the large language model for the RAG system.
- Execute a vector search using the embeddings of the user query to fetch relevant information from the knowledge base, which provides additional context for the base model.
- Submit both the user query and the gathered additional information to the base model to generate a response.
An important note: the dimensions of the user query embedding must match the dimensions set in the vector search index definition on MongoDB Atlas.
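A quick way to verify this (an optional check, not part of the original steps) is to compare the length of a generated embedding against the numDimensions value in the index definition:

```python
sample_embedding = get_embedding("dimension check")
print(len(sample_embedding))  # Should match numDimensions in the vector index (1536 for voyage-large-2)
```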
The next step in this section is to import the Anthropic library and load the client to access Anthropic’s methods for handling messages and accessing Claude models. Ensure you obtain an Anthropic API key located within the settings page on the official Anthropic website.
```python
import anthropic

client = anthropic.Client(api_key=ANTHROPIC_API_KEY)
```
The following code snippet introduces the function handle_user_query, which serves two primary purposes: It leverages a previously defined custom vector search function to query and retrieve relevant information from a MongoDB database, and it utilizes the Anthropic API via a client object to use one of the Claude 3 models for query response generation.

```python
def handle_user_query(query, collection):

    get_knowledge = vector_search(query, collection)

    search_result = ''
    for result in get_knowledge:
        search_result += (
            f"Title: {result.get('title', 'N/A')}, "
            f"Company Name: {result.get('companyName', 'N/A')}, "
            f"Company URL: {result.get('companyUrl', 'N/A')}, "
            f"Date Published: {result.get('published_at', 'N/A')}, "
            f"Article URL: {result.get('url', 'N/A')}, "
            f"Description: {result.get('description', 'N/A')}, \n"
        )

    response = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=1024,
        system="You are Venture Capital Tech Analyst with access to some tech company articles and information. Use the information you are given to provide advice.",
        messages=[
            {"role": "user", "content": "Answer this user query: " + query + " with the following context: " + search_result}
        ]
    )

    return (response.content[0].text), search_result
```
This function begins by executing the vector search against the specified MongoDB collection based on the user's input query. It then proceeds to format the retrieved information for further processing. Subsequently, the function invokes the Anthropic API, directing the request to a specific Claude 3 model.
Below is a more detailed description of the operations in the code snippet above:
- Vector search execution: The function begins by calling vector_search with the user's query and a specified collection as arguments. This performs a search within the collection, leveraging vector embeddings to find relevant information related to the query.
- Compile search results: search_result is initialized as an empty string to aggregate information from the search. The search results are compiled by iterating over the results returned by the vector_search function, formatting each item's details (title, company name, URL, publication date, article URL, and description) into a human-readable string and appending this information to search_result with a newline character \n at the end of each entry.
- Generate response using Anthropic client: The function then constructs a request to the Anthropic API (through a client object, an instance of the Anthropic client class created earlier). It specifies: the model to use ("claude-3-opus-20240229"), which indicates a specific version of the Claude 3 model; the maximum token limit for the generated response (max_tokens=1024); a system prompt that guides the model to behave as a "Venture Capital Tech Analyst" with access to tech company articles and information, using this as context to advise; and the actual message for the model to process, which combines the user query with the aggregated search results as context.
- Return the generated response and search results: It extracts and returns the response text from the first item in the response's content alongside the compiled search results.
```python
# Conduct query with retrieval of sources
query = "Give me the best tech stock to invest in and tell me why"
response, source_information = handle_user_query(query, collection)

print(f"Response: {response}")
print(f"Source Information: \n{source_information}")
```
The final step in this tutorial is to initialize the query, pass it into the handle_user_query function, and print the response returned.

- Initialise query: The variable query is assigned a string value containing the user's request: "Give me the best tech stock to invest in and tell me why." This serves as the input for the handle_user_query function.
- Execute handle_user_query function: The function takes two parameters: the user's query and a reference to the collection from which information will be retrieved. It performs a vector search to find relevant documents within the collection and formats the results for further use. It then queries the Anthropic Claude 3 model, providing it with the query and the formatted search results as context to generate an informed response.
- Retrieve response and source information: The function returns two pieces of data: response and source_information. The response contains the model-generated answer to the user's query, while source_information includes detailed data from the collection used to inform the response.
- Display results: Finally, the code prints the response from the Claude 3 model, along with the source information that contributed to this response.
Claude 3 models possess what seem like impressive reasoning capabilities. From the generated response, the model is able to consider expressive language as a factor in its decision-making and also provide a structured approach to its response.
More impressively, it gives a reason as to why other options in the search results are not candidates for the final selection. And if you notice, it factored the date into its selection as well.
Obviously, this is not going to replace any human tech analyst soon, but with a more extensive knowledge base and real-time data, this could very quickly become a co-pilot system for VC analysts.
Please remember that Opus's response is not financial advice and is only shown for illustrative purposes.
This tutorial has presented the essential steps of setting up your development environment, preparing your dataset, and integrating state-of-the-art language models with a powerful database system.
By leveraging the unique strengths of Claude 3 models and MongoDB, we've demonstrated how to create a RAG system that not only responds accurately to user queries but does so by understanding the context in depth. The impressive performance of the RAG system is a result of Opus's parametric knowledge and the semantic matching capabilities facilitated by vector search.
Building a RAG system with the latest Claude 3 models and MongoDB sets up an efficient AI infrastructure. It offers cost savings and low latency by combining operational and vector databases into one solution. The functionalities of the naive RAG system presented in this tutorial can be extended to do the following:
- Get real-time news on the company returned from the search results.
- Get additional information by extracting text from the URLs provided in accompanying search results.
- Store additional metadata before data ingestion for each data point.
Some of the proposed functionality extensions can be achieved by utilising Anthropic's function calling capabilities (sketched below) or by leveraging search APIs. The key takeaway is that whether you aim to develop a chatbot, a recommendation system, or any application requiring nuanced AI responses, the principles and techniques outlined here will serve as a valuable starting point.
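As a rough illustration of the function calling route, the sketch below registers a hypothetical get_company_news tool with the Claude 3 model through the Anthropic messages API. The tool name, schema, and query are illustrative assumptions only; you would still need to implement the actual news lookup and return its output to the model in a follow-up message.

```python
# Hypothetical tool definition for illustration; the news lookup itself is not implemented here
tools = [
    {
        "name": "get_company_news",
        "description": "Fetch recent news headlines for a given company.",
        "input_schema": {
            "type": "object",
            "properties": {
                "company_name": {"type": "string", "description": "Name of the company"}
            },
            "required": ["company_name"],
        },
    }
]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What is the latest news about the top company from the search results?"}],
)

# If the model decides to call the tool, the response contains a tool_use content block
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```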
Want to leverage another state-of-the-art model for your RAG system? Check out our article that uses Google’s Gemma alongside open-source embedding models provided by Hugging Face.
1. What are the Claude 3 models, and how do they enhance a RAG system?
The Claude 3 models (Haiku, Sonnet, Opus) are state-of-the-art large language models developed by Anthropic. They offer advanced features like multilingual support, multimodality, and long context windows up to one million tokens. These models are integrated into RAG systems to leverage their ability to understand and generate text, enhancing the system's response accuracy and comprehension.
2. Why is MongoDB chosen for a RAG system powered by Claude 3?
MongoDB is utilized for its dual capabilities as an operational and a vector database. It efficiently stores, queries, and retrieves vector embeddings, making it ideal for managing the extensive data volumes and real-time processing demands of AI applications like a RAG system.
3. How does the vector search function work within the RAG system?
The vector search function in the RAG system conducts a semantic search against a MongoDB collection using the vector embeddings of user queries. It relies on a MongoDB aggregation pipeline, including the $vectorSearch and $project stages, to find and format the most relevant documents based on query similarity.
4. What is the significance of data embeddings in the RAG system?
Data embeddings are crucial for matching the semantic content of user queries with the knowledge stored in the database. They transform text into a vector space, enabling the RAG system to perform vector searches and retrieve contextually relevant information to inform the model's responses.
5. How does the RAG system handle user queries with Claude 3 models?
The RAG system processes user queries by generating embeddings using an embedding model (e.g., VoyageAI's "voyage-large-2") and conducting a vector search to fetch relevant information. This information and the user query are passed to a Claude 3 model, which generates a detailed and informed response based on the combined context.