Launch a Fully Managed RAG Workflow With MongoDB Atlas and Amazon Bedrock

Babu Srinivasan, Igor Alekseev, Erik Onnen • 6 min read • Published May 02, 2024 • Updated May 08, 2024

Introduction

MongoDB Atlas is now natively integrated with Amazon Bedrock Knowledge Base, making it even easier to build generative AI applications backed by enterprise data.
Amazon Bedrock, Amazon Web Services’ (AWS) managed cloud service for generative AI, empowers developers to build applications on top of powerful foundation models like Anthropic's Claude, Cohere Embed, and Amazon Titan. By integrating with Atlas Vector Search, Amazon Bedrock lets customers use the vector database capabilities of Atlas to ground foundation model outputs in up-to-date, proprietary data.
With the click of a button (see below), Amazon Bedrock now integrates MongoDB Atlas as a vector database into its fully managed, end-to-end retrieval-augmented generation (RAG) workflow, removing the need to build custom integrations to data sources or manage data flows.
Companies using MongoDB Atlas and Amazon Bedrock can now rapidly deploy and scale generative AI apps grounded in up-to-date, accurate enterprise data. For enterprises with the most demanding privacy requirements, this capability is also available via AWS PrivateLink (see the PrivateLink note in the knowledge base configuration steps).

What is retrieval-augmented generation?

One of the biggest challenges when working with generative AI is avoiding hallucinations: erroneous results returned by the foundation model (FM). FMs are trained on public information that quickly becomes outdated, and they cannot take advantage of the proprietary information that enterprises possess.
One way to reduce hallucinations is to supplement a query with your own data using a workflow known as retrieval-augmented generation, or RAG. In a RAG workflow, relevant data (for instance, a customer's previous purchase history) is retrieved from a designated database that acts as a “source of truth” and passed to the FM along with the query, so that the FM's response is grounded in that data. To make this data searchable by meaning, it needs to be converted into vector embeddings and stored in a vector database.
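To make the retrieve-then-augment idea concrete, here is a minimal, self-contained Python sketch. It is illustrative only. The three-dimensional vectors and the store contents are made up (real embedding models produce vectors with hundreds or thousands of dimensions), and in the workflow described in this article, Bedrock and Atlas Vector Search perform these steps for you. The sketch ranks stored chunks by cosine similarity to the query vector and prepends the best match to the question.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, k=1):
    """Return the texts of the k stored chunks most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda doc: cosine_similarity(query_vector, doc["vector"]),
        reverse=True,
    )
    return [doc["text"] for doc in ranked[:k]]


def build_prompt(question, context_chunks):
    """Augment the user's question with the retrieved context before sending it to the FM."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up 3-dimensional "embeddings" for illustration only.
store = [
    {"text": "Order #1001: two pairs of running shoes", "vector": [0.9, 0.1, 0.0]},
    {"text": "Store hours: 9am to 5pm", "vector": [0.0, 0.2, 0.9]},
]
# Pretend this is the embedding of the question below.
query_vector = [0.8, 0.2, 0.1]

top_chunks = retrieve(query_vector, store, k=1)
print(build_prompt("What did I buy last time?", top_chunks))
```

The purchase-history chunk wins because its vector points in nearly the same direction as the query vector. The store-hours chunk is ignored.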

How does the Knowledge Base integration work?

Within Amazon Bedrock, developers can now “click to add” MongoDB Atlas as a knowledge base for their vector data store to power RAG.
In the workflow, a customer chooses two different models: an embedding model and a generative model. These models are then orchestrated and used by Bedrock Agents during the interaction with the knowledge base — in this case, MongoDB Atlas.
Bedrock reads your text data from an S3 bucket, chunks the data, and then uses the embedding model you chose to create the vector embeddings, storing the text chunks, embeddings, and related metadata in MongoDB Atlas. An Atlas Vector Search index, which you create as part of the setup, is used to query the vector embeddings.
High level block diagram depicting each of the services and its data flow
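The ingestion step can be pictured with the following simplified Python sketch. It is not Bedrock's implementation. The following are all assumptions for illustration: the fixed-size character chunking, the chunk sizes, the `fake_embed` stand-in for the embedding model, and the contents of `bedrock_metadata`. Only the three field names come from the Atlas index definition used later in this article.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def fake_embed(chunk, dims=1536):
    """Hypothetical placeholder for the embedding model: a deterministic vector, NOT a real embedding."""
    vector = [0.0] * dims
    for i, ch in enumerate(chunk):
        vector[i % dims] += ord(ch) / 1000.0
    return vector


def to_documents(text, source_uri, embed=fake_embed):
    """Build one document per chunk using the field names from the Atlas index definition."""
    return [
        {
            "bedrock_text_chunk": chunk,
            "bedrock_embedding": embed(chunk),
            # Assumed metadata shape; Bedrock decides what it actually stores here.
            "bedrock_metadata": {"source": source_uri, "chunk_index": i},
        }
        for i, chunk in enumerate(chunk_text(text))
    ]


docs = to_documents("x" * 500, "s3://example-bucket/best-practices.pdf")
print(len(docs), len(docs[0]["bedrock_embedding"]))
```

The overlap between chunks means a sentence that straddles a chunk boundary still appears whole in at least one chunk. This is the usual reason chunkers overlap.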

Why choose MongoDB Atlas as a Bedrock knowledge base?

MongoDB Atlas stores operational data, vector data, and metadata in a single platform, making it an ideal knowledge base for Amazon Bedrock users who want to augment their generative AI experiences while also simplifying their generative AI stack.
In addition, MongoDB Atlas gives developers the ability to set up dedicated infrastructure for search and vector search workloads, optimizing compute resources to scale search and database independently.

Solution architecture

architecture diagram
In the architecture diagram above, documents are uploaded to the Amazon Bedrock Knowledge Base (via S3) and stored within the MongoDB Atlas vector store. User queries are then addressed through specialized Amazon Bedrock Agents tailored to individual use cases, utilizing the MongoDB Atlas vector search functionality.
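Bedrock builds and runs its own retrieval queries, so you do not write them yourself. Still, it can help to see the kind of query Atlas Vector Search answers. The sketch below assembles a `$vectorSearch` aggregation pipeline against the `bedrock_embedding` field. The index name `vector_index` and the connection string, database, and collection names passed to `run_vector_search` are hypothetical placeholders.

```python
def build_vector_search_pipeline(query_vector, limit=5, num_candidates=100,
                                 index_name="vector_index"):
    """Build an Atlas Vector Search aggregation pipeline over the bedrock_embedding field."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,  # hypothetical name; use the name you gave your index
                "path": "bedrock_embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # nearest-neighbor candidates to consider
                "limit": limit,  # number of results to return
            }
        },
        {
            "$project": {
                "_id": 0,
                "bedrock_text_chunk": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def run_vector_search(uri, db_name, coll_name, query_vector, **kwargs):
    """Run the pipeline with PyMongo (requires pymongo and network access to Atlas)."""
    from pymongo import MongoClient  # imported lazily so the builder works without pymongo

    with MongoClient(uri) as client:
        collection = client[db_name][coll_name]
        return list(collection.aggregate(build_vector_search_pipeline(query_vector, **kwargs)))
```

The query vector must come from the same embedding model used at ingestion, so its length matches the index's `numDimensions` (1536 in this article).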

Dataset

In this demo, we use the Best Practices Guide for MongoDB to populate our knowledge base. Please download the PDF (by clicking on “Read Whitepaper” or “Email me the PDF”). Alternatively, you can download it from the GitHub repository. Once you have the PDF, upload it into an S3 bucket for hosting. (Note the bucket name as we will use it later in the article.)
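If you prefer to script the upload instead of using the S3 console, a minimal sketch using boto3's `upload_file` follows. The bucket name, file name, and key prefix are placeholders you would replace. boto3 must be installed and AWS credentials configured for `upload_pdf` to run.

```python
import os


def s3_key_for(local_path, prefix=""):
    """Derive the S3 object key from the local file name, with an optional key prefix."""
    name = os.path.basename(local_path)
    return f"{prefix.rstrip('/')}/{name}" if prefix else name


def upload_pdf(local_path, bucket, prefix=""):
    """Upload the PDF to S3 and return its s3:// URI."""
    import boto3  # imported lazily so s3_key_for works without boto3 installed

    key = s3_key_for(local_path, prefix)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_pdf("mongodb-best-practices.pdf", "my-kb-bucket")` would upload the file under its own name to a hypothetical bucket `my-kb-bucket`. That bucket name is the one to supply later in the knowledge base configuration.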

Prerequisites

  • MongoDB Atlas account
  • AWS account

Implementation steps

Atlas Cluster and Database Setup

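Create an Atlas cluster (or use an existing one), then create a database and a collection to hold the embeddings. The screenshot shows the navigation for creating a database in MongoDB Atlas.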

Atlas Vector Search index

Before we create an Amazon Bedrock knowledge base (using MongoDB Atlas), we need to create an Atlas Vector Search index.
  • In the MongoDB Atlas Console, navigate to your cluster and select the Atlas Search tab.
Atlas console navigation to create the search index
  • Select Create Search Index, select Atlas Vector Search, and select Next.
The screenshot shows the MongoDB Atlas Search Index navigation.
  • Select the database and the collection where the embeddings are stored.
MongoDB Atlas Search Index navigation
  • Supply the following JSON in the index definition and click Next, confirming and creating the index on the next page.
    {
      "fields": [
        {
          "numDimensions": 1536,
          "path": "bedrock_embedding",
          "similarity": "cosine",
          "type": "vector"
        },
        {
          "path": "bedrock_metadata",
          "type": "filter"
        },
        {
          "path": "bedrock_text_chunk",
          "type": "filter"
        }
      ]
    }
The screenshot shows the MongoDB Atlas Search Index navigation
Note: The fields in the JSON are customizable but should match the fields we configure in the Amazon Bedrock AWS console. If your source content contains filter metadata, the fields need to be included in the JSON array above in the same format: {"path": "<attribute_name>","type":"filter"}.
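The same index can also be created from code. This sketch assumes PyMongo 4.7 or later, which accepts `type="vectorSearch"` in `SearchIndexModel`. The index name `vector_index`, the connection string, and the database and collection names are placeholders. `build_index_definition` reproduces the JSON above and appends any extra filter fields in the `{"path": ..., "type": "filter"}` format from the note. The field name `year` in the test is a hypothetical example.

```python
def build_index_definition(num_dimensions=1536, extra_filter_fields=()):
    """Return the Atlas Vector Search index definition shown above, plus optional filter fields."""
    fields = [
        {
            "numDimensions": num_dimensions,  # must match the embedding model's output size
            "path": "bedrock_embedding",
            "similarity": "cosine",
            "type": "vector",
        },
        {"path": "bedrock_metadata", "type": "filter"},
        {"path": "bedrock_text_chunk", "type": "filter"},
    ]
    # Extra metadata attributes use the {"path": "<attribute_name>", "type": "filter"} format.
    fields.extend({"path": name, "type": "filter"} for name in extra_filter_fields)
    return {"fields": fields}


def create_vector_index(uri, db_name, coll_name, index_name="vector_index", **definition_kwargs):
    """Create the index with PyMongo 4.7+ (requires network access to your Atlas cluster)."""
    from pymongo import MongoClient
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=build_index_definition(**definition_kwargs),
        name=index_name,
        type="vectorSearch",
    )
    with MongoClient(uri) as client:
        return client[db_name][coll_name].create_search_index(model)
```

Whatever index name you choose here is the name to enter when configuring the knowledge base in Amazon Bedrock.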

Amazon Bedrock Knowledge Base

  • In the AWS console, navigate to Amazon Bedrock, and then click Get started.
Landing page of the Amazon Bedrock console
  • Next, click on Model Access.
Overview page of the Amazon Bedrock console
  • Ensure that access is granted to the Amazon and Anthropic models.
Model Access page of the Amazon Bedrock console
  • Next, navigate to Knowledge Bases in the left-hand menu and select Create Knowledge Base.
Amazon Bedrock console page for selecting the Knowledge bases
  • Give your knowledge base a name and select Next. (Add an optional description, if you’d like.)
Configuration page of the Amazon Bedrock Knowledge bases console
  • Supply the S3 bucket where you uploaded the PDF of “Best Practices Guide for MongoDB” from earlier and select Next.
Navigation to set up the data source
  • Next, select the Titan Embeddings model.
Navigation for the selection of the embedding model
  • Scroll down to select MongoDB Atlas, which you set up earlier, as the vector database.
Navigation for the vector store selection
  • Scroll down to fill out the MongoDB configuration options.
    • The configuration steps below assume connectivity to MongoDB Atlas over the public internet, which is recommended only for non-production use cases.
    • To connect over AWS PrivateLink instead, follow the additional steps in the CDK script's README to configure the endpoint service.
    • For the credentials secret ARN, create a secret in AWS Secrets Manager in this format, then supply its ARN: {"username":"xxxx","password":"xxx"}.
      Note: As a recommended security practice, the database user in this secret should NOT have Atlas Admin privileges. Grant it no more than the “Read and write to any database” privilege.
Configuration details for MongoDB Atlas as the knowledge base vector store
  • Then fill in the metadata field mappings, using the field names from the index definition you created earlier in the Atlas JSON editor (bedrock_embedding, bedrock_text_chunk, and bedrock_metadata).
Configuration of the metadata field mapping for the vector search
  • Next, review and create the knowledge base.
Review panel for all the configurations made earlier
  • Once the creation is complete, navigate to the Data Source and click the Sync button to sync the data source.
Successful completion of the Knowledge base creation
  • When the sync completes, you can navigate to your database collection in the MongoDB Atlas console. Note that the vector size matches the embedding model's output dimension (1536, the numDimensions value in the index definition).
MongoDB database collections with vector embedding
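Parts of the console flow above can also be scripted with boto3. In the sketch below, `create_atlas_secret` stores the credentials in AWS Secrets Manager in the `{"username": ..., "password": ...}` format and returns the ARN to supply in the knowledge base configuration. `sync_data_source` starts an ingestion job, which is what we understand the console's Sync button to trigger. `embedding_dims_ok` checks a document fetched from your collection against the 1536 dimensions declared in the index. The secret name, knowledge base ID, data source ID, and credentials are placeholders.

```python
import json


def secret_string(username, password):
    """Serialize Atlas credentials in the {"username": ..., "password": ...} format."""
    return json.dumps({"username": username, "password": password})


def create_atlas_secret(secret_name, username, password):
    """Store the credentials in AWS Secrets Manager and return the secret ARN."""
    import boto3  # imported lazily; requires boto3 and AWS credentials

    client = boto3.client("secretsmanager")
    response = client.create_secret(Name=secret_name, SecretString=secret_string(username, password))
    return response["ARN"]


def sync_data_source(knowledge_base_id, data_source_id):
    """Start an ingestion job for the data source and return its job ID."""
    import boto3

    client = boto3.client("bedrock-agent")
    response = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id, dataSourceId=data_source_id
    )
    return response["ingestionJob"]["ingestionJobId"]


def embedding_dims_ok(document, expected_dims=1536):
    """Check that a stored document's vector length matches the index's numDimensions."""
    return len(document.get("bedrock_embedding", [])) == expected_dims
```

Keep the user in the secret limited as described in the note above. The script does not change that requirement.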

Amazon Bedrock Agent

Amazon Bedrock Agents orchestrate interactions between foundation models, data sources, software applications, and user conversations. In addition, agents can automatically call APIs to take actions and query knowledge bases to supplement information for these actions.
  • In the Amazon Bedrock console, create an agent.
Landing page of the Amazon Bedrock Agents
  • Provide the agent name and an optional description.
Configurations for the Amazon Bedrock Agent
  • Select a model and provide the prompt.
Selection of models for the Amazon Bedrock Agent
  • For our agent, we will skip the Action Group and configure our knowledge base instead. Select the knowledge base configured earlier, supply instructions for the Agent and select Add.
Selection of knowledge base for the Amazon Bedrock Agent
  • Next, save the configuration to create the agent.
Success message after saving agent’s configuration.
  • Once the agent is successfully created, go ahead and test it by asking a question.
Chat output for the given query
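Beyond the console's test window, you can query the agent programmatically through the `bedrock-agent-runtime` client's `invoke_agent` call, which streams the answer back as chunk events. This is a minimal sketch. The agent ID and agent alias ID are placeholders you would copy from the console, since `invoke_agent` requires an alias ID. A new random session ID is generated per call unless you pass one in to keep conversation context.

```python
import uuid


def collect_completion(events):
    """Join the text chunks from an invoke_agent completion event stream, skipping other events."""
    parts = []
    for event in events:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)


def ask_agent(agent_id, agent_alias_id, question, session_id=None):
    """Send a question to the agent and return its full answer (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so collect_completion works without boto3 installed

    client = boto3.client("bedrock-agent-runtime")
    response = client.invoke_agent(
        agentId=agent_id,
        agentAliasId=agent_alias_id,
        sessionId=session_id or str(uuid.uuid4()),
        inputText=question,
    )
    return collect_completion(response["completion"])
```

Reusing the same `session_id` across calls lets the agent treat follow-up questions as part of one conversation.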

Conclusion

This article demonstrated how to set up a knowledge base in Amazon Bedrock using MongoDB Atlas as the vector database. Once the knowledge base is set up, Amazon Bedrock ingests your data into MongoDB Atlas, and you can create an agent that answers questions based on your accurate, proprietary data.