Retrieval Augmented Generation for Claim Processing: Combining MongoDB Atlas Vector Search and Large Language Models

Following up on our previous blog, AI, Vectors, and the Future of Claims Processing: Why Insurance Needs to Understand The Power of Vector Databases, we’ll pick up the conversation right where we left off. There, we discussed at length how Atlas Vector Search can benefit the claim process in insurance and briefly covered Retrieval Augmented Generation (RAG) and Large Language Models (LLMs).


One of the biggest challenges for claim adjusters is pulling and aggregating information from disparate systems and diverse data formats. PDFs of policy guidelines might be stored in a content-sharing platform, customer information locked in a legacy CRM, and claim-related pictures and voice reports in yet another tool. All of this data is not just fragmented across siloed sources and hard to find; it also comes in formats that have historically been nearly impossible to index with traditional methods. Over the years, insurance companies have accumulated terabytes of unstructured data in their data stores but have failed to capitalize on the opportunity to access and leverage it to uncover business insights, deliver better customer experiences, and streamline operations. Some of our customers even admit they’re not fully aware of all the data sitting in their archives. There’s a tremendous opportunity to put this unstructured data to work for the insurer and its customers.

Our image search post covered part of the solution to these challenges, opening the door to working more easily with unstructured data. RAG takes it a step further by integrating Atlas Vector Search and LLMs, allowing insurers to go beyond the limitations of baseline foundation models and make them context-aware by feeding them proprietary data. Figure 1 shows how the interaction works in practice: through a chat prompt, we can ask the system questions, and the LLM returns answers along with the references it used to retrieve the information contained in the response. Great! We’ve got a nice UI, but how can we build a RAG application? Let’s open the hood and see what’s inside!

Figure 1: UI of the claim adjuster RAG-powered chatbot

Architecture and flow

Before we start building our application, we need to ensure that our data is easily accessible and in one secure place. Operational Data Layers (ODLs) are the recommended pattern for wrangling data to create single views. This post walks the reader through the process of modernizing insurance data models with Relational Migrator, helping insurers migrate off legacy systems to create ODLs.

Once the data is organized in our MongoDB collections and ready to be consumed, we can start architecting our solution. Building upon the schema developed in the image search post, we augment our documents with a few fields that allow adjusters to ask more complex questions about the data and solve harder business challenges, such as resolving a claim in a fraction of the time with increased accuracy. Figure 2 shows the resulting document with two highlighted fields: “claimDescription” and its vector representation, “claimDescriptionEmbedding”. We can now create a Vector Search index on this embedding field, a key step in facilitating the retrieval of the information fed to the LLM; a minimal sketch of this setup follows Figure 2.

Figure 2: Document schema of the claim collection. The highlighted fields are used to retrieve the data that will be passed as context to the LLM.
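As an illustration, here is a minimal sketch of that setup in Python. It assumes a hypothetical "claims" collection in a "demo_rag_insurance" database, an OpenAI embedding model producing 1,536-dimensional vectors, and PyMongo 4.6+ (which exposes create_search_index for Atlas); the database, collection, model, and index names are placeholders, not the repository’s actual code.

```python
import os

from openai import OpenAI
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# NOTE: connection string, names, and model are illustrative assumptions.
mongo = MongoClient(os.environ["MONGODB_URI"])
claims = mongo["demo_rag_insurance"]["claims"]
ai = OpenAI()  # reads OPENAI_API_KEY from the environment


def embed(text: str) -> list[float]:
    """Turn a claim description into a 1,536-dimensional vector."""
    response = ai.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding


# Backfill the embedding field for every claim that doesn't have one yet.
for doc in claims.find({"claimDescriptionEmbedding": {"$exists": False}}):
    claims.update_one(
        {"_id": doc["_id"]},
        {"$set": {"claimDescriptionEmbedding": embed(doc["claimDescription"])}},
    )

# Create the Atlas Vector Search index on the embedding field.
claims.create_search_index(
    SearchIndexModel(
        name="claim_description_index",
        type="vectorSearch",
        definition={
            "fields": [
                {
                    "type": "vector",
                    "path": "claimDescriptionEmbedding",
                    "numDimensions": 1536,
                    "similarity": "cosine",
                }
            ]
        },
    )
)
```

Cosine similarity is a common default for text embeddings; the right choice of similarity function and dimensionality depends on the embedding model you use.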

Having prepared our data, building the RAG interaction is straightforward; refer to this GitHub repository for the implementation details. Here, we’ll just discuss the high-level architecture and the data flow, shown in Figure 3 below and sketched in code after the figure:

  1. The user enters the prompt, a question in natural language.

  2. The prompt is vectorized and sent to Atlas Vector Search; similar documents are retrieved.

  3. The prompt and the retrieved documents are passed to the LLM as context.

  4. The LLM produces an answer to the user (in natural language), considering the context and the prompt.

Figure 3: RAG architecture and interaction flow
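For illustration, here is a minimal Python sketch of the four steps above. It reuses the claims collection, embed helper, OpenAI client, and index name from the previous snippet, plus an assumed chat model; all of these are placeholder assumptions rather than the repository’s actual implementation.

```python
def answer_question(prompt: str) -> str:
    # Steps 1-2: vectorize the prompt and retrieve similar claim documents
    # via Atlas Vector Search.
    results = claims.aggregate([
        {
            "$vectorSearch": {
                "index": "claim_description_index",
                "path": "claimDescriptionEmbedding",
                "queryVector": embed(prompt),
                "numCandidates": 100,  # breadth of the approximate search
                "limit": 5,            # documents passed on as context
            }
        },
        {"$project": {"_id": 0, "claimDescription": 1}},
    ])
    context = "\n".join(doc["claimDescription"] for doc in results)

    # Steps 3-4: pass the prompt and the retrieved documents to the LLM,
    # which produces a natural-language answer grounded in that context.
    completion = ai.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": "Answer the adjuster's question using only this "
                           "context:\n" + context,
            },
            {"role": "user", "content": prompt},
        ],
    )
    return completion.choices[0].message.content


print(answer_question("Summarize the claims related to adverse weather."))
```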

It is important to note how the semantics of the question are preserved throughout the different steps. The reference to “adverse weather”-related accidents in the prompt is captured and passed to Atlas Vector Search, which surfaces claim documents whose descriptions relate to similar concepts (e.g., rain) without those concepts needing to be mentioned explicitly. Finally, the LLM consumes the relevant documents to produce a context-aware answer referencing rain, hail, and fire, just as we’d expect based on the user's initial question.

So what?

To sum it all up, what’s the benefit of combining Atlas Vector Search and LLMs in a claim processing RAG application?

  • Speed and accuracy: With the data centrally organized and ready to be consumed by LLMs, adjusters can find all the necessary information in a fraction of the time.

  • Flexibility: LLMs can answer a wide spectrum of questions, meaning applications require less upfront system design. There is no need to build custom APIs for each piece of information you’re trying to retrieve; just ask the LLM to do it for you.

  • Natural interaction: Applications can be queried in plain English, with no programming skills or system training required.

  • Data accessibility: Insurers can finally leverage and explore unstructured data that was previously hard to access.

Not just claim processing

The same data model and architecture can serve additional personas and use cases within the organization:

  • Customer service: Operators can quickly pull customer data and answer complex questions without navigating different systems. For example, “Summarize this customer's past interactions,” “What coverages does this customer have?” or “What coverages can I recommend to this customer?”

  • Customer self-service: Simplify your members’ experience by enabling them to ask questions themselves. For example, “My apartment is flooded. Am I covered?” or “How long do windshield repairs take on average?”

  • Underwriting: Underwriters can quickly aggregate and summarize information, providing quotes in a fraction of the time. For example, “Summarize this customer's claim history,” “I am renewing a customer's policy; what are their current coverages?” or “Pull everything related to this policy and customer so I have the baseline information, and find the relevant underwriting guidelines.”

If you would like to discover more about Converged AI and Application Data Stores with MongoDB, take a look at the following resources: