Retrieval Augmented Generation (RAG): The Open-Book Test for Gen AI

Steve Jurczak
October 26, 2023 | Updated: April 3, 2024
#genAI

The release of ChatGPT in November 2022 marked a groundbreaking moment for AI, introducing the world to an entirely new realm of possibilities created by the fusion of generative AI and machine learning foundation models, or large language models (LLMs). In order to truly unlock the power of LLMs, organizations need to not only access the innovative commercial and open-source models but also feed them vast amounts of quality internal and up-to-date data. By combining a mix of proprietary and public data in the models, organizations can expect more accurate and relevant LLM responses that better mirror what's happening at the moment.

The ideal way to do this today is by leveraging retrieval-augmented generation (RAG), a powerful approach in natural language processing (NLP) that combines information retrieval and text generation. Most people by now are familiar with the concept of prompt engineering, which is essentially augmenting prompts to direct the LLM to answer in a certain way. With RAG, you're augmenting prompts with proprietary data to direct the LLM to answer in a certain way based on contextual data. The retrieved information serves as a basis for generating coherent and contextually relevant text. This combination allows AI models to provide more accurate, informative, and context-aware responses to queries or prompts.

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Applying retrieval-augmented generation (RAG) in the real world

Let's use a stock quote as an example to illustrate the usefulness of retrieval-augmented generation in a real-world scenario. Since LLMs aren't trained on recent data like stock prices, the LLM will hallucinate and make up an answer or deflect from answering the question entirely. Using retrieval-augmented generation, you would first fetch the latest news snippets from a database (often using vector embeddings in a vector database or MongoDB Atlas Vector Search) that contains the latest stock news. Then, you insert or "augment" these snippets into the LLM prompt. Finally, you instruct the LLM to reference the up-to-date stock news in answering the question. With RAG, because there is no retraining of the LLM required, the retrieval is very fast (sub 100 ms latency) and well-suited for real-time applications.

Another common application of retrieval-augmented generation is in chatbots or question-answering systems. When a user asks a question, the system can use the retrieval mechanism to gather relevant information from a vast dataset, and then it generates a natural language response that incorporates the retrieved facts.

RAG vs. fine-tuning

Users will immediately bump up against the limits of GenAI anytime there's a question that requires information that sits outside the LLM's training corpus, resulting in hallucinations, inaccuracies, or deflection. RAG fills in the gaps in knowledge that the LLM wasn't trained on, essentially turning the question-answering task into an “open-book quiz,” which is easier and less complex than an open and unbounded question-answering task.

Fine-tuning is another way to augment LLMs with custom data, but unlike RAG it's like giving it entirely new memories or a lobotomy. It's also time- and resource-intensive, generally not viable for grounding LLMs in a specific context, and especially unsuitable for highly volatile, time-sensitive information and personal data.

Conclusion

Retrieval-augmented generation can improve the quality of generated text by ensuring it's grounded in relevant, contextual, real-world knowledge. It can also help in scenarios where the AI model needs to access information that it wasn't trained on, making it particularly useful for tasks that require factual accuracy, such as research, customer support, or content generation. By leveraging RAG with your own proprietary data, you can better serve your current customers and give yourself a significant competitive edge with reliable, relevant, and accurate AI-generated output.

To learn more about how Atlas helps organizations integrate and operationalize GenAI and LLM data, download our white paper, Embedding Generative AI and Advanced Search into your Apps with MongoDB. If you're interested in leveraging generative AI at your organization, reach out to us today and find out how we can help your digital transformation.

← Previous

4 Key Considerations for Unlocking the Power of GenAI

Artificial intelligence is evolving at an unprecedented pace, and generative AI (GenAI) is at the forefront of the revolution. GenAI capabilities are vast, ranging from text generation to music and art creation. But what makes GenAI truly unique is its ability to deeply understand context, producing outputs that closely resemble that of humans. It's not just about conversing with intelligent chatbots. GenAI has the potential to transform industries, providing richer user experiences and unlocking new possibilities. In the coming months and years, we'll witness the emergence of applications that leverage GenAI's power behind the scenes, offering capabilities never before seen. Unlike now popular chatbots like ChatGPT, users won't necessarily realize that GenAI is working in the background. But behind the scenes, these new applications are combining information retrieval and text generation to deliver truly personalized and contextual user experiences in real-time. This process is called retrieval-augmented generation, or RAG for short. So, how does retrieval-augmented generation (RAG) work, and what role do databases play in this process? Let's delve deeper into the world of GenAI and its database requirements. Check out our AI resource page to learn more about building AI-powered apps with MongoDB. The challenge of training AI foundation models One of the primary challenges with GenAI is the lack of access to private or proprietary data. AI foundation models, of which large language models (LLMs) are a subset, are typically trained on publicly available data but do not have access to confidential or proprietary information. Even if the data were in the public domain, it might be outdated and irrelevant. LLMs also have limitations in recognizing very recent events or knowledge. Furthermore, without proper guidance, LLMs may produce inaccurate information, which is unacceptable in most situations. Databases play a crucial role in addressing these challenges. Instead of sending prompts directly to LLMs, applications can use databases to retrieve relevant data and include it in the prompt as context. For example, a banking application could query the user's transaction data from a legacy database, add it to the prompt, and then send this engineered prompt to the LLM. This approach ensures that the LLM generates accurate and up-to-date responses, eliminating the issues of missing data, stale data, and inaccuracies. Top 4 database considerations for GenAI applications It won't be easy for businesses to achieve real competitive advantage leveraging GenAI when everyone has access to the same tools and knowledge base. Rather, the key to differentiation will come from layering your own unique proprietary data on top of Generative AI powered by foundation models and LLMs. There are four key considerations organizations should focus on when choosing a database to leverage the full potential of GenAI-powered applications: Queryability: The database needs to be able to support rich, expressive queries and secondary indexes to enable real-time, context-aware user experiences. This capability ensures data can be retrieved in milliseconds, regardless of the complexity of the query or the size of data stored in the database. Flexible data model: GenAI applications often require different types and formats of data, referred to as multi-modal data. To accommodate these changing data sets, databases should have a flexible data model that allows for easy onboarding of new data without schema changes, code modifications, or version releases. Multi-modal data can be challenging for relational databases because they're designed to handle structured data, where information is organized into tables with rows and columns, with strict schema rules. Integrated vector search: GenAI applications may need to perform semantic or similarity queries on different types of data, such as free-form text, audio, or images. Vector embeddings in a vector database enable semantic or similarity queries. Vector embeddings capture the semantic meaning and contextual information of data making them suitable for various tasks like text classification, machine translation, and sentiment analysis. Databases should provide integrated vector search indexing to eliminate the complexity of keeping two separate systems synchronized and ensuring a unified query language for developers. Scalability: As GenAI applications grow in terms of user base and data size, databases must be able to scale out dynamically to support increasing data volumes and request rates. Native support for scale-out sharding ensures that database limitations aren't blockers to business growth. The ideal database solution: MongoDB Atlas MongoDB Atlas is a powerful and versatile platform for handling the unique demands of GenAI. MongoDB uses a powerful query API that makes it easy to work with multi-modal data, enabling developers to deliver more with less code. MongoDB is the most popular document database as rated by developers. Working with documents is easy and intuitive for developers because documents map to objects in object-oriented programming, which are more familiar than the endless rows and tables in relational databases. Flexible schema design allows for the data model to evolve to meet the needs of GenAI use cases, which are inherently multi-modal. By using sharding, Atlas scales out to support large increases in the volume of data and requests that come with GenAI-powered applications. MongoDB Atlas Vector Search embeds vector search indexing natively so there's no need to maintain two different systems. Atlas keeps Vector Search indexes up to date with the source data constantly. Developers can use a single endpoint and query language to construct queries that combine regular database query filters and vector search filters. This removes friction and provides an environment for developers to prototype and deliver GenAI solutions rapidly. Conclusion GenAI is poised to reshape industries and provide innovative solutions across sectors. With the right database solution, GenAI applications can thrive, delivering accurate, context-aware, and dynamic data-driven user experiences that meet the growing demands of today's fast-paced digital landscape. With MongoDB Atlas, organizations can unlock agility, productivity, and growth, providing a competitive edge in the rapidly evolving world of generative AI. To learn more about how Atlas helps organizations integrate and operationalize GenAI and LLM data, download our white paper, Embedding Generative AI and Advanced Search into your Apps with MongoDB . If you're interested in leveraging generative AI at your organization, reach out to us today and find out how we can help your digital transformation.

October 26, 2023

Next →

Enhancing the MongoDB Atlas Go SDK with Automated Code Generation

The MongoDB Atlas API Experience team is committed to offering a seamless developer experience to customers who build and automate against the MongoDB Atlas developer data platform. We offer various programmatic tools, such as the Atlas CLI and the Terraform Atlas Provider , and maintain the Atlas Admin API with key features, including versioning and Open API specification . Our latest offering, the MongoDB Atlas Go SDK , empowers developers to manage Atlas using Go . It eliminates the need for low-level Admin API calls by utilizing higher-level abstractions. New SDK versions are automatically generated on each Atlas update, ensuring access to the latest Atlas administrative capabilities. In this post, we’ll discuss the advantages of using an SDK over direct API calls, explore our decision for code generation over manual development, share insights from engineering challenges, review our adoption experience, and discuss some next steps for the Atlas SDKs. Benefits of using an SDK over direct API calls Direct API calls offer flexibility and fine-grained control but become cumbersome for complex interactions. Developers manually handle tasks like authentication, error handling, and response parsing, which can be time-consuming and error-prone. Additionally, keeping up with API updates can require regular low-level code updates. The Atlas Go SDK simplifies development with higher-level abstractions. Pre-built functions and structs encapsulate API interactions, reducing boilerplate code. This frees developers to focus on core application logic rather than API integration intricacies. The SDK automates authentication elements, error handling, and response parsing, reducing boilerplate code and errors. Plus, staying up-to-date becomes low-effort, as new SDK versions are auto-generated with each Atlas release. While leveraging new functionalities might require some code updates, the SDK significantly simplifies integration compared to direct API calls. Moving from a manual client to an auto-generated SDK Previously, MongoDB’s bespoke Go Client for MongoDB Atlas powered internal tools and customer applications. However, maintaining a manually written client increased toil and impacted the team’s productivity. Keeping code quality high required constant effort, leading to a backlog of unsupported Atlas features and a struggle to keep pace with new functionalities. As Go-based Atlas applications proliferated, the limitations of manual maintenance became clear. We needed a more scalable solution. We had previously built a mechanism to automatically generate the Open API specification for the Atlas Admin API. This machine-readable document details how to interact with the Atlas functionality through its REST API, accessible through the “Download” option in the MongoDB Atlas Administration API portal . This led us to explore leveraging that specification for client code generation as well. Building on this foundation, the Atlas Go SDK leverages the Open API spec and a robust delivery pipeline to streamline client generation. The pipeline scans for changes in the spec and triggers the generation of a new SDK version upon detection, producing a pull request as an intermediate output. Following our review and merge, the pipeline requires no other manual steps to continue working on the release process, ultimately publishing a new SDK version. Code generation has improved output consistency and speed over manual development, freeing up resources for other impactful projects. To showcase the time savings, adding support for a new resource in the old manual Go Client could take roughly two engineer days. The new SDK reduces that time to a less-than-an-hour code review, as all other steps are automated. Major challenges and solutions Transitioning to an auto-generated SDK wasn’t without its hurdles. Selecting the right tooling was crucial, and we explored both commercial and open-source solutions for Open API-based code generation. We opted for openapi-generator due to its open-source nature. The choice prioritized flexibility and control, ensuring long-term project autonomy. It also allowed us to open-source the Go SDK generation codebase, fostering community contributions and improvements. Leveraging the Atlas Open API specification, originally intended for documentation , presented unique challenges. While this approach offered a single source of truth, we encountered discrepancies between the spec and actual API behavior. Notably, minor, non-breaking API changes sometimes resulted in significant, breaking changes in the generated client. For instance, marking a field as optional in the API (non-breaking) turns the equivalent Go model property into a pointer (breaking). This discovery necessitated a two-fold solution. We addressed the root cause by improving the Open API specification, ensuring better alignment with code generation requirements. Then we developed an automated transformation process to optimize the spec for generating code. This transformed version remains optimized for code generation while the original spec continues to serve its original purpose of live API documentation. Having a pull request review as part of the release process has helped us ensure quality. In some early cases, it allowed us to catch issues, apply SDK-wide improvements, and add linting rules to prevent reoccurrence. Maintaining compatibility with the existing Go client was crucial for an easy migration experience. However, achieving perfect compatibility wasn’t always feasible due to inconsistencies in the handwritten client. We meticulously evaluated each change, balancing compatibility with the benefits of code generation. We also created a migration guide and best practices to aid with migrating existing applications. Finally, to minimize confusion and client bloat, we decided that each Go SDK release should target a single Admin API version. That simplified integration but created a versioning challenge: Go’s semantic versioning uses major bumps for breaking changes. We needed to communicate the targeted Atlas API version within the SDK version while simultaneously denoting breaking SDK changes unrelated to API updates. Our solution was to develop a unique major versioning scheme for the Go SDK. It incorporates the Admin API’s date as a prefix, with additional digits signaling breaking SDK changes for that specific API version. While unconventional, it adheres to semantic versioning and keeps developers informed. Adoption experience Our Atlas CLI tool was the proving ground for the new Go SDK. In 2023, it became the first application to migrate from the manual Go client by leveraging pre-release versions of the Go SDK. That staged adoption approach yielded significant benefits. By migrating early, we uncovered valuable improvement areas for the SDK, directly influencing its architecture. This early feedback loop also provided crucial insights into effective SDK generation. The migration wasn’t without its complexities. It addressed cross-cutting issues across three distinct layers: the Open API spec, the SDK generation, and the CLI integration, each presenting unique development challenges. However, this comprehensive approach ensured that all layers benefitted from the process, and we raised quality across the board. Following the successful CLI integration, we confidently expanded the SDK’s reach across our product portfolio. Now, it serves as a prominent and trusted middleware solution, not only for the Atlas CLI , but also for several Atlas DevOps tools. Recent additions include the Atlas integrations for AWS CloudFormation and CDK . The Terraform Atlas Provider and the Atlas Kubernetes Operator are also on the immediate roadmap, further solidifying the SDK as a core component within our DevOps ecosystem. External adoption of the Atlas Go SDK is also gaining momentum. Over 30% of customer-developed Go-based interactions with the Atlas Admin API now leverage the new SDK, a trend we see increasing monthly. Looking ahead As Atlas continues to scale, the pace of features will only continue to accelerate. Our primary focus for the Go SDK is to continuously expose new Atlas features as they are released in the Admin API while introducing quality-of-life improvements for developers at every opportunity. We strive to reduce boilerplate code, improve Error Handling, simplify Authentication / Authorization, and to enrich documentation with high-quality examples. The success of code generation has us exploring what else can be automated. Our team is looking at which other offerings could benefit from automation to free up development time for impactful projects that might be less readily automatable. Finally, we understand that our developer community thrives on various languages and technologies. Not using Go? Let us know what other languages you’d like to see supported (e.g. Python, Java, TypeScript) by sharing your feedback . Conclusion The Atlas Go SDK empowers Go developers to streamline interactions with the Atlas cloud platform. By leveraging code generation and a focus on developer experience, the SDK offers several advantages over manual API calls. Reduced complexity: pre-built functions and higher-level abstractions simplify development by tackling authentication, error handling, and response parsing. Improved maintainability: auto-generation of updated SDK versions with each Atlas release ensures access to the latest functionalities, minimizing manual code changes. Enhanced reliability: Tailored API models promote code reliability and catch potential errors at compile time. Our in-house adoption experience was a valuable proving ground, influencing the SDK’s development and uncovering key optimization points. Today, the SDK is a cornerstone within our DevOps ecosystem, accelerating the development of downstream DevOps tools. This successful transition proved the value of code generation for developer productivity and code quality. It also opened up the possibility of generating SDKs for more programming languages in the future. The Atlas Go SDK can be a valuable asset for Go developers who build solutions on Atlas. To get started with the SDK, see our Docs Page . We also welcome community contributions, so visit our Contributing Guidelines for more details. We invite you to try it , and we would love to hear your feedback. Go build with the new Atlas Go SDK today!

May 22, 2024