Retrieval-Augmented Generation (RAG): The Open-Book Test for Gen AI

Steve Jurczak

#genAI

The release of ChatGPT in November 2022 marked a groundbreaking moment for AI, introducing the world to the possibilities created by generative AI built on machine learning foundation models, or large language models (LLMs). To truly unlock the power of LLMs, organizations need not only access to innovative commercial and open-source models but also a way to feed them large amounts of high-quality, up-to-date internal data. By combining proprietary and public data, organizations can expect more accurate and relevant LLM responses that better reflect what's happening at the moment.

The ideal way to do this today is with retrieval-augmented generation (RAG), an approach in natural language processing (NLP) that combines information retrieval and text generation. Most people by now are familiar with prompt engineering, which essentially means crafting prompts to direct the LLM to answer in a certain way. With RAG, you augment prompts with proprietary data so that the LLM answers based on that contextual data. The retrieved information serves as the basis for generating coherent and contextually relevant text, which allows AI models to provide more accurate, informative, and context-aware responses to queries or prompts.
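The prompt augmentation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed template: the function name `build_augmented_prompt`, the instruction wording, and the sample snippets are all assumptions made for the example.

```python
def build_augmented_prompt(question: str, snippets: list[str]) -> str:
    """Place retrieved snippets ahead of the question so the LLM can ground its answer."""
    # Number each snippet so the model (and a human reviewer) can refer to it.
    context = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say you do not know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical proprietary snippets, invented for illustration only.
example_snippets = [
    "Refunds are accepted within 30 days of purchase.",
    "Refund requests are handled by the billing team.",
]
prompt = build_augmented_prompt("What is our refund window?", example_snippets)
print(prompt)
```

The resulting string is what would be sent to the LLM in place of the bare question; where the snippets come from is the job of the retrieval step.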

Check out our AI resource page to learn more about building AI-powered apps with MongoDB.

Applying retrieval-augmented generation (RAG) in the real world

Let's use a stock quote as an example to illustrate the usefulness of retrieval-augmented generation in a real-world scenario. Because LLMs aren't trained on recent data like stock prices, an LLM asked about the latest price will either hallucinate an answer or deflect from answering the question entirely. Using retrieval-augmented generation, you would first fetch the latest news snippets from a database that contains the latest stock news (often using vector embeddings in a vector database or MongoDB Atlas Vector Search). Then, you insert or "augment" these snippets into the LLM prompt. Finally, you instruct the LLM to reference the up-to-date stock news in answering the question. Because RAG requires no retraining of the LLM, new information is usable as soon as it is stored, and because the retrieval step itself is very fast (sub-100 ms latency), the approach is well suited to real-time applications.
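The three steps (retrieve, augment, instruct) can be sketched as follows. This is a toy under stated assumptions: the word-count `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database such as MongoDB Atlas Vector Search, and the company names and news snippets are invented for illustration. The final call to an LLM is left out because it depends on the model provider.

```python
import math
import re
from collections import Counter

# Hypothetical "latest stock news" store; the companies and headlines are made up.
NEWS_SNIPPETS = [
    "ACME shares rose 4% after the company reported record quarterly revenue.",
    "Globex announced a recall of its flagship product, sending the stock lower.",
    "Central bank officials signaled that interest rates will stay unchanged.",
]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    """Step 1: return the k snippets most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        snippets,
        key=lambda s: cosine_similarity(query_vector, embed(s)),
        reverse=True,
    )
    return ranked[:k]


def augment(question: str, snippets: list[str]) -> str:
    """Steps 2 and 3: insert the snippets into the prompt and instruct the LLM to use them."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Use the news snippets below to answer the question.\n\n"
        f"News:\n{context}\n\n"
        f"Question: {question}"
    )


question = "Why did ACME shares move today?"
top_snippets = retrieve(question, NEWS_SNIPPETS, k=1)
prompt = augment(question, top_snippets)
print(prompt)
```

In a production system the sorting over every snippet would be replaced by an indexed nearest-neighbor search, which is what keeps retrieval latency low as the collection grows.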

Another common application of retrieval-augmented generation is in chatbots or question-answering systems. When a user asks a question, the system can use the retrieval mechanism to gather relevant information from a vast dataset, and then it generates a natural language response that incorporates the retrieved facts.

RAG vs. fine-tuning

Users will immediately bump up against the limits of GenAI any time a question requires information that sits outside the LLM's training corpus, resulting in hallucinations, inaccuracies, or deflection. RAG fills in the gaps in knowledge that the LLM wasn't trained on, essentially turning the question-answering task into an “open-book quiz,” which is simpler than an open and unbounded question-answering task.

Fine-tuning is another way to augment LLMs with custom data, but unlike RAG, it changes the model itself: it's like giving the model entirely new memories or a lobotomy. It's also time- and resource-intensive, generally not viable for grounding LLMs in a specific context, and especially unsuitable for highly volatile, time-sensitive information and personal data.
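One way to see why RAG copes better with volatile or personal data: changing what the model "knows" is a data write, not a training run. The sketch below uses a hypothetical in-memory `SnippetStore` with naive keyword matching as a stand-in for a real database; the class, its methods, and the sample prices are assumptions made for illustration.

```python
class SnippetStore:
    """Hypothetical in-memory store standing in for a database of retrievable documents."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        """Insert or overwrite a document; no model retraining is involved."""
        self._docs[doc_id] = text

    def delete(self, doc_id: str) -> None:
        """Remove a document, for example when personal data must be erased."""
        self._docs.pop(doc_id, None)

    def search(self, query: str) -> list[str]:
        """Naive keyword match: return documents sharing at least one word with the query."""
        terms = set(query.lower().split())
        return [t for t in self._docs.values() if terms & set(t.lower().split())]


store = SnippetStore()

store.upsert("acme-close", "ACME closed at 101 dollars on Monday.")
before = store.search("ACME price")

# The fact changes: overwrite the record, and the next retrieval sees the new value.
store.upsert("acme-close", "ACME closed at 97 dollars on Tuesday.")
after = store.search("ACME price")

# The record must go away entirely: delete it, and it can no longer reach the prompt.
store.delete("acme-close")
gone = store.search("ACME price")

print(before, after, gone, sep="\n")
```

With fine-tuning, each of these changes would mean another training pass, and removing something the model has already learned is harder still.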

Conclusion

Retrieval-augmented generation can improve the quality of generated text by ensuring it's grounded in relevant, contextual, real-world knowledge. It can also help in scenarios where the AI model needs to access information that it wasn't trained on, making it particularly useful for tasks that require factual accuracy, such as research, customer support, or content generation. By leveraging RAG with your own proprietary data, you can better serve your current customers and give yourself a significant competitive edge with reliable, relevant, and accurate AI-generated output.

To learn more about how Atlas helps organizations integrate and operationalize GenAI and LLM data, download our white paper, Embedding Generative AI and Advanced Search into your Apps with MongoDB. If you're interested in leveraging generative AI at your organization, reach out to us today and find out how we can help your digital transformation.