Retrieval-Augmented Generation With MongoDB and Spring AI: Bringing AI to Your Java Applications
AI this, AI that. Well, what can AI actually do for me? In this tutorial, we are going to discuss how we can leverage our own data to get the most out of generative AI.
And that’s where retrieval-augmented generation (RAG) comes in. It uses AI where it belongs — retrieving the right information and generating smart, context-aware answers. In this tutorial, we’re going to build a RAG app using Spring Boot, MongoDB Atlas, and OpenAI. The full code is available on GitHub.
RAG allows you to take data that was not available when an AI model was trained, use it to enrich your prompt, and have that data supplement the large language model's (LLM) response.
LLMs are a type of artificial intelligence (AI) that can generate and understand data. They are trained on massive datasets and can be used for answering your questions in an informative way.
While LLMs are very powerful, they have some limitations. One limitation is that their outputs are not always accurate or up to date. This is because LLMs are trained on data that may have since become outdated, that is incomplete, or that lacks proprietary knowledge about a specific use case or domain.
If you have data that has to remain internal for data security reasons, or even just questions on more up-to-date data, RAG can help you.
RAG consists of three main components:
- Your pre-trained LLM: This is what will generate the response — OpenAI, in our case.
- Vector search (semantic search): This is how we retrieve relevant documents from our MongoDB database.
- Vector embeddings: Numerical representations of our data that capture its semantic meaning.
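To make "semantic meaning as numbers" concrete, here is a small standalone sketch (plain Java, not part of the app we build below; the three-dimensional vectors are made up purely for illustration) of the cosine similarity measure typically used to compare embeddings in a vector search:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    // Embeddings of semantically similar texts score close to 1.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real OpenAI embeddings have 1,500+ dimensions.
        double[] docA = {0.9, 0.1, 0.0};
        double[] docB = {0.8, 0.2, 0.1};
        double[] docC = {0.0, 0.1, 0.9};
        System.out.printf("A vs B: %.3f%n", cosine(docA, docB)); // similar direction, high score
        System.out.printf("A vs C: %.3f%n", cosine(docA, docC)); // dissimilar, low score
    }
}
```

However many dimensions the real vectors have, the comparison works the same way: vectors pointing in similar directions score close to 1, which is how "retrieve relevant documents" becomes a numerical operation.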
Before beginning this tutorial, ensure that you have the following installed and configured:
- Java 21 or higher.
- Maven or Gradle (for managing dependencies): We use Maven for this tutorial.
- A MongoDB Atlas cluster: A minimum M10+ cluster is necessary to use the Spring AI MongoDB vector store, as it creates the search index on our database programmatically.
- OpenAI API key: Sign up for OpenAI and obtain an API key.
- Other models are available, but this tutorial uses OpenAI.
To initialize the project:
- Set up the project metadata:
  - Group: `com.mongodb`
  - Artifact: `RagApp`
- Add the dependencies:
  - Spring Web
  - MongoDB Atlas Vector Database
  - OpenAI
- Download the project and open it in your preferred IDE.
Before we do anything, let's go to our `pom.xml` file and check that the Spring AI version is set:

```xml
<spring-ai.version>1.0.0-SNAPSHOT</spring-ai.version>
```

We may need to change it to this, depending on what version of Spring we are using.

The configuration for this project involves setting up two primary components:
- The EmbeddingModel using OpenAI to generate embeddings for documents.
- A MongoDBAtlasVectorStore to store and manage document vectors for similarity searches.
We’ll need to configure our project to connect to OpenAI and MongoDB Atlas by adding several properties to the `application.properties` file, along with the necessary credentials:

```properties
spring.application.name=RagApp

spring.ai.openai.api-key=<Your-API-Key>
spring.ai.openai.chat.options.model=gpt-4o

spring.ai.vectorstore.mongodb.initialize-schema=true

spring.data.mongodb.uri=<Your-Connection-URI>
spring.data.mongodb.database=rag
```
You'll see here that we have `spring.ai.vectorstore.mongodb.initialize-schema` set to `true`. This tells Spring AI to create the search index on our collection automatically. If you are running a free cluster, this is not available. A workaround is to create the index manually, which you can learn to do in the MongoDB documentation.

Create a `config` package and add a `Config.java` class to work in. Here's how the configuration is set up in the `Config` class:

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.ai.vectorstore.MongoDBAtlasVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
public class Config {

    @Value("${spring.ai.openai.api-key}")
    private String openAiKey;

    @Bean
    public EmbeddingModel embeddingModel() {
        return new OpenAiEmbeddingModel(new OpenAiApi(openAiKey));
    }

    @Bean
    public VectorStore mongodbVectorStore(MongoTemplate mongoTemplate, EmbeddingModel embeddingModel) {
        return new MongoDBAtlasVectorStore(mongoTemplate, embeddingModel,
                MongoDBAtlasVectorStore.MongoDBVectorStoreConfig.builder().build(), true);
    }
}
```
This class initializes the connection to the OpenAI API and configures the MongoDB-based vector store for storing document embeddings.
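If you do need to create the vector search index by hand (for example, on a free cluster, in the Atlas UI), a definition along these lines should work. Note that the field path (`embedding`), dimension count (1536, matching OpenAI's default embedding model), similarity function, and the default collection and index names (`vector_store` / `vector_index`) are assumptions based on Spring AI's defaults at the time of writing — verify them against the version you are running:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```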
For this tutorial, we are using the MongoDB/devcenter-articles dataset, available on Hugging Face. This dataset consists of articles from the MongoDB Developer Center. In our resources folder, create a directory called `docs` and add the dataset file so we can read it in.
To embed and store data in the vector store, we’ll use a service that reads documents from a JSON file, converts them into embeddings, and stores them in the MongoDB Atlas vector store. This is done using the `DocsLoaderService.java` that we will create in a `service` package:

```java
package com.mongodb.RagApp.service;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Service;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@Service
public class DocsLoaderService {

    private static final int MAX_TOKENS_PER_CHUNK = 2000;
    private final VectorStore vectorStore;
    private final ObjectMapper objectMapper;

    @Autowired
    public DocsLoaderService(VectorStore vectorStore, ObjectMapper objectMapper) {
        this.vectorStore = vectorStore;
        this.objectMapper = objectMapper;
    }

    public String loadDocs() {
        try (InputStream inputStream = new ClassPathResource("docs/devcenter-content-snapshot.2024-05-20.json").getInputStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {

            List<Document> documents = new ArrayList<>();
            String line;

            while ((line = reader.readLine()) != null) {
                Map<String, Object> jsonDoc = objectMapper.readValue(line, Map.class);
                String content = (String) jsonDoc.get("body");

                // Split the content into smaller chunks if it exceeds the token limit
                List<String> chunks = splitIntoChunks(content, MAX_TOKENS_PER_CHUNK);

                // Create a Document for each chunk and add it to the list
                for (String chunk : chunks) {
                    Document document = createDocument(jsonDoc, chunk);
                    documents.add(document);
                }
                // Add documents in batches to avoid memory overload
                if (documents.size() >= 100) {
                    vectorStore.add(documents);
                    documents.clear();
                }
            }
            if (!documents.isEmpty()) {
                vectorStore.add(documents);
            }

            return "All documents added successfully!";
        } catch (Exception e) {
            return "An error occurred while adding documents: " + e.getMessage();
        }
    }

    private Document createDocument(Map<String, Object> jsonMap, String content) {
        Map<String, Object> metadata = (Map<String, Object>) jsonMap.get("metadata");

        metadata.putIfAbsent("sourceName", jsonMap.get("sourceName"));
        metadata.putIfAbsent("url", jsonMap.get("url"));
        metadata.putIfAbsent("action", jsonMap.get("action"));
        metadata.putIfAbsent("format", jsonMap.get("format"));
        metadata.putIfAbsent("updated", jsonMap.get("updated"));

        return new Document(content, metadata);
    }

    private List<String> splitIntoChunks(String content, int maxTokens) {
        List<String> chunks = new ArrayList<>();
        String[] words = content.split("\\s+");
        StringBuilder chunk = new StringBuilder();
        int tokenCount = 0;

        for (String word : words) {
            // Estimate token count for the word (rough estimate: 1 token = ~4 characters)
            int wordTokens = word.length() / 4;
            if (tokenCount + wordTokens > maxTokens) {
                chunks.add(chunk.toString());
                chunk.setLength(0); // Clear the buffer
                tokenCount = 0;
            }
            chunk.append(word).append(" ");
            tokenCount += wordTokens;
        }
        if (chunk.length() > 0) {
            chunks.add(chunk.toString());
        }
        return chunks;
    }
}
```
This service reads a JSON file, processes each document, and stores it in MongoDB, along with an embedded vector of our content.
Now, this is a very simplistic approach to chunking (splitting large documents into smaller pieces that stay within the token limit and processing them separately). We need it because OpenAI has a token limit, so some of our documents are too large to embed in one go. This is fine for testing, but if you are moving to production, do your research and decide on the best way to deal with large documents for your use case.
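As one illustration of a slightly more careful strategy, here is a hypothetical standalone sketch (plain Java, not part of the tutorial code) that chunks on sentence boundaries instead of mid-sentence, while keeping the same rough four-characters-per-token estimate:

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceChunker {

    // Split text into chunks that respect an approximate token budget,
    // breaking only at sentence boundaries so no sentence is cut in half.
    static List<String> chunkBySentence(String text, int maxTokens) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int tokens = 0;

        // Split after sentence-ending punctuation followed by whitespace.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            // Same rough estimate as the tutorial: 1 token = ~4 characters.
            int sentenceTokens = Math.max(1, sentence.length() / 4);
            if (tokens + sentenceTokens > maxTokens && current.length() > 0) {
                chunks.add(current.toString().trim());
                current.setLength(0);
                tokens = 0;
            }
            current.append(sentence).append(' ');
            tokens += sentenceTokens;
        }
        if (current.length() > 0) {
            chunks.add(current.toString().trim());
        }
        return chunks;
    }
}
```

Production systems often go further still (overlapping windows, paragraph-aware splitting, or a real tokenizer instead of a character estimate), but keeping sentences intact is a cheap first improvement.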
Call this method however you wish, but I created a simple `DocsLoaderController` in my `controller` package for testing:

```java
import com.mongodb.RagApp.service.DocsLoaderService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/docs")
public class DocsLoaderController {

    private DocsLoaderService docsLoaderService;

    public DocsLoaderController(DocsLoaderService docsLoaderService) {
        this.docsLoaderService = docsLoaderService;
    }

    @GetMapping("/load")
    public String loadDocuments() {
        return docsLoaderService.loadDocs();
    }
}
```
Once the data is embedded and stored, we can retrieve it through an API that uses a vector search to return relevant results. The `RagController` class is responsible for this:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@CrossOrigin
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
                .build();
    }

    @GetMapping("/question")
    public String question(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```
There's a little bit going on here. Let's look at the `ChatClient`. It offers an API for communicating with our AI model, which processes two types of messages:
1. User messages: direct inputs from the user.
2. System messages: generated by the system to guide the conversation.
For the system message, we are using the default from the `QuestionAnswerAdvisor`:

```java
private static final String DEFAULT_USER_TEXT_ADVISE = """
        Context information is below.
        ---------------------
        {question_answer_context}
        ---------------------
        Given the context and provided history information and not prior knowledge,
        reply to the user comment. If the answer is not in the context, inform
        the user that you can't answer the question.
        """;
```
But we could edit this message and tailor it to our needs. There are also prompt options that can be specified, such as the temperature setting that controls the randomness or creativity of the generated output. You can find out more from the Spring documentation.
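For instance, with the Spring AI OpenAI starter, temperature can be set alongside the other chat options in `application.properties` (the value here is only an illustration — lower values make answers more deterministic, higher values more creative):

```properties
spring.ai.openai.chat.options.temperature=0.3
```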
The `/question` endpoint allows users to ask questions. It retrieves relevant documents from the vector store by searching the embedded documents semantically, and sends them to the LLM as context alongside the user's message.

To test our implementation:
- Start the Spring Boot application.
- Navigate to `http://localhost:8080/api/docs/load` to load documents into the vector store.
- Use `http://localhost:8080/question?message=Your question here` to test the question-answer functionality.
For example, try asking: `http://localhost:8080/question?message=How to analyze time-series data with Python and MongoDB? Explain the steps`
We should receive a relevant answer from the RAG app, formed from the embedded document data and the LLM.
In this project, we integrated a retrieval-augmented generation (RAG) system using MongoDB, OpenAI embeddings, and Spring Boot. The system can embed large amounts of document data and answer questions by leveraging vector similarity searches from a MongoDB Atlas vector store.
Next, learn more about what you can do with Java and MongoDB. You might enjoy Seamless Media Storage: Integrating Azure Blob Storage and MongoDB With Spring Boot. Or head over to the community forums and see what other people are doing with MongoDB.