Retrieval-Augmented Generation With MongoDB and Spring AI: Bringing AI to Your Java Applications
AI this, AI that. Well, what can AI actually do for me? In this tutorial, we are going to discuss how we can leverage our own data to get the most out of generative AI.
And that’s where retrieval-augmented generation (RAG) comes in. It uses AI where it belongs — retrieving the right information and generating smart, context-aware answers. In this tutorial, we’re going to build a RAG app using Spring Boot, MongoDB Atlas, and OpenAI. The full code is available on GitHub.
RAG allows you to take data that was not available when an AI model was trained, use it to enrich your prompt, and have that data supplement the large language model's (LLM) response.
LLMs are a type of artificial intelligence (AI) that can generate and understand data. They are trained on massive datasets and can be used for answering your questions in an informative way.
While LLMs are very powerful, they have some limitations. One limitation is that their outputs are not always accurate or up to date. This is because LLMs are trained on data that may have since become outdated, that is incomplete, or that lacks proprietary knowledge about a specific use case or domain.
If you have data that has to remain internal for data security reasons, or even just questions on more up-to-date data, RAG can help you.
RAG consists of three main components:
- Your pre-trained LLM: This is what will generate the response — OpenAI, in our case.
- Vector search (semantic search): This is how we retrieve relevant documents from our MongoDB database.
- Vector embeddings: Numerical representations of our data that capture its semantic meaning.
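To make "semantic meaning as numbers" concrete, here is a small standalone sketch (plain Java, not part of the app we build below; the three-dimensional vectors are made up purely for illustration) of the cosine similarity measure typically used to compare embeddings in a vector search:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
    // Embeddings of semantically similar texts score close to 1.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional "embeddings"; real OpenAI embeddings have 1,500+ dimensions.
        double[] docA = {0.9, 0.1, 0.0};
        double[] docB = {0.8, 0.2, 0.1};
        double[] docC = {0.0, 0.1, 0.9};
        System.out.printf("A vs B: %.3f%n", cosine(docA, docB)); // similar direction, high score
        System.out.printf("A vs C: %.3f%n", cosine(docA, docC)); // dissimilar, low score
    }
}
```

However many dimensions the real vectors have, the comparison works the same way: vectors pointing in similar directions score close to 1, which is how "retrieve relevant documents" becomes a numerical operation.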
Before beginning this tutorial, ensure that you have the following installed and configured:
- Java 21 or higher.
- Maven or Gradle (for managing dependencies): We use Maven for this tutorial.
- A MongoDB Atlas cluster: A minimum M10+ cluster is necessary to use the Spring AI MongoDB vector store, as it creates the search index on our database programmatically.
- OpenAI API key: Sign up for OpenAI and obtain an API key.
- Other models are available, but this tutorial uses OpenAI.
To initialize the project:
- Set up the project metadata:
  - Group: `com.mongodb`
  - Artifact: `RagApp`
- Add the dependencies:
  - Spring Web
  - MongoDB Atlas Vector Database
  - OpenAI
- Download the project and open it in your preferred IDE.
Before we do anything, let's go to our `pom.xml` file and check that the Spring AI version is set:

```xml
<spring-ai.version>1.0.0-SNAPSHOT</spring-ai.version>
```

We may need to change it to this, depending on what version of Spring we are using.

The configuration for this project involves setting up two primary components:
- The EmbeddingModel using OpenAI to generate embeddings for documents.
- A MongoDBAtlasVectorStore to store and manage document vectors for similarity searches.
We’ll need to configure our project to connect to OpenAI and MongoDB Atlas by adding several properties to the `application.properties` file, along with the necessary credentials:

```properties
spring.application.name=RagApp

spring.ai.openai.api-key=<Your-API-Key>
spring.ai.openai.chat.options.model=gpt-4o

spring.ai.vectorstore.mongodb.initialize-schema=true

spring.data.mongodb.uri=<Your-Connection-URI>
spring.data.mongodb.database=rag
```
You'll see here that we have `spring.ai.vectorstore.mongodb.initialize-schema` set to `true`. This tells Spring AI to create the search index on our collection automatically. If you are running a free cluster, this is not available. A workaround is to create the index manually, which you can learn to do in the MongoDB documentation.

Create a `config` package and add a `Config.java` class to work in. Here's how the configuration is set up in the `Config` class:

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.openai.OpenAiEmbeddingModel;
import org.springframework.ai.openai.api.OpenAiApi;
import org.springframework.ai.vectorstore.MongoDBAtlasVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.mongodb.core.MongoTemplate;

@Configuration
public class Config {

    @Value("${spring.ai.openai.api-key}")
    private String openAiKey;

    @Bean
    public EmbeddingModel embeddingModel() {
        return new OpenAiEmbeddingModel(new OpenAiApi(openAiKey));
    }

    @Bean
    public VectorStore mongodbVectorStore(MongoTemplate mongoTemplate, EmbeddingModel embeddingModel) {
        return new MongoDBAtlasVectorStore(mongoTemplate, embeddingModel,
                MongoDBAtlasVectorStore.MongoDBVectorStoreConfig.builder().build(), true);
    }
}
```
This class initializes the connection to the OpenAI API and configures the MongoDB-based vector store for storing document embeddings.
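If you do need to create the vector search index by hand (for example, on a free cluster, in the Atlas UI), a definition along these lines should work. Note that the field path (`embedding`), dimension count (1536, matching OpenAI's default embedding model), similarity function, and the default collection and index names (`vector_store` / `vector_index`) are assumptions based on Spring AI's defaults at the time of writing — verify them against the version you are running:

```json
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    }
  ]
}
```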
For this tutorial, we are using the MongoDB/devcenter-articles dataset, available on Hugging Face. This dataset consists of articles from the MongoDB Developer Center. In our resources folder, create a directory called `docs` and add the dataset file so we can read it in.
To embed and store data in the vector store, we’ll use a service that reads documents from a JSON file, converts them into embeddings, and stores them in the MongoDB Atlas vector store. This is done using the `DocsLoaderService.java` that we will create in a `service` package:

```java
package com.mongodb.RagApp.service;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.core.io.ClassPathResource;
import org.springframework.stereotype.Service;

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@Service
public class DocsLoaderService {

    private static final int MAX_TOKENS_PER_CHUNK = 2000;
    private final VectorStore vectorStore;
    private final ObjectMapper objectMapper;

    @Autowired
    public DocsLoaderService(VectorStore vectorStore, ObjectMapper objectMapper) {
        this.vectorStore = vectorStore;
        this.objectMapper = objectMapper;
    }

    public String loadDocs() {
        try (InputStream inputStream = new ClassPathResource("docs/devcenter-content-snapshot.2024-05-20.json").getInputStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {

            List<Document> documents = new ArrayList<>();
            String line;

            while ((line = reader.readLine()) != null) {
                Map<String, Object> jsonDoc = objectMapper.readValue(line, Map.class);
                String content = (String) jsonDoc.get("body");

                // Split the content into smaller chunks if it exceeds the token limit
                List<String> chunks = splitIntoChunks(content, MAX_TOKENS_PER_CHUNK);

                // Create a Document for each chunk and add it to the list
                for (String chunk : chunks) {
                    Document document = createDocument(jsonDoc, chunk);
                    documents.add(document);
                }
                // Add documents in batches to avoid memory overload
                if (documents.size() >= 100) {
                    vectorStore.add(documents);
                    documents.clear();
                }
            }
            if (!documents.isEmpty()) {
                vectorStore.add(documents);
            }

            return "All documents added successfully!";
        } catch (Exception e) {
            return "An error occurred while adding documents: " + e.getMessage();
        }
    }

    private Document createDocument(Map<String, Object> jsonMap, String content) {
        Map<String, Object> metadata = (Map<String, Object>) jsonMap.get("metadata");

        metadata.putIfAbsent("sourceName", jsonMap.get("sourceName"));
        metadata.putIfAbsent("url", jsonMap.get("url"));
        metadata.putIfAbsent("action", jsonMap.get("action"));
        metadata.putIfAbsent("format", jsonMap.get("format"));
        metadata.putIfAbsent("updated", jsonMap.get("updated"));

        return new Document(content, metadata);
    }

    private List<String> splitIntoChunks(String content, int maxTokens) {
        List<String> chunks = new ArrayList<>();
        String[] words = content.split("\\s+");
        StringBuilder chunk = new StringBuilder();
        int tokenCount = 0;

        for (String word : words) {
            // Estimate token count for the word (rough estimate: 1 token = ~4 characters)
            int wordTokens = word.length() / 4;
            if (tokenCount + wordTokens > maxTokens) {
                chunks.add(chunk.toString());
                chunk.setLength(0); // Clear the buffer
                tokenCount = 0;
            }
            chunk.append(word).append(" ");
            tokenCount += wordTokens;
        }
        if (chunk.length() > 0) {
            chunks.add(chunk.toString());
        }
        return chunks;
    }
}
```
This service reads a JSON file, processes each document, and stores it in MongoDB, along with an embedded vector of our content.
Now, this is a very simplistic approach to chunking (splitting large documents into smaller pieces that stay within the token limit and processing them separately). We need it because OpenAI has a token limit, so some of our documents are too large to embed in one go. This is fine for testing, but if you are moving to production, do your research and decide on the best way to deal with large documents for your use case.
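As one illustration of a slightly more careful strategy, here is a hypothetical standalone sketch (plain Java, not part of the tutorial code) that chunks on sentence boundaries instead of mid-sentence, while keeping the same rough four-characters-per-token estimate:

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceChunker {

    // Split text into chunks that respect an approximate token budget,
    // breaking only at sentence boundaries so no sentence is cut in half.
    static List<String> chunkBySentence(String text, int maxTokens) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int tokens = 0;

        // Split after sentence-ending punctuation followed by whitespace.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            // Same rough estimate as the tutorial: 1 token = ~4 characters.
            int sentenceTokens = Math.max(1, sentence.length() / 4);
            if (tokens + sentenceTokens > maxTokens && current.length() > 0) {
                chunks.add(current.toString().trim());
                current.setLength(0);
                tokens = 0;
            }
            current.append(sentence).append(' ');
            tokens += sentenceTokens;
        }
        if (current.length() > 0) {
            chunks.add(current.toString().trim());
        }
        return chunks;
    }
}
```

Production systems often go further still (overlapping windows, paragraph-aware splitting, or a real tokenizer instead of a character estimate), but keeping sentences intact is a cheap first improvement.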
Call this method however you wish, but I created a simple `DocsLoaderController` in my `controller` package for testing:

```java
import com.mongodb.RagApp.service.DocsLoaderService;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/docs")
public class DocsLoaderController {

    private DocsLoaderService docsLoaderService;

    public DocsLoaderController(DocsLoaderService docsLoaderService) {
        this.docsLoaderService = docsLoaderService;
    }

    @GetMapping("/load")
    public String loadDocuments() {
        return docsLoaderService.loadDocs();
    }
}
```
Once the data is embedded and stored, we can retrieve it through an API that uses a vector search to return relevant results. The `RagController` class is responsible for this:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.CrossOrigin;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@CrossOrigin
public class RagController {

    private final ChatClient chatClient;

    public RagController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
                .build();
    }

    @GetMapping("/question")
    public String question(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}
```
There's a little bit going on here. Let's look at the `ChatClient`. It offers an API for communicating with our AI model, which processes two types of messages:
1. User messages: direct inputs from the user.
2. System messages: generated by the system to guide the conversation.
For the system message, we are using the default from the `QuestionAnswerAdvisor`:

```java
private static final String DEFAULT_USER_TEXT_ADVISE = """
        Context information is below.
        ---------------------
        {question_answer_context}
        ---------------------
        Given the context and provided history information and not prior knowledge,
        reply to the user comment. If the answer is not in the context, inform
        the user that you can't answer the question.
        """;
```
But we could edit this message and tailor it to our needs. There are also prompt options that can be specified, such as the temperature setting that controls the randomness or creativity of the generated output. You can find out more from the Spring documentation.
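For instance, with the Spring AI OpenAI starter, temperature can be set alongside the other chat options in `application.properties` (the value here is only an illustration — lower values make answers more deterministic, higher values more creative):

```properties
spring.ai.openai.chat.options.temperature=0.3
```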
The `/question` endpoint allows users to ask questions. It retrieves relevant documents from the vector store by searching the embedded documents semantically, and sends them to the LLM as context alongside the user's message.

To test our implementation:
- Start the Spring Boot application.
- Navigate to `http://localhost:8080/api/docs/load` to load documents into the vector store.
- Use `http://localhost:8080/question?message=Your question here` to test the question-answer functionality.
For example, try asking: `http://localhost:8080/question?message=How to analyze time-series data with Python and MongoDB? Explain the steps`
We should receive a relevant answer from the RAG app, formed from the embedded document data and the LLM.
In this project, we integrated a retrieval-augmented generation (RAG) system using MongoDB, OpenAI embeddings, and Spring Boot. The system can embed large amounts of document data and answer questions by leveraging vector similarity searches from a MongoDB Atlas vector store.
Next, learn more about what you can do with Java and MongoDB. You might enjoy Seamless Media Storage: Integrating Azure Blob Storage and MongoDB With Spring Boot. Or head over to the community forums and see what other people are doing with MongoDB.