Building a Semantic Search Service With Spring AI and MongoDB Atlas

Tim Kelly • 9 min read • Published Sep 03, 2024 • Updated Sep 03, 2024

Spring AI Java

What is the song that goes, "Duh da, duh da, DUH da duh"? We've all been plagued by this before. We remember a snippet of the chorus, we know it has something to do with a hotel in Chelsea, but what is that song? I can't remember the title, so how do you search by vibe?! Well, with the power of AI, we can search our databases not just by matching words, but by the semantic meaning of the text. And with Spring AI, you can incorporate this AI-powered search into your Spring application. With just the vague memory of a famous woman who prefers handsome men, we can locate our Leonard Cohen classic.

Spring AI is an application framework from Spring that allows you to combine various AI services and plugins with your applications. With support for many chat, text-to-image, and embedding models, you can get your AI-powered Java application set up for a variety of AI use cases.

Spring AI supports MongoDB Atlas as a vector database, with Atlas Vector Search powering your semantic search and your RAG applications. To learn more about RAG and other key concepts in AI, check out the MongoDB AI integration docs.

In this tutorial, we’ll go through what you need to get started with Spring AI and MongoDB, adding documents to your database with the vectorised content (embeddings), and searching this content with semantic search. The full code for this tutorial is available in the GitHub repository.

Prerequisites

Before starting this tutorial, you'll need to have the following:

- A MongoDB Atlas account and an M10+ cluster running MongoDB version 6.0.11, 7.0.2, or later
  - An M10+ cluster is necessary to create the index programmatically (by Spring AI).
- An OpenAI API key with a paid OpenAI account and available credits
- Java 21 and an IDE such as IntelliJ IDEA or Eclipse
- Maven 3.9.6+ configured for your project

Spring Initializr

Navigate to the Spring Initializr and configure your project with the following settings:

- Project: Maven
- Language: Java
- Spring Boot: Default version
- Java: 21

Add the following dependencies:

- MongoDB Atlas Vector Database
- Spring Web
- OpenAI (other embedding models are available, but we use this for the tutorial)

Generate and download your project, then open it in your IDE.

Setting up your project

Open the application in your IDE of choice. The first thing we will do is inspect our pom.xml. To use the latest version of Spring AI, change the spring-ai.version property for the Spring AI BOM to 1.0.0-SNAPSHOT. At the time of writing, it is 1.0.0-M1 by default.

Application configuration

Configure your Spring application to set up the vector store and other necessary beans.

In our application properties, we are going to configure our MongoDB database, as well as everything we need for semantically searching our data. We'll also add in information such as our OpenAI embedding model and API key.

Code Snippet

You'll see that, at the end, we set the initialize-schema property to true. This means our application will set up our search index (if it doesn't exist) so we can semantically search our data. If you already have a search index set up with this name and configuration, you can set this to false.

In your IDE, open up your project. Create a Config.java file in a config package. Here, we are going to set up our OpenAI embedding model. Spring AI makes this a very simple process.

Code Snippet

Now, we are able to send away our data to be vectorized, and receive the vectorized results.

Model classes

Create a package called model, for our DocumentRequest class to go in. This is what we are going to be storing in our MongoDB database. The content will be what we are embedding — so lyrics, in our case. The metadata will be anything we want to store alongside it, so artists, albums, or genres. This metadata will be returned alongside our content and can also be used to filter our results.

Code Snippet
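As a rough sketch of what such a model class could look like, assuming only the getContent() and getMetadata() accessors that the service code later in this tutorial calls, a plain Java class is enough. The package declaration is omitted here, and the no-args constructor and setters are an assumption made so that a JSON mapper can populate the object from a request body.

```java
import java.util.Map;

// Sketch of the request model; in the tutorial project it would sit in the model package.
public class DocumentRequest {

    // The text that gets embedded (lyrics, in this tutorial).
    private String content;

    // Anything stored alongside the content, such as artist, album, or genre.
    private Map<String, Object> metadata;

    // No-args constructor so a JSON mapper can instantiate the class.
    public DocumentRequest() {
    }

    public DocumentRequest(String content, Map<String, Object> metadata) {
        this.content = content;
        this.metadata = metadata;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    public Map<String, Object> getMetadata() {
        return metadata;
    }

    public void setMetadata(Map<String, Object> metadata) {
        this.metadata = metadata;
    }
}
```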

Repository interface

Create a repository package and add a LyricSearchRepository interface. Here, we'll define some of the methods we'll implement later.

Code Snippet

Repository implementation

Create a LyricSearchRepositoryImpl class to implement the repository interface.

package com.mongodb.lyric_semantic_search.repository;

import java.util.List;
import java.util.Optional;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;

@Repository
public class LyricSearchRepositoryImpl implements LyricSearchRepository {

    private final VectorStore vectorStore;

    @Autowired
    public LyricSearchRepositoryImpl(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void addDocuments(List<Document> docs) {
        vectorStore.add(docs);
    }

    @Override
    public Optional<Boolean> deleteDocuments(List<String> ids) {
        return vectorStore.delete(ids);
    }

    @Override
    public List<Document> semanticSearchByLyrics(SearchRequest searchRequest) {
        return vectorStore.similaritySearch(searchRequest);
    }
}

We are using the methods add, delete, and similaritySearch, all already defined and implemented in Spring AI. These will allow us to embed our data when adding them to our MongoDB database, and we can search these embeddings with vector search.

Service

Create a service package and inside, a LyricSearchService class to handle business logic for our lyrical search application. We will implement these methods later in the tutorial:

Code Snippet

Controller

Create a controller package and a LyricSearchController class to handle HTTP requests. We are going to add a call to add our data, a call to delete any documents we no longer need, and a search call, to semantically search our data.

These will call back to the methods we defined earlier. We’ll implement them in the next steps:

Code Snippet

Adding documents

In our LyricSearchService class, let's add some logic to take in our documents and add them to our MongoDB database.

    // 80% of 8192, the approximate maximum input length (in tokens) of the OpenAI embedding model.
    // The 20% buffer allows for words that split into more than one token.
    private static final int MAX_TOKENS = (int) (8192 * 0.80);

    @Autowired
    LyricSearchRepository lyricSearchRepository;

    public List<Document> addDocuments(List<DocumentRequest> documents) {
        if (documents == null || documents.isEmpty()) {
            return Collections.emptyList();
        }

        List<Document> docs = documents.stream()
            // Skip null requests and requests with missing or blank content.
            .filter(doc -> doc != null && doc.getContent() != null && !doc.getContent()
                .trim()
                .isEmpty())
            .map(doc -> new Document(doc.getContent(), doc.getMetadata()))
            // Estimate the token count from the word count and drop oversized documents.
            .filter(doc -> {
                int wordCount = doc.getContent()
                    .split("\\s+").length;
                return wordCount <= MAX_TOKENS;
            })
            .collect(Collectors.toList());

        // Only call the vector store if something survived the filters.
        if (!docs.isEmpty()) {
            lyricSearchRepository.addDocuments(docs);
        }

        return docs;
    }

This function takes a single parameter, documents, which is a list of DocumentRequest objects. These represent the documents that need to be processed and added to the repository.

The function first checks whether the documents list is null or empty and, if so, returns an empty list.

The documents list is converted into a stream to facilitate functional-style operations.

The first filter is a bit of pre-processing to help clean up our data. It removes any DocumentRequest objects that are null, have null content, or have empty (or whitespace-only) content. This ensures that only valid documents are processed further.

Each remaining DocumentRequest object is transformed into a Document object. The Document constructor is called with the content and metadata from the DocumentRequest.

Know your limits! The second filter removes any Document objects whose content exceeds the maximum token limit (MAX_TOKENS) for the OpenAI API. The token count is estimated from the word count, assuming one word is slightly more than one token (not far off the truth). This estimation works for the demo, but in production, we would likely want to implement a form of chunking, where large bodies of text are separated into smaller, more digestible pieces.

The filtered and transformed Document objects are collected into a list. If the list is not empty, these documents are added to our MongoDB vector store, along with an embedding of the lyrics.
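The chunking idea mentioned above could be sketched as follows. This is a minimal illustration and not part of the tutorial's repository: the class name LyricChunker and the method chunkByWords are hypothetical, and the split is by whitespace-separated words, matching the word-count estimate used in addDocuments rather than a real tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits long text into pieces of at most maxWords words.
final class LyricChunker {

    private LyricChunker() {
    }

    static List<String> chunkByWords(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.trim().isEmpty()) {
            return chunks; // nothing to chunk
        }
        // Same word-splitting rule as the word-count estimate in addDocuments.
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            // Words are re-joined with single spaces, so original line breaks are not preserved.
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }
}
```

Each chunk could then become its own Document carrying the same metadata, so that an oversized song is split into several embeddings instead of being dropped by the filter.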

We'll also add our function to delete documents while we're here:

Code Snippet

And the appropriate imports:

Code Snippet

Now that we have the logic, let’s add the endpoints to our LyricSearchController.

Code Snippet

And our imports:

Code Snippet

To test our embedding, let's keep it simple with a few nursery rhymes for now.

Build and run your application. Use the following curl command to add sample documents:

Code Snippet
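If you would rather send the request from Java than from the command line, the same kind of call could be sketched with the JDK's built-in HTTP client types. The endpoint path /addDocuments, the port 8080, and the sample rhyme are assumptions for illustration. Use whatever path your LyricSearchController maps and whatever documents you want to embed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch: builds (but does not send) a POST request carrying sample documents.
final class AddDocumentsRequestSketch {

    // Assumed URL; adjust host, port, and path to match your controller.
    static final String ADD_URL = "http://localhost:8080/addDocuments";

    // A JSON array of objects with the content and metadata fields of DocumentRequest.
    static final String BODY = """
        [
          {
            "content": "Twinkle, twinkle, little star, how I wonder what you are",
            "metadata": {"title": "Twinkle Twinkle Little Star", "artist": "Jane Taylor"}
          }
        ]
        """;

    private AddDocumentsRequestSketch() {
    }

    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create(ADD_URL))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(BODY))
            .build();
    }
}
```

With the application running, the request can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).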

Searching semantically

Let's define our searching method in our LyricSearchService. This is how we will semantically search our documents in our database.

Code Snippet

This method takes in:

- query: A String representing the search query or the text for which you want to find semantically similar lyrics
- topK: An int specifying the number of top results to retrieve (e.g., the top 10)
- similarityThreshold: A double indicating the minimum similarity score a result must have to be included in the results

This returns a list of Map<String, Object> objects. Each map contains the content and metadata of a document that matches the search criteria.
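To make the roles of topK and similarityThreshold concrete, here is a small in-memory sketch of the same idea: score every stored vector against the query, drop anything below the threshold, and keep the best topK. It is only an illustration with hypothetical names and toy two-dimensional vectors. In the real application, Atlas Vector Search does this work on the server with the OpenAI embeddings, and it computes its own scores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of top-K similarity search with a score threshold.
final class TopKSimilaritySketch {

    // A matched document together with its similarity score.
    record Scored(String content, double score) {
    }

    private TopKSimilaritySketch() {
    }

    // Cosine similarity between two vectors of equal length.
    static double cosine(double[] a, double[] b) {
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    static List<Scored> search(double[] query, Map<String, double[]> docs, int topK, double similarityThreshold) {
        List<Scored> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : docs.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score >= similarityThreshold) { // discard weak matches
                hits.add(new Scored(entry.getKey(), score));
            }
        }
        // Best match first, then keep at most topK results.
        hits.sort(Comparator.comparingDouble(Scored::score).reversed());
        return hits.subList(0, Math.min(topK, hits.size()));
    }
}
```

Raising similarityThreshold trims weak matches even when fewer than topK results remain, which is why a search can return fewer documents than requested.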

And the imports to our service:

Code Snippet

Let's add an endpoint to our controller, and build and run our application.

Code Snippet

And the imports:

Code Snippet

Use the following curl command to search your database for lyrics about small celestial bodies:

Code Snippet
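Whichever client you use, the query text has to be URL-encoded because it contains spaces. As a hedged sketch, the search URL could be assembled in Java like this. The /search path and the idea that the three method parameters are passed as request parameters of the same names are assumptions, so check them against your own controller mapping.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a search URL with an encoded query string.
final class SearchUrlSketch {

    private SearchUrlSketch() {
    }

    static URI searchUri(String baseUrl, String query, int topK, double similarityThreshold) {
        // URLEncoder applies form encoding, so spaces become '+'.
        String encodedQuery = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(baseUrl
            + "?query=" + encodedQuery
            + "&topK=" + topK
            + "&similarityThreshold=" + similarityThreshold);
    }
}
```

For example, SearchUrlSketch.searchUri("http://localhost:8080/search", "small celestial bodies", 5, 0.93) produces a URI whose query string starts with query=small+celestial+bodies.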

And voila! We have our twinkly little star at the top of our list.

Code Snippet

Filter by metadata

In order to filter our data, we need to head over to our index in MongoDB. You can do this through the Atlas UI by selecting the collection where your data is stored and going to the search indexes. You can edit this index by selecting the three dots to the right of the index name. There, we will add our filter for the artist.

Code Snippet

Let's head back to our LyricSearchService and add a method with an artist parameter so we can filter our results.

Code Snippet

And the imports we'll need:

Code Snippet

And lastly, an endpoint in our controller:

Code Snippet

Now, we are able to not only search as before, but we can say we want to restrict it to only specific artists.

Use the following curl command to try a semantic search with metadata filtering:

Code Snippet

Unlike before, even though we ask for the top five results, only one document is returned, because we only have one document from the artist Jane Taylor. Hooray!

Code Snippet
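The behavior described above, where asking for five results yields only one, can be mimicked with a small in-memory sketch. The names here are hypothetical and the input list is assumed to be already ranked by similarity. In the real application, the filter is applied by Atlas Vector Search using the filter field added to the index.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration: restrict ranked results to one artist, then apply topK.
final class MetadataFilterSketch {

    record Lyric(String content, Map<String, Object> metadata) {
    }

    private MetadataFilterSketch() {
    }

    static List<Lyric> topKByArtist(List<Lyric> rankedLyrics, String artist, int topK) {
        return rankedLyrics.stream()
            // Keep only documents whose metadata names the requested artist.
            .filter(lyric -> lyric.metadata() != null && artist.equals(lyric.metadata().get("artist")))
            // topK is an upper bound; fewer results come back if fewer documents match.
            .limit(topK)
            .collect(Collectors.toList());
    }
}
```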

Conclusion

You now have a Spring application that allows you to search through your data by performing semantic searches. This is an important step when you are looking to implement your RAG applications, or just an AI-enhanced search feature in your applications.

If you want to learn more about the MongoDB Spring AI integration, follow along with the quick-start Get Started With the Spring AI Integration, and if you have any questions or want to show us what you are building, join us in the MongoDB Community Forums.
