Building a Semantic Search Service With Spring AI and MongoDB Atlas

Tim Kelly • 9 min read • Published Sep 03, 2024 • Updated Sep 03, 2024
What is the song that goes, "Duh da, duh da, DUH da duh"? We've all been plagued by this before. We remember a snippet of the chorus, we know it has something to do with a hotel in Chelsea, but what is that song? I can't remember the title — how do you search by vibe?! Well, with the power of AI, we are able to search our databases, not just by matching words, but searching the semantic meaning of the text. And with Spring AI, you can incorporate the AI-powered search into your Spring application. With just the vague memory of a famous woman who prefers handsome men, we can locate our Leonard Cohen classic.
Spring AI is an application framework from Spring that allows you to combine various AI services and plugins with your applications. With support for many chat, text-to-image, and embedding models, you can get your AI-powered Java application set up for a variety of AI use cases.
Spring AI supports MongoDB Atlas as a vector database, with Atlas Vector Search powering your semantic search and your retrieval-augmented generation (RAG) applications. To learn more about RAG and other key concepts in AI, check out the MongoDB AI integration docs.
In this tutorial, we’ll go through what you need to get started with Spring AI and MongoDB, add documents to your database along with their vectorized content (embeddings), and search that content semantically. The full code for this tutorial is available in the GitHub repository.

Prerequisites

Before starting this tutorial, you'll need to have the following:
  • A MongoDB Atlas account and an M10+ cluster running MongoDB version 6.0.11, 7.0.2, or later
    • An M10+ cluster is necessary to create the index programmatically (by Spring AI).
  • An OpenAI API key with a paid OpenAI account and available credits
  • Java 21 and an IDE such as IntelliJ IDEA or Eclipse
  • Maven 3.9.6+ configured for your project

Spring Initializr

Navigate to the Spring Initializr and configure your project with the following settings:
  • Project: Maven
  • Language: Java
  • Spring Boot: Default version
  • Java: 21
Add the following dependencies:
  • MongoDB Atlas Vector Database
  • Spring Web
  • OpenAI (other embedding models are available, but we use this for the tutorial)
Generate and download your project, then open it in your IDE.

Setting up your project

Open the application in your IDE of choice and start by inspecting pom.xml. To use the latest version of Spring AI, change the spring-ai.version property for the Spring AI BOM to 1.0.0-SNAPSHOT. At the time of writing, it defaults to 1.0.0-M1.

Application configuration

Configure your Spring application to set up the vector store and other necessary beans.
In our application properties, we are going to configure our MongoDB database, as well as everything we need for semantically searching our data. We'll also add in information such as our OpenAI embedding model and API key.
At the end of the properties, we set the initialize-schema property to true. This means our application will set up our search index (if it doesn't exist) so we can semantically search our data. If you already have a search index set up with this name and configuration, you can set this to false.
In your IDE, open up your project. Create a Config.java file in a config package. Here, we are going to set up our OpenAI embedding model. Spring AI makes this a very simple process.
Now, we are able to send away our data to be vectorized, and receive the vectorized results.

Model classes

Create a package called model to hold our DocumentRequest class. This is what we are going to store in our MongoDB database. The content is what we embed (lyrics, in our case). The metadata is anything we want to store alongside it, such as artist, album, or genre. This metadata is returned alongside our content and can also be used to filter our results.
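As a minimal sketch of that shape, the request could be modelled as a Java record with the two fields described above. The record form and the null handling are assumptions for illustration; a plain class with getters and setters would work just as well.

```java
import java.util.Map;

/**
 * Sketch of the request payload: the text to embed plus free-form metadata.
 * Field names follow the description above; the record form is an assumption.
 */
record DocumentRequest(String content, Map<String, Object> metadata) {

    /** Treat a missing metadata object as empty so callers never see null. */
    DocumentRequest {
        metadata = (metadata == null) ? Map.of() : Map.copyOf(metadata);
    }
}
```

For example, `new DocumentRequest("Twinkle, twinkle, little star", Map.of("artist", "Jane Taylor"))` pairs a lyric with the artist we may later filter on.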

Repository interface

Create a repository package and add a LyricSearchRepository interface. Here, we'll define some of the methods we'll implement later.

Repository implementation

Create a LyricSearchRepositoryImpl class to implement the repository interface.
We are using the methods add, delete, and similaritySearch, all already defined and implemented in Spring AI. These will allow us to embed our data when adding them to our MongoDB database, and we can search these embeddings with vector search.

Service

Create a service package and inside, a LyricSearchService class to handle business logic for our lyrical search application. We will implement these methods later in the tutorial:

Controller

Create a controller package and a LyricSearchController class to handle HTTP requests. We are going to add a call to add our data, a call to delete any documents we no longer need, and a search call, to semantically search our data.
These will call back to the methods we defined earlier. We’ll implement them in the next steps:

Adding documents

In our LyricSearchService class, let's add some logic to take in our documents and add them to our MongoDB database.
This function takes a single parameter, documents, which is a list of DocumentRequest objects. These represent the documents that need to be processed and added to the repository.
The function first checks if the documents list is null or empty.
The documents list is converted into a stream to facilitate functional-style operations.
The filter is a bit of pre-processing to help clean up our data. It removes any DocumentRequest objects that are null, have null content, or have empty (or whitespace-only) content. This ensures that only valid documents are processed further.
Know your limits! The next filter removes any DocumentRequest objects whose content exceeds the maximum token limit (MAX_TOKENS) for the OpenAI API. The token count is estimated from the word count, assuming one word is slightly more than one token (not far off the truth). This estimation works for the demo, but in production, we would likely want to implement a form of chunking, where large bodies of text are separated into smaller, more digestible pieces.
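To make the chunking idea concrete, here is a minimal sketch that splits text into pieces of at most a given number of words. The class name WordChunker and the word-based strategy are assumptions for illustration; production chunkers usually split on sentence or paragraph boundaries and count real tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordChunker {

    /** Splits text on whitespace into chunks of at most maxWords words each. */
    static List<String> chunk(String text, int maxWords) {
        if (maxWords <= 0) {
            throw new IllegalArgumentException("maxWords must be positive");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isBlank()) {
            return chunks; // nothing to split
        }
        String[] words = text.trim().split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }
}
```

Each chunk could then be embedded and stored as its own document, carrying the same metadata as the original text.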
Each DocumentRequest object is transformed into a Document object. The Document constructor is called with the content and metadata from the DocumentRequest.
The filtered and transformed Document objects are collected into a list and these documents are added to our MongoDB vector store, along with an embedding of the lyrics.
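The steps above can be sketched in plain Java as follows. This is not the tutorial's service code: Doc is a hypothetical stand-in for Spring AI's Document type, the MAX_TOKENS value is an assumed budget, and the method returns the prepared list instead of handing it to the vector store.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class AddDocumentsSketch {

    // Assumed budget: an 8,192-token limit with a 20% buffer, since one word can exceed one token.
    static final int MAX_TOKENS = (int) (8192 * 0.80);

    /** Incoming payload: content to embed plus metadata. */
    record DocumentRequest(String content, Map<String, Object> metadata) {}

    /** Hypothetical stand-in for Spring AI's Document type. */
    record Doc(String content, Map<String, Object> metadata) {}

    /** Rough token estimate: one token per whitespace-separated word. */
    static int estimateTokens(String content) {
        return content.trim().split("\\s+").length;
    }

    /** Drops invalid or oversized requests and converts the rest to Doc objects. */
    static List<Doc> prepare(List<DocumentRequest> documents) {
        if (documents == null || documents.isEmpty()) {
            return List.of();
        }
        return documents.stream()
                .filter(Objects::nonNull)
                .filter(d -> d.content() != null && !d.content().isBlank())
                .filter(d -> estimateTokens(d.content()) <= MAX_TOKENS)
                .map(d -> new Doc(d.content(),
                        d.metadata() == null ? Map.<String, Object>of() : d.metadata()))
                .toList();
    }
}
```

In the real service, the resulting list would be passed to the repository's add method, which embeds the content and writes it to MongoDB.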
We'll also add our function to delete documents while we're here:
And the appropriate imports:
Now that we have the logic, let’s add the endpoints to our LyricSearchController.
And our imports:
To test our embedding, let's keep it simple with a few nursery rhymes for now.
Build and run your application. Use the following curl command to add sample documents:
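If curl is not to hand, an equivalent request can be built with Java's built-in HTTP client. The sketch below is illustrative only: the URL (including the /addDocuments path) is a hypothetical endpoint that must match your own controller mapping, and the sample rhymes and metadata keys are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AddDocumentsRequestSketch {

    // Hypothetical endpoint: adjust host, port, and path to match your controller mapping.
    static final String ADD_URL = "http://localhost:8080/addDocuments";

    // Sample payload: a JSON array of objects, each with content and metadata.
    static final String PAYLOAD = """
            [
              {"content": "Twinkle, twinkle, little star, how I wonder what you are",
               "metadata": {"title": "The Star", "artist": "Jane Taylor"}},
              {"content": "Mary had a little lamb, its fleece was white as snow",
               "metadata": {"title": "Mary Had a Little Lamb", "artist": "Sarah Josepha Hale"}}
            ]
            """;

    /** Builds (but does not send) a JSON POST request carrying the sample documents. */
    static HttpRequest buildRequest() {
        return HttpRequest.newBuilder(URI.create(ADD_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(PAYLOAD))
                .build();
    }
}
```

With the application running, the request can be sent using `HttpClient.newHttpClient().send(AddDocumentsRequestSketch.buildRequest(), HttpResponse.BodyHandlers.ofString())`.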

Searching semantically

Let's define our searching method in our LyricSearchService. This is how we will semantically search our documents in our database.
This method takes in:
  • query: A String representing the search query or the text for which you want to find semantically similar lyrics
  • topK: An int specifying the number of top results to retrieve (e.g., the top 10)
  • similarityThreshold: A double indicating the minimum similarity score a result must have to be included in the results
This returns a list of Map<String, Object> objects. Each map contains the content and metadata of a document that matches the search criteria.
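To illustrate what topK and similarityThreshold mean, here is a brute-force, in-memory sketch of a similarity search. It is not how the application queries Atlas, where the vector index does this work on the server. The Stored record is a hypothetical stand-in, the embeddings are supplied by the caller instead of an embedding model, and raw cosine similarity is an assumed scoring function.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SimilaritySearchSketch {

    /** Hypothetical stand-in for a stored document together with its embedding. */
    record Stored(String content, Map<String, Object> metadata, double[] embedding) {}

    /** Cosine similarity of two equal-length, non-zero vectors. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Keeps documents scoring at least similarityThreshold, best first, at most topK of them. */
    static List<Map<String, Object>> search(double[] queryEmbedding, List<Stored> store,
                                            int topK, double similarityThreshold) {
        record Scored(Stored doc, double score) {}
        return store.stream()
                .map(d -> new Scored(d, cosine(queryEmbedding, d.embedding())))
                .filter(s -> s.score() >= similarityThreshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topK)
                .map(s -> Map.<String, Object>of(
                        "content", s.doc().content(),
                        "metadata", s.doc().metadata()))
                .toList();
    }
}
```

Raising similarityThreshold trims weak matches, while topK caps how many of the remaining matches are returned.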
And the imports to our service:
Let's add an endpoint to our controller, and build and run our application.
And the imports:
Use the following curl command to search your database for lyrics about small celestial bodies:
And voila! We have our twinkly little star at the top of our list.

Filter by metadata

In order to filter our data, we need to head over to our index in MongoDB. You can do this through the Atlas UI by selecting the collection where your data is stored and going to the search indexes. You can edit this index by selecting the three dots on the right of the index name and we will add our filter for the artist.
Let's head back to our LyricSearchService and add a method with an artist parameter so we can filter our results.
And the imports we'll need:
And lastly, an endpoint in our controller:
Now, we are able to not only search as before, but we can say we want to restrict it to only specific artists.
Use the following curl command to try a semantic search with metadata filtering:
Unlike before, even though we ask for the top five results, only one document is returned, because we have only one document by the artist Jane Taylor. Hooray!
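The effect of the artist filter can be illustrated with a small in-memory sketch. It is a simplification: Hit is a hypothetical stand-in for a search result, and the filter is applied here to an already ranked list, whereas Atlas Vector Search applies it as a pre-filter on the server, which is why the field has to be declared as a filter in the index.

```java
import java.util.List;
import java.util.Map;

final class ArtistFilterSketch {

    /** Hypothetical stand-in for a search hit; the input list is assumed to be ranked best first. */
    record Hit(String content, Map<String, Object> metadata) {}

    /** Keeps only hits whose "artist" metadata equals the given artist, then applies topK. */
    static List<Hit> filterByArtist(List<Hit> rankedHits, String artist, int topK) {
        return rankedHits.stream()
                .filter(h -> artist.equals(h.metadata().get("artist")))
                .limit(topK)
                .toList();
    }
}
```

With one rhyme by Jane Taylor and one by another author in the list, `filterByArtist(hits, "Jane Taylor", 5)` returns a single hit even though five were requested.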

Conclusion

You now have a Spring application that allows you to search through your data by performing semantic searches. This is an important step when you are looking to implement your RAG applications, or just an AI-enhanced search feature in your applications.
If you want to learn more about the MongoDB Spring AI integration, follow along with the quick-start Get Started With the Spring AI Integration, and if you have any questions or want to show us what you are building, join us in the MongoDB Community Forums.