Get Started with the LangChain4j Integration
You can integrate Atlas Vector Search with LangChain4j to build LLM applications. This tutorial demonstrates how to start using Atlas Vector Search with LangChain4j to perform semantic searches on your data and build a simple RAG implementation. Specifically, you perform the following actions:
Set up the environment.
Instantiate the embedding model.
Use Atlas as an embedding store.
Store custom data on Atlas.
Run the following vector search queries:
Semantic search.
Semantic search with metadata pre-filtering.
Implement RAG by using Atlas Vector Search to answer questions on your data.
Background
LangChain4j is a framework that simplifies the creation of LLM applications in Java. LangChain4j combines concepts and functionality from LangChain, Haystack, LlamaIndex, and other sources. You can use this framework for a variety of use cases, including semantic search and RAG.
By integrating Atlas Vector Search with LangChain4j, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by using semantically similar documents to answer queries. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.
Prerequisites
To complete this tutorial, you must have the following:
An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
A Voyage AI API Key. You must have an account with tokens available for API requests. To learn more about registering a Voyage AI account, see the Voyage AI website.
An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
Java Development Kit (JDK) version 8 or later.
An environment to set up and run a Java application. We recommend that you use an integrated development environment (IDE) such as IntelliJ IDEA or Eclipse IDE to configure Maven or Gradle to build and run your project.
Set Up the Environment
You must first set up the environment for this tutorial, which includes adding the necessary dependencies and setting environment variables.
Create a new Java application.
Open your IDE, create a new Java project, and set the following configurations:
Name: LangChain4jSampleApp
Language: Java
Build system: Maven
JDK: Any version greater than 8
You might see an option to include sample code. Selecting this option might help you test that your environment works and locate the application file that you edit in the following steps.
Add dependencies.
Add the following dependencies to the dependencies list in your project's pom.xml file. These dependencies add the LangChain4j, Voyage AI API for LangChain4j, and MongoDB Java Sync Driver libraries to your application:

pom.xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-mongodb-atlas</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-voyage-ai</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver-sync</artifactId>
    <version>5.4.0</version>
</dependency>

Next, add a dependencyManagement entry below your dependency list for the LangChain4j Bill of Materials (BOM):

pom.xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-bom</artifactId>
            <version>1.0.0-beta1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

To learn more about the LangChain4j BOM, see the Get Started page in the LangChain4j documentation.
After you finish editing the pom.xml file, reload your project to make sure your dependencies are installed.
Import classes and methods.
Locate the main application file Main.java in your project.
Replace any existing imports with the following list of imports:
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.voyageai.VoyageAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.filter.comparison.*;
import dev.langchain4j.store.embedding.mongodb.IndexMapping;
import dev.langchain4j.store.embedding.mongodb.MongoDbEmbeddingStore;
import org.bson.Document;
import java.io.*;
import java.util.*;
Later in this tutorial, you use these classes and methods to create vector embeddings and query data.
Set environment variables.
Depending on your IDE, there might be multiple ways to set environment variables that your application can retrieve. To set environment variables in IntelliJ, you must create a run configuration for your application. To learn more, see the Operating system section of the Run/debug configuration: Application page in the IntelliJ documentation.
Set the following environment variables:
MONGODB_URI: Set to your Atlas connection string.
VOYAGE_AI_KEY: Set to your Voyage AI API key.
Note
Your connection string should use the following format:
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net/?<settings>
To learn more about retrieving your connection string, see the Get Started with Atlas tutorial.
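The code in the following steps reads these values into the uri and embeddingApiKey variables. The following is a minimal sketch, assuming you read the environment variables with System.getenv() at the top of your main method; the variable names are chosen to match the later code examples:

// Read the values from the environment variables you set above
// (variable names assumed to match the later code examples)
String uri = System.getenv("MONGODB_URI");
String embeddingApiKey = System.getenv("VOYAGE_AI_KEY");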
Instantiate the Embedding Model
In this step, you instantiate an embedding model that uses Voyage AI to convert text in sample data into vector embeddings.
Add the following code to your Main.java file to instantiate the embedding model by using your Voyage AI API key and selecting voyage-3 as the model:
EmbeddingModel embeddingModel = VoyageAiEmbeddingModel.builder()
        .apiKey(embeddingApiKey)
        .modelName("voyage-3")
        .build();
To learn more about the voyage-3 model, see the blog post about voyage-3 & voyage-3-lite on the Voyage AI website.
Use Atlas as an Embedding Store
In this section, you instantiate Atlas as a vector database, also called a vector or embedding store. When you instantiate the embedding store, LangChain4j automatically creates an Atlas Vector Search index on your data.
Note
Required Access
To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.
This code performs the following actions:
Creates a MongoClient instance that is connected to your Atlas deployment.
Sets the number of dimensions in the vector search index definition to the embedding dimension of the AI model. The resulting vector search index has the following definition:

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1024,
      "similarity": "cosine"
    }
  ]
}

Configures your Atlas collection by specifying the following parameters:
search.langchaintest as the Atlas namespace (database and collection) to store the documents.
vector_index as the index to use for querying the embedding store.
Because the createIndex boolean is set to true, instantiating the embedding store automatically creates the vector search index. The code includes a delay to allow for successful index creation.
Add the following code into your Main.java file:
MongoClient mongoClient = MongoClients.create(uri);

System.out.println("Instantiating the embedding store...");

// Set to false if the vector index already exists
Boolean createIndex = true;

IndexMapping indexMapping = IndexMapping.builder()
        .dimension(embeddingModel.dimension())
        .metadataFieldNames(new HashSet<>())
        .build();

MongoDbEmbeddingStore embeddingStore = MongoDbEmbeddingStore.builder()
        .databaseName("search")
        .collectionName("langchaintest")
        .createIndex(createIndex)
        .indexName("vector_index")
        .indexMapping(indexMapping)
        .fromClient(mongoClient)
        .build();

if (createIndex) {
    // Creating a vector search index can take up to a minute,
    // so this delay allows the index to become queryable
    try {
        Thread.sleep(15000);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}
To learn more about the classes and methods used in the preceding code, see the dev.langchain4j.store.embedding.mongodb package API documentation.
Store Custom Data
In this section, you create sample documents, use the embedding model to convert the text to embeddings, and persist the data to Atlas.
This code performs the following actions:
Creates a list of sample documents that includes text and metadata fields.
Converts the content of the text field to embeddings and persists the data to Atlas. The code includes a delay to accommodate the time needed for the vector conversion.
Add the following code into your Main.java file:
ArrayList<Document> docs = new ArrayList<>();
docs.add(new Document()
        .append("text", "In Zadie Smith's new novel, the true story of a heated nineteenth-century criminal trial connects to the unrest of current times.")
        .append("metadata", new Metadata(Map.of("author", "A"))));
docs.add(new Document()
        .append("text", "Emperor penguins are the tallest and heaviest of all penguin species, standing up to 4 feet.")
        .append("metadata", new Metadata(Map.of("author", "D"))));
docs.add(new Document()
        .append("text", "Penguins are flightless seabirds that live almost exclusively below the equator. Some island-dwellers can be found in warmer climates.")
        .append("metadata", new Metadata(Map.of("author", "C"))));
docs.add(new Document()
        .append("text", "Patagonia is home to five penguin species - Magellanic, Humboldt, Gentoo, Southern Rockhopper and King.")
        .append("metadata", new Metadata(Map.of("author", "B"))));

System.out.println("Persisting document embeddings...");
for (Document doc : docs) {
    TextSegment segment = TextSegment.from(
            doc.getString("text"),
            doc.get("metadata", Metadata.class)
    );
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
}

// Delay for persisting data
try {
    Thread.sleep(5000);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}
Run Vector Search Queries
This section demonstrates how to run queries on your vectorized data.
Perform a semantic search.
This code performs a semantic search query for the phrase "Where do penguins live?" and returns the three most relevant results. It also prints a score that captures how well each result matches the query.
Add the following code to your Main.java file:
String query = "Where do penguins live?";
Embedding queryEmbedding = embeddingModel.embed(query).content();

EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(3)
        .build();

System.out.println("Performing the query...");
EmbeddingSearchResult<TextSegment> searchResult = embeddingStore.search(searchRequest);
List<EmbeddingMatch<TextSegment>> matches = searchResult.matches();

for (EmbeddingMatch<TextSegment> embeddingMatch : matches) {
    System.out.println("Response: " + embeddingMatch.embedded().text());
    System.out.println("Author: " + embeddingMatch.embedded().metadata().getString("author"));
    System.out.println("Score: " + embeddingMatch.score());
}
Response: Penguins are flightless seabirds that live almost exclusively below the equator. Some island-dwellers can be found in warmer climates.
Author: C
Score: 0.829620897769928
Response: Patagonia is home to five penguin species - Magellanic, Humboldt, Gentoo, Southern Rockhopper and King.
Author: B
Score: 0.7459062337875366
Response: Emperor penguins are the tallest and heaviest of all penguin species, standing up to 4 feet.
Author: D
Score: 0.6908764839172363
(Optional) Perform a semantic search with metadata filtering.
To perform a search with metadata filtering, you can use classes from the dev.langchain4j.store.embedding.filter.comparison package. These classes allow you to create filters that compare metadata values to specified values to narrow the results returned by the search.
This example filters for documents in which the value of the author field is either "B" or "C". Then, it performs a semantic search query for the phrase "Where do penguins live?".
Replace the code that instantiates an EmbeddingSearchRequest instance in the preceding step with the following code:
EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .filter(new IsIn("author", List.of("B", "C")))
        .maxResults(3)
        .build();
Response: Penguins are flightless seabirds that live almost exclusively below the equator. Some island-dwellers can be found in warmer climates.
Author: C
Score: 0.8520907163619995
Response: Patagonia is home to five penguin species - Magellanic, Humboldt, Gentoo, Southern Rockhopper and King.
Author: B
Score: 0.7666836977005005
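The comparison package also includes classes such as IsEqualTo and IsNotIn that follow the same pattern. As a minimal sketch of an alternative filter, a request like the following would return only results whose author metadata value is "A":

// Illustrative variation: match only documents whose "author" metadata value is "A"
EmbeddingSearchRequest authorSearchRequest = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .filter(new IsEqualTo("author", "A"))
        .maxResults(3)
        .build();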
To learn more about metadata pre-filtering, see Atlas Vector Search Pre-Filter.
See also:
For more information, refer to the API reference.
Use Your Data to Answer Questions
This section demonstrates a RAG implementation that uses the LangChain4j framework and Atlas Vector Search. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code examples to prompt the LLM to answer questions by using information from documents stored in Atlas.
Set up your project for RAG.
Add the following dependencies to the dependencies list in your project's pom.xml file, but do not remove any of the dependencies you already added. These dependencies add the LangChain4j AI services and OpenAI API for LangChain4j libraries to your application:

pom.xml
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-open-ai</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j</artifactId>
    <version>1.0.0-beta1</version>
</dependency>

After you finish editing the pom.xml file, reload your project to make sure your dependencies are installed.

Add the following imports to your imports list in your Main.java file:

import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;

Set the OPENAI_KEY environment variable to your OpenAI API key. You use this key to create a chat model that generates a response to your query.
Load the sample data.
In this step, you ingest data from an external source into Atlas. Download the rainforest-docs.json sample data file from the docs-code-examples GitHub repository. The documents in this file contain information about plants, animals, and weather in the rainforest.
Upload this file to the resources directory in your project, which is at the same level as the java directory that contains your application files.
You must process the data into a usable format that you can create embeddings from and persist to Atlas. This code defines the loadJsonDocuments() method that performs the following actions:
Retrieves the sample data from your resources directory by using the ClassLoader class
Parses the JSON documents to a List of MongoDB Document instances by using the ObjectMapper class
Add the following code to your Main.java file outside of your main method:
private static List<Document> loadJsonDocuments(String resourcePath) throws IOException {
    // Loads file from resources directory using the ClassLoader
    InputStream inputStream = Main.class.getClassLoader().getResourceAsStream(resourcePath);
    if (inputStream == null) {
        throw new FileNotFoundException("Resource not found: " + resourcePath);
    }

    // Parses JSON file to List of MongoDB Documents
    ObjectMapper objectMapper = new ObjectMapper();
    List<Document> documents = objectMapper.readValue(inputStream, new TypeReference<>() {});

    return documents;
}
Then, add the following code in the main method body to call the loadJsonDocuments() method and load your documents:
System.out.println("Loading documents from file..."); String resourcePath = "rainforest-docs.json"; List<Document> documents = loadJsonDocuments(resourcePath);
Store vector embeddings in Atlas.
In this step, you create vector embeddings from your sample documents and persist them to Atlas.
This code converts the content of the text fields in the sample documents to embeddings and persists the data to Atlas. The code includes a delay to accommodate the time needed for the vector conversion.
Add the following code into your Main.java file:
System.out.println("Persisting document embeddings..."); for (Document doc : documents) { TextSegment segment = TextSegment.from( doc.getString("text"), new Metadata(doc.get("metadata", Map.class))); Embedding embedding = embeddingModel.embed(segment).content(); embeddingStore.add(embedding, segment); } try { Thread.sleep(5000); } catch (InterruptedException e) { throw new RuntimeException(e); }
Instantiate the chat model and specify the content retriever.
In this step, you instantiate a chat model from OpenAI so you can answer questions based on your data. You also specify a content retriever that surfaces relevant documents to inform the response crafted by the chat model.
This code performs the following actions:
Instantiates the chat model by using your OpenAI API key
Creates the content retriever with the following specifications:
Retrieves at most 3 relevant documents
Retrieves documents that have a relevance score of at least 0.75
Add the following code to your Main.java file in the main method body:
String chatApiKey = System.getenv("OPENAI_KEY"); ChatLanguageModel chatModel = OpenAiChatModel.builder() .apiKey(chatApiKey) .modelName("gpt-4") .build(); ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder() .embeddingStore(embeddingStore) .embeddingModel(embeddingModel) .maxResults(3) .minScore(0.75) .build();
Tip
Metadata Filtering
You can implement metadata filtering in your ContentRetriever by using the filter() builder method and passing an instance of a Filter. See the metadata filtering example in the preceding step to learn how to construct a Filter.
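For example, the following is a minimal sketch that reuses the IsIn filter from the earlier metadata filtering example to restrict the retrieved content to authors "B" and "C":

// Sketch: content retriever that only considers documents by authors "B" or "C"
ContentRetriever filteredContentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .filter(new IsIn("author", List.of("B", "C")))
        .maxResults(3)
        .minScore(0.75)
        .build();

You could then pass this retriever to the assistant that you create in the next step instead of the unfiltered one.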
Create the chat assistant.
Create a simple Assistant interface that the AI Services API implements in your application. Create an interface file called Assistant.java at the same level as your Main.java file.
Define the Assistant interface:
package org.example;

public interface Assistant {
    String answer(String question);
}
In your Main.java file, instantiate the Assistant:
Assistant assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(chatModel)
        .contentRetriever(contentRetriever)
        .build();
Perform queries on your data.
Finally, perform a query on your sample data. Add the following code to your Main.java file to run a query and print the output:
String ragQuery = "What types of insects live in the rainforest?";
String output = assistant.answer(ragQuery);
System.out.println("Response:\n" + output);
Response: In the rainforest, there are numerous species of insects such as beetles, butterflies, moths, wasps, bees, flies, and ants. Of the many insects that live in the rainforest, ants are particularly important as they play a crucial role in nutrient recycling and aeration of the soil. Moreover, many of these insects are involved in the processes of pollination and decomposition. The adaptations these insects have developed enable their survival in the rainforest's specific conditions, characterized by heavy rainfall.
Next Steps
MongoDB also provides the following developer resources:
How to Make a RAG Application With LangChain4j tutorial on the DEV Community website