
Get Started with the LangChain4j Integration

You can integrate Atlas Vector Search with LangChain4j to build LLM applications. This tutorial demonstrates how to start using Atlas Vector Search with LangChain4j to perform semantic searches on your data and build a simple RAG implementation. Specifically, you perform the following actions:

  1. Set up the environment.

  2. Instantiate the embedding model.

  3. Use Atlas as an embedding store.

  4. Store custom data on Atlas.

  5. Run the following vector search queries:

    • Semantic search.

    • Semantic search with metadata pre-filtering.

  6. Implement RAG by using Atlas Vector Search to answer questions on your data.

LangChain4j is a framework that simplifies the creation of LLM applications in Java. LangChain4j combines concepts and functionality from LangChain, Haystack, LlamaIndex, and other sources. You can use this framework for a variety of use cases, including semantic search and RAG.

By integrating Atlas Vector Search with LangChain4j, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by using semantically similar documents to answer queries. To learn more about RAG, see Retrieval-Augmented Generation (RAG) with Atlas Vector Search.

To complete this tutorial, you must have the following:

  • An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.

  • A Voyage AI API Key. You must have an account with tokens available for API requests. To learn more about registering a Voyage AI account, see the Voyage AI website.

  • An OpenAI API Key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

  • Java Development Kit (JDK) version 8 or later.

  • An environment to set up and run a Java application. We recommend that you use an integrated development environment (IDE) such as IntelliJ IDEA or Eclipse IDE to configure Maven or Gradle to build and run your project.

You must first set up the environment for this tutorial, which includes adding the necessary dependencies and setting environment variables.

1
  1. Open your IDE, create a new Java project, and set the following configurations:

    • Name: LangChain4jSampleApp

    • Language: Java

    • Build system: Maven

    • JDK: Version 8 or later

    You might see an option to include sample code. Selecting this option might help you test that your environment works and locate the application file that you edit in the following steps.

2
  1. Add the following dependencies to the dependencies section of your project's pom.xml file. These dependencies add the LangChain4j MongoDB Atlas integration, the Voyage AI integration for LangChain4j, and the MongoDB Java Sync Driver to your application:

    pom.xml
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-mongodb-atlas</artifactId>
        <version>1.0.0-beta1</version>
    </dependency>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-voyage-ai</artifactId>
        <version>1.0.0-beta1</version>
    </dependency>
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.4.0</version>
    </dependency>
  2. Next, add a dependencyManagement entry below your dependency list for the LangChain4j Bill of Materials (BOM):

    pom.xml
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>dev.langchain4j</groupId>
                <artifactId>langchain4j-bom</artifactId>
                <version>1.0.0-beta1</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    To learn more about the LangChain4j BOM, see the Get Started page in the LangChain4j documentation.

    After you finish editing the pom.xml file, reload your project to make sure your dependencies are installed.

3

Locate the main application file Main.java in your project. Replace any existing imports with the following list of imports:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import dev.langchain4j.data.document.Metadata;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.voyageai.VoyageAiEmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.filter.comparison.*;
import dev.langchain4j.store.embedding.mongodb.IndexMapping;
import dev.langchain4j.store.embedding.mongodb.MongoDbEmbeddingStore;
import org.bson.Document;
import java.io.*;
import java.util.*;

Later in this tutorial, you use these classes and methods to create vector embeddings and query data.

4

Depending on your IDE, there might be multiple ways to set environment variables that your application can retrieve. To set environment variables in IntelliJ, you must create a run configuration for your application. To learn more, see the Operating system section of the Run/debug configuration: Application page in the IntelliJ documentation.

Set the following environment variables:

  • MONGODB_URI: Set to your Atlas connection string.

  • VOYAGE_AI_KEY: Set to your Voyage AI API key.

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net/?<settings>

To learn more about retrieving your connection string, see the Get Started with Atlas tutorial.

5

Retrieve your environment variables by adding the following code inside the main method in your application's Main.java file:

String embeddingApiKey = System.getenv("VOYAGE_AI_KEY");
String uri = System.getenv("MONGODB_URI");

In this step, you instantiate an embedding model that uses Voyage AI to convert text in sample data into vector embeddings.

Add the following code to your Main.java file to instantiate the embedding model by using your Voyage AI API key and selecting voyage-3 as the model:

EmbeddingModel embeddingModel = VoyageAiEmbeddingModel.builder()
        .apiKey(embeddingApiKey)
        .modelName("voyage-3")
        .build();

To learn more about the voyage-3 model, see the blog post about voyage-3 & voyage-3-lite on the Voyage AI website.
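
The index mapping in the next section uses the model's embedding dimension, which is 1024 for voyage-3. If you want to confirm this value before creating the index, a quick optional check, assuming the embeddingModel instance above, is to print it:

// Optional check: voyage-3 embeddings are 1024-dimensional, which matches
// the numDimensions value in the index definition shown later.
System.out.println("Embedding dimension: " + embeddingModel.dimension());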

In this section, you instantiate Atlas as a vector database, also called a vector or embedding store. When you instantiate the embedding store, LangChain4j automatically creates an Atlas Vector Search index on your data.

Note

Required Access

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

This code performs the following actions:

  • Creates a MongoClient instance that is connected to your Atlas deployment.

  • Sets the number of dimensions in the vector search index definition to the embedding dimension of the embedding model (1024 for voyage-3). The resulting vector search index has the following definition:

    {
      "fields": [
        {
          "type": "vector",
          "path": "embedding",
          "numDimensions": 1024,
          "similarity": "cosine"
        }
      ]
    }
  • Configures your Atlas collection by specifying the following parameters:

    • search.langchaintest as the Atlas namespace (database and collection) in which to store the documents.

    • vector_index as the index to use for querying the embedding store.

Because the createIndex boolean is set to true, instantiating the embedding store automatically creates the vector search index. The code includes a delay to allow for successful index creation.

Add the following code into your Main.java file:

MongoClient mongoClient = MongoClients.create(uri);

System.out.println("Instantiating the embedding store...");
// Set to false if the vector index already exists
Boolean createIndex = true;

IndexMapping indexMapping = IndexMapping.builder()
        .dimension(embeddingModel.dimension())
        .metadataFieldNames(new HashSet<>())
        .build();

MongoDbEmbeddingStore embeddingStore = MongoDbEmbeddingStore.builder()
        .databaseName("search")
        .collectionName("langchaintest")
        .createIndex(createIndex)
        .indexName("vector_index")
        .indexMapping(indexMapping)
        .fromClient(mongoClient)
        .build();

if (createIndex) {
    // Creating a vector search index can take up to a minute,
    // so this delay allows the index to become queryable
    try {
        Thread.sleep(15000);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}

To learn more about the classes and methods used in the preceding code, see the dev.langchain4j.store.embedding.mongodb package API documentation.
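
Instead of relying on a fixed delay, you can poll Atlas until it reports the index as queryable. The following is a sketch, not part of the tutorial's required code; it assumes an Atlas deployment, the database, collection, and index names used above, and the Java driver's listSearchIndexes() method, and it requires an additional import of com.mongodb.client.MongoCollection:

// Sketch: poll the search index status until Atlas reports it as queryable,
// instead of sleeping for a fixed interval.
MongoCollection<Document> collection = mongoClient
        .getDatabase("search")
        .getCollection("langchaintest");
boolean queryable = false;
while (!queryable) {
    for (Document index : collection.listSearchIndexes()) {
        if ("vector_index".equals(index.getString("name"))
                && Boolean.TRUE.equals(index.getBoolean("queryable"))) {
            queryable = true;
        }
    }
    if (!queryable) {
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}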

In this section, you create sample documents, use the embedding model to convert the text to embeddings, and persist the data to Atlas.

This code performs the following actions:

  • Creates a list of sample documents that includes text and metadata fields.

  • Converts the content of the text field to embeddings and persists the data to Atlas. The code includes a delay to accommodate the time needed for the vector conversion.

Add the following code into your Main.java file:

ArrayList<Document> docs = new ArrayList<>();
docs.add(new Document()
        .append("text", "In Zadie Smith's new novel, the true story of a heated nineteenth-century criminal trial connects to the unrest of current times.")
        .append("metadata", new Metadata(Map.of("author", "A"))));
docs.add(new Document()
        .append("text", "Emperor penguins are the tallest and heaviest of all penguin species, standing up to 4 feet.")
        .append("metadata", new Metadata(Map.of("author", "D"))));
docs.add(new Document()
        .append("text", "Penguins are flightless seabirds that live almost exclusively below the equator. Some island-dwellers can be found in warmer climates.")
        .append("metadata", new Metadata(Map.of("author", "C"))));
docs.add(new Document()
        .append("text", "Patagonia is home to five penguin species - Magellanic, Humboldt, Gentoo, Southern Rockhopper and King.")
        .append("metadata", new Metadata(Map.of("author", "B"))));

System.out.println("Persisting document embeddings...");
for (Document doc : docs) {
    TextSegment segment = TextSegment.from(
            doc.getString("text"),
            doc.get("metadata", Metadata.class)
    );
    Embedding embedding = embeddingModel.embed(segment).content();
    embeddingStore.add(embedding, segment);
}

// Delay for persisting data
try {
    Thread.sleep(5000);
} catch (InterruptedException e) {
    throw new RuntimeException(e);
}
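
Embedding the documents one at a time works well for a small sample like this. If you prefer to batch the work, LangChain4j's embedAll() and addAll() methods offer one alternative. The following sketch assumes the same docs list and embedding store created above:

// Batch alternative (sketch): embed all segments in one request,
// then add the embeddings and their segments to the store together.
List<TextSegment> segments = new ArrayList<>();
for (Document doc : docs) {
    segments.add(TextSegment.from(
            doc.getString("text"),
            doc.get("metadata", Metadata.class)));
}
List<Embedding> embeddings = embeddingModel.embedAll(segments).content();
embeddingStore.addAll(embeddings, segments);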

This section demonstrates how to run queries on your vectorized data.

1

This code performs a semantic search query for the phrase "Where do penguins live?" and returns the three most relevant results. It also prints a score that captures how well each result matches the query.

Add the following code to your Main.java file:

String query = "Where do penguins live?";
Embedding queryEmbedding = embeddingModel.embed(query).content();

EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .maxResults(3)
        .build();

System.out.println("Performing the query...");
EmbeddingSearchResult<TextSegment> searchResult = embeddingStore.search(searchRequest);
List<EmbeddingMatch<TextSegment>> matches = searchResult.matches();
for (EmbeddingMatch<TextSegment> embeddingMatch : matches) {
    System.out.println("Response: " + embeddingMatch.embedded().text());
    System.out.println("Author: " + embeddingMatch.embedded().metadata().getString("author"));
    System.out.println("Score: " + embeddingMatch.score());
}

This code returns output that resembles the following:

Response: Penguins are flightless seabirds that live almost exclusively below the equator. Some island-dwellers can be found in warmer climates.
Author: C
Score: 0.829620897769928
Response: Patagonia is home to five penguin species - Magellanic, Humboldt, Gentoo, Southern Rockhopper and King.
Author: B
Score: 0.7459062337875366
Response: Emperor penguins are the tallest and heaviest of all penguin species, standing up to 4 feet.
Author: D
Score: 0.6908764839172363
2

To perform a search with metadata filtering, you can use classes from the dev.langchain4j.store.embedding.filter.comparison package. These classes allow you to create filters that compare metadata values to specified values to narrow the results returned by the search.

This example filters for documents in which the value of the author field is either "B" or "C". Then, it performs a semantic search query for the phrase "Where do penguins live?".

Replace the code that instantiates an EmbeddingSearchRequest instance in the preceding step with the following code:

EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
        .queryEmbedding(queryEmbedding)
        .filter(new IsIn("author", List.of("B", "C")))
        .maxResults(3)
        .build();

This code returns output that resembles the following:

Response: Penguins are flightless seabirds that live almost exclusively below the equator. Some island-dwellers can be found in warmer climates.
Author: C
Score: 0.8520907163619995
Response: Patagonia is home to five penguin species - Magellanic, Humboldt, Gentoo, Southern Rockhopper and King.
Author: B
Score: 0.7666836977005005

To learn more about metadata pre-filtering, see Atlas Vector Search Pre-Filter.
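
Atlas Vector Search can only pre-filter on fields that are indexed as filter fields. If a filtered query returns no matches, one possible cause is that the metadata field isn't included in the index definition. A hedged sketch of one way to address this with the IndexMapping builder used earlier, assuming that passing field names to metadataFieldNames() indexes them for filtering and that "author" is the field you filter on (you might also need to drop the existing index or choose a new index name before re-creating it):

// Sketch: declare metadata fields as filterable when the index is created.
// The "author" field name is illustrative; use the fields you filter on.
IndexMapping filterableMapping = IndexMapping.builder()
        .dimension(embeddingModel.dimension())
        .metadataFieldNames(new HashSet<>(List.of("author")))
        .build();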

See also:

For more information, refer to the API reference.

This section demonstrates a RAG implementation that uses the LangChain4j framework and Atlas Vector Search. Now that you've used Atlas Vector Search to retrieve semantically similar documents, use the following code examples to prompt the LLM to answer questions by using information from documents stored in Atlas.

1
  1. Add the following dependencies to the dependencies section of your project's pom.xml file, but do not remove any of the dependencies you already added. These dependencies add the LangChain4j core library (which includes AI Services) and the OpenAI integration for LangChain4j to your application:

    pom.xml
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-open-ai</artifactId>
        <version>1.0.0-beta1</version>
    </dependency>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j</artifactId>
        <version>1.0.0-beta1</version>
    </dependency>

    After you finish editing the pom.xml file, reload your project to make sure your dependencies are installed.

  2. Add the following imports to your imports list in your Main.java file:

    import com.fasterxml.jackson.core.type.TypeReference;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import dev.langchain4j.service.AiServices;
    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.openai.OpenAiChatModel;
    import dev.langchain4j.rag.content.retriever.ContentRetriever;
    import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
  3. Set the OPENAI_KEY environment variable to your OpenAI API key. You use this key to create a chat model that generates a response to your query.

2

In this step, you ingest data from an external source into Atlas. Download the rainforest-docs.json sample data file from the docs-code-examples GitHub repository. The documents in this file contain information about plants, animals, and weather in the rainforest.

Add this file to the resources directory in your project, which is at the same level as the java directory that contains your application files.

You must process the data into a usable format that you can create embeddings from and persist to Atlas. This code defines the loadJsonDocuments() method that performs the following actions:

  • Retrieves the sample data from your resources directory by using the ClassLoader class

  • Parses the JSON documents to a List of MongoDB Document instances by using the ObjectMapper class

Add the following code to your Main.java file outside of your main method:

private static List<Document> loadJsonDocuments(String resourcePath) throws IOException {
    // Loads file from resources directory using the ClassLoader
    InputStream inputStream = Main.class.getClassLoader().getResourceAsStream(resourcePath);
    if (inputStream == null) {
        throw new FileNotFoundException("Resource not found: " + resourcePath);
    }

    // Parses JSON file to List of MongoDB Documents
    ObjectMapper objectMapper = new ObjectMapper();
    List<Document> documents = objectMapper.readValue(inputStream, new TypeReference<>() {});

    return documents;
}

Then, add the following code in the main method body to call the loadJsonDocuments() method and load your documents:

System.out.println("Loading documents from file...");
String resourcePath = "rainforest-docs.json";
List<Document> documents = loadJsonDocuments(resourcePath);
3

In this step, you create vector embeddings from your sample documents and persist them to Atlas.

This code converts the content of the text fields in the sample documents to embeddings and persists the data to Atlas. The code includes a delay to accommodate the time needed for the vector conversion.

Add the following code into your Main.java file:

System.out.println("Persisting document embeddings...");
for (Document doc : documents) {
TextSegment segment = TextSegment.from(
doc.getString("text"),
new Metadata(doc.get("metadata", Map.class)));
Embedding embedding = embeddingModel.embed(segment).content();
embeddingStore.add(embedding, segment);
}
try {
Thread.sleep(5000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
4

In this step, you instantiate a chat model from OpenAI so you can answer questions based on your data. You also specify a content retriever that surfaces relevant documents to inform the response crafted by the chat model.

This code performs the following actions:

  • Instantiates the chat model by using your OpenAI API key

  • Creates the content retriever with the following specifications:

    • Retrieves at most 3 relevant documents

    • Retrieves documents that have a relevance score of at least 0.75

Add the following code to your Main.java file in the main method body:

String chatApiKey = System.getenv("OPENAI_KEY");

ChatLanguageModel chatModel = OpenAiChatModel.builder()
        .apiKey(chatApiKey)
        .modelName("gpt-4")
        .build();

ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .maxResults(3)
        .minScore(0.75)
        .build();

Tip

Metadata Filtering

You can implement metadata filtering in your ContentRetriever by using the filter() builder method and passing an instance of a Filter. See the metadata filtering example in the preceding step to learn how to construct a Filter.
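
For example, the following sketch restricts retrieval to documents from specific authors, reusing the IsIn filter from the earlier query. The "author" field and its values are illustrative; use whatever metadata fields your documents actually contain:

// Sketch: a content retriever that only considers documents whose
// metadata "author" value is "B" or "C". Field and values are illustrative.
ContentRetriever filteredRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .filter(new IsIn("author", List.of("B", "C")))
        .maxResults(3)
        .minScore(0.75)
        .build();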

5

Create a simple Assistant interface for the AI Services API to implement in your application. Create an interface file called Assistant.java at the same level as your Main.java file.

Define the Assistant interface:

package org.example;

public interface Assistant {
    String answer(String question);
}

In your Main.java file, instantiate the Assistant:

Assistant assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(chatModel)
        .contentRetriever(contentRetriever)
        .build();
6

Finally, perform a query on your sample data. Add the following code to your Main.java file to run a query and print the output:

String ragQuery = "What types of insects live in the rainforest?";
String output = assistant.answer(ragQuery);
System.out.println("Response:\n" + output);

This code returns output that resembles the following:

Response:
In the rainforest, there are numerous species of insects
such as beetles, butterflies, moths, wasps, bees, flies, and
ants. Of the many insects that live in the rainforest, ants
are particularly important as they play a crucial role in
nutrient recycling and aeration of the soil. Moreover, many
of these insects are involved in the processes of
pollination and decomposition. The adaptations these insects
have developed enable their survival in the rainforest's
specific conditions, characterized by heavy rainfall.
