Get Started with the LangChain Integration

Note

This tutorial uses LangChain's Python library. For a tutorial that uses the JavaScript library, see Get Started with the LangChain JS/TS Integration.

You can integrate Atlas Vector Search with LangChain to build LLM applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to start using Atlas Vector Search with LangChain to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:

  1. Set up the environment.

  2. Store custom data on Atlas.

  3. Create an Atlas Vector Search index on your data.

  4. Run the following vector search queries:

    • Semantic search.

    • Semantic search with score.

    • Semantic search with metadata pre-filtering.

  5. Implement RAG by using Atlas Vector Search to answer questions on your data.

LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.

By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see Key Concepts.
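To make "retrieving semantically similar documents" concrete, here is a minimal conceptual sketch that ranks documents by cosine similarity to a query vector. It is a brute-force illustration with made-up three-dimensional vectors, not how Atlas Vector Search is implemented; the function names and sample data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, documents, k=2):
    """Rank (text, vector) pairs by cosine similarity to the query vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real embedding models produce far longer vectors.
documents = [
    ("backup policy", [0.9, 0.1, 0.0]),
    ("network security", [0.1, 0.9, 0.2]),
    ("access control", [0.2, 0.8, 0.3]),
]
for text, score in top_k([0.0, 1.0, 0.2], documents):
    print(f"{score:.3f}  {text}")
```

In the tutorial itself, the embedding model produces the vectors and Atlas performs the ranking, so you never compute similarities by hand.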

To complete this tutorial, you must have the following:

  • An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs).

  • An OpenAI API Key. You must have a paid OpenAI account with credits available for API requests.

  • A notebook environment, such as Colab, in which to run your Python project.

First, set up the environment for this tutorial by copying and pasting the following code snippets into your notebook.

1

Run the following command:

%pip install --upgrade --quiet langchain langchain-mongodb langchain-openai pymongo pypdf

Then, run the following code to import the required packages:

import getpass, os, pymongo, pprint
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pymongo import MongoClient

2

Run the following code and, when prompted, provide your OpenAI API key and your Atlas cluster's SRV connection string:

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
ATLAS_CONNECTION_STRING = getpass.getpass("MongoDB Atlas SRV Connection String:")

Note

Your connection string should use the following format:

mongodb+srv://<username>:<password>@<clusterName>.<hostname>.mongodb.net
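If your username or password contains reserved characters such as @, :, or /, they need to be percent-encoded before they go into the connection string. The following minimal sketch assembles a string in the format above. The helper name build_srv_uri and the sample credentials and host are hypothetical and only serve as an illustration.

```python
from urllib.parse import quote_plus


def build_srv_uri(username: str, password: str, host: str) -> str:
    """Assemble an SRV connection string with percent-encoded credentials.

    `host` stands for the "<clusterName>.<hostname>.mongodb.net" part.
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"


# Hypothetical values, for illustration only.
uri = build_srv_uri("app_user", "p@ss:word/1", "cluster0.example.mongodb.net")
print(uri)
```

You can paste the resulting string at the connection string prompt, or copy the ready-made string from the Atlas UI instead.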

Then, load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.

1

Run the following code to establish a connection to your Atlas cluster. It specifies the following:

  • langchain_db.test as the name of the collection into which to load the data.

  • vector_index as the name of the Atlas Vector Search index to use for querying the data.

# Connect to your Atlas cluster
client = MongoClient(ATLAS_CONNECTION_STRING)
# Define collection and index name
db_name = "langchain_db"
collection_name = "test"
atlas_collection = client[db_name][collection_name]
vector_search_index = "vector_index"

2

For this tutorial, you use a publicly accessible PDF document titled MongoDB Atlas Best Practices as the data source for your vector store. This document describes various recommendations and core concepts for managing your Atlas deployments.

To load the sample data, run the following code snippet. It does the following:

  • Retrieves the PDF from the specified URL and loads the raw text data.

  • Uses a text splitter to split the data into smaller documents.

  • Specifies chunk parameters, which determine the number of characters in each document and the number of characters that overlap between two consecutive documents.

# Load the PDF
loader = PyPDFLoader("https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP")
data = loader.load()
# Split PDF into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
docs = text_splitter.split_documents(data)
# Print the first document
docs[0]
Document(page_content='Mong oDB Atlas Best P racticesJanuary 20 19A MongoD B White P aper', metadata={'source': 'https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4HkJP', 'page': 0})
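To see how the two chunk parameters interact, consider this simplified sketch of a fixed-size sliding window. It is not the algorithm that RecursiveCharacterTextSplitter uses (that splitter prefers to break on separators such as paragraphs, newlines, and spaces, so its chunks are at most chunk_size characters long rather than exactly that long). The function name chunk_text is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=2)
print(chunks)
```

Each chunk repeats the last two characters of the previous one, which is the role that chunk_overlap=20 plays for the 200-character documents in this tutorial.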
3

Run the following code to create a vector store named vector_search from the sample documents. This snippet uses the MongoDBAtlasVectorSearch.from_documents method and specifies the following parameters:

  • The sample documents to store in the vector database.

  • OpenAI's embedding model as the model used to convert text into vector embeddings for the embedding field.

  • langchain_db.test as the Atlas collection to store the documents.

  • vector_index as the index to use for querying the vector store.

# Create the vector store
vector_search = MongoDBAtlasVectorSearch.from_documents(
    documents = docs,
    embedding = OpenAIEmbeddings(disallowed_special=()),
    collection = atlas_collection,
    index_name = vector_search_index
)

Tip

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the langchain_db.test collection in your cluster.
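As an alternative to the Atlas UI, you could check the stored documents from the notebook. The sketch below assumes the embeddings are stored in a field named embedding, as described above; the helper name summarize_embeddings is hypothetical.

```python
def summarize_embeddings(collection, embedding_field="embedding"):
    """Return (document count, embedding length of one sample document)."""
    count = collection.count_documents({})
    sample = collection.find_one({embedding_field: {"$exists": True}})
    dims = len(sample[embedding_field]) if sample else 0
    return count, dims


# Usage in the notebook (requires the Atlas connection from the earlier step):
# count, dims = summarize_embeddings(atlas_collection)
# print(f"{count} documents, {dims} dimensions per embedding")
```

The dimension count should match the numDimensions value that you specify when you define the vector search index.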

To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchain_db.test collection.

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

1
  1. If it is not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.

  2. If it is not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. If the Clusters page is not already displayed, click Database in the sidebar.

2
  1. Click your cluster's name.

  2. Click the Atlas Search tab.

3
  1. Click Create Search Index.

  2. Under Atlas Vector Search, select JSON Editor and then click Next.

  3. In the Database and Collection section, find the langchain_db database, and select the test collection.

  4. In the Index Name field, enter vector_index.

  5. Replace the default definition with the following index definition and then click Next.

    This index definition specifies indexing the following fields in an index of the vectorSearch type:

    • embedding field as the vector type. The embedding field contains the embeddings created using OpenAI's text-embedding-ada-002 embedding model. The index definition specifies 1536 vector dimensions and measures similarity using cosine similarity.

    • page field as the filter type for pre-filtering data by the page number in the PDF.

    {
      "fields": [
        {
          "type": "vector",
          "path": "embedding",
          "numDimensions": 1536,
          "similarity": "cosine"
        },
        {
          "type": "filter",
          "path": "page"
        }
      ]
    }
4

A modal window displays to let you know that your index is building.

5

The index should take about one minute to build. While it builds, the Status column reads Initial Sync. When it finishes building, the Status column reads Active.
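The index definition from the steps above can also be expressed as a Python dictionary, which is convenient if you prefer to create the index from code. The following is a sketch, not part of the tutorial's procedure: the helper names are hypothetical, and create_vector_index assumes PyMongo 4.7 or later, where SearchIndexModel accepts a type argument.

```python
import json


def build_vector_index_definition(path, num_dimensions, similarity, filter_paths=()):
    """Build a vectorSearch index definition as a plain dict."""
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # One filter entry per field that queries may pre-filter on.
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"fields": fields}


def create_vector_index(collection, name, definition):
    """Create the index with PyMongo instead of the Atlas UI.

    Assumes PyMongo 4.7 or later, where SearchIndexModel accepts `type`.
    """
    from pymongo.operations import SearchIndexModel  # imported lazily

    model = SearchIndexModel(definition=definition, name=name, type="vectorSearch")
    return collection.create_search_index(model)


definition = build_vector_index_definition("embedding", 1536, "cosine", ["page"])
print(json.dumps(definition, indent=2))

# Usage in the notebook:
# create_vector_index(atlas_collection, vector_search_index, definition)
```

Whichever way you create the index, wait until it finishes building before you run queries against it.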

Once Atlas builds your index, return to your notebook and run vector search queries on your data. The following examples demonstrate various queries that you can run on your vectorized data.
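A minimal sketch of the three query types listed at the start of this tutorial might look like the following. The wrapper function names and the sample query are hypothetical; the underlying calls are the similarity_search and similarity_search_with_score methods of the vector store, and the pre_filter argument assumes that page is indexed as a filter field, as in the index definition above.

```python
def semantic_search(vector_store, query, k=4):
    """Return the k documents most similar to the query."""
    return vector_store.similarity_search(query, k=k)


def semantic_search_with_score(vector_store, query, k=3):
    """Return (document, relevance score) pairs for the k closest documents."""
    return vector_store.similarity_search_with_score(query=query, k=k)


def semantic_search_on_page(vector_store, query, page, k=3):
    """Restrict the search to one PDF page by pre-filtering on the page field."""
    return vector_store.similarity_search_with_score(
        query=query, k=k, pre_filter={"page": {"$eq": page}}
    )


# Usage in the notebook, with the vector store created earlier:
# pprint.pprint(semantic_search(vector_search, "MongoDB Atlas security"))
# pprint.pprint(semantic_search_with_score(vector_search, "MongoDB Atlas security"))
# pprint.pprint(semantic_search_on_page(vector_search, "MongoDB Atlas security", page=17))
```

Pre-filtering narrows the candidate documents before the similarity comparison, so only documents from the given page are considered.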

Tip

See also:

For a full list of semantic search methods, refer to the API reference.

This section demonstrates how to implement RAG in your application with Atlas Vector Search and LangChain. Now that you've used Atlas Vector Search to retrieve semantically similar documents, run the following code examples to prompt the LLM to answer questions based on those documents.
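One way to wire this together is sketched below: the vector store becomes a retriever, the retrieved documents are joined into a context string, and a prompt, chat model, and output parser turn the result into an answer. The prompt wording, the helper names, the value of k, and the sample question are illustrative assumptions rather than the tutorial's exact code.

```python
PROMPT_TEMPLATE = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

{context}

Question: {question}
"""


def format_docs(docs):
    """Join the text of the retrieved documents into one context string."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_rag_chain(vector_store, k=10):
    """Wire retriever, prompt, chat model, and output parser into one chain."""
    # Imported lazily so that the helpers above work without LangChain installed.
    from langchain_core.output_parsers import StrOutputParser
    from langchain_core.prompts import PromptTemplate
    from langchain_core.runnables import RunnablePassthrough
    from langchain_openai import ChatOpenAI

    retriever = vector_store.as_retriever(
        search_type="similarity", search_kwargs={"k": k}
    )
    prompt = PromptTemplate.from_template(PROMPT_TEMPLATE)
    llm = ChatOpenAI()  # reads OPENAI_API_KEY from the environment
    return (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
    )


# Usage in the notebook:
# rag_chain = build_rag_chain(vector_search)
# print(rag_chain.invoke("How can I secure my MongoDB Atlas cluster?"))
```

Because the retriever fills the context slot, the model answers from the documents that Atlas Vector Search returns rather than from its general training data alone.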

To learn about additional RAG use cases with Atlas Vector Search, see the templates that LangChain provides to help you build applications.
