Integrate Atlas Vector Search with LangChain

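You can integrate Atlas Vector Search with LangChain to build generative AI and retrieval-augmented generation (RAG) applications. This page provides an overview of the MongoDB LangChain Python integration and the components you can use in your applications.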

Note

For a full list of components and methods, see the API reference.

For the JavaScript integration, see Get Started with the LangChain JS/TS Integration.

Installation and Setup

To use Atlas Vector Search with LangChain, you must first install the langchain-mongodb package:

pip install langchain-mongodb

Some components also require the following LangChain base packages:

pip install langchain langchain_community

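Vector Store

MongoDBAtlasVectorSearch is a vector store that lets you store vector embeddings in, and retrieve them from, a collection in Atlas. You can use this component to store embeddings of your data and retrieve them with Atlas Vector Search.

This component requires an Atlas Vector Search index.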
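
As a rough illustration, a vector search index definition for this component might look like the following Python dictionary. The `embedding` path assumes that the vector store writes embeddings to its default field, and the `numDimensions` value of 1536 is an assumption that must be changed to match the output size of your embedding model.

```python
import json

# Illustrative Atlas Vector Search index definition (values are assumptions)
vector_index_definition = {
    "fields": [
        {
            "type": "vector",        # Index this field for vector search
            "path": "embedding",     # Field that stores the embeddings (assumed default)
            "numDimensions": 1536,   # Assumed; must match your embedding model
            "similarity": "cosine",  # Or "euclidean" / "dotProduct"
        }
    ]
}

# Print the definition as JSON, for example to paste into the Atlas UI
print(json.dumps(vector_index_definition, indent=2))
```

The `similarity` value chosen here should agree with the similarity function that the vector store is configured to use.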

Usage

from langchain_core.embeddings import FakeEmbeddings
from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from pymongo import MongoClient
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["<database-name>"]["<collection-name>"]
# Instantiate the vector store
vector_store = MongoDBAtlasVectorSearch(
   collection = collection,                # Collection to store embeddings
   embedding = FakeEmbeddings(size=1536),  # Placeholder for testing; substitute a real embedding model
   index_name = "vector_index",            # Name of the vector search index
   relevance_score_fn = "cosine"           # Similarity score function, can also be "euclidean" or "dotProduct"
)
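
To make the three similarity options more concrete, the following sketch computes the underlying textbook measures in plain Python. It is only an illustration: the helper names are hypothetical, and Atlas normalizes its scores into its own range, so the values that the vector store returns will differ from these raw numbers.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between the vectors; ignores their magnitudes."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy two-dimensional "embeddings" (made-up values)
query_vector = [1.0, 0.0]
document_vector = [1.0, 1.0]

print("cosine:", cosine_similarity(query_vector, document_vector))
print("euclidean:", euclidean_distance(query_vector, document_vector))
print("dotProduct:", dot_product(query_vector, document_vector))
```

Cosine similarity depends only on direction, whereas the dot product and Euclidean distance also depend on vector length, which is why the choice usually follows from how the embedding model was trained.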

Retrievers

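LangChain retrievers are components that fetch relevant documents from your vector stores. You can use LangChain's built-in retrievers or the following MongoDB retrievers to query and retrieve data from Atlas.

Full-Text Retriever

MongoDBAtlasFullTextSearchRetriever is a retriever that performs full-text search by using Atlas Search. Specifically, it uses Lucene's standard BM25 algorithm.

This retriever requires an Atlas Search index.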
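
For intuition about how BM25 ranks documents, here is a simplified sketch of the classic formula in plain Python. It is not the Lucene or Atlas Search implementation, which differs in details; the function name, the whitespace tokenization, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with classic BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Longer-than-average documents are penalized
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "atlas vector search stores embeddings".split(),
    "full text search ranks documents by keyword".split(),
    "chat history stores messages".split(),
]
scores = bm25_scores(["text", "search"], docs)
for doc, score in zip(docs, scores):
    print(round(score, 3), " ".join(doc))
```

The second document matches both query terms and ranks first, the first matches only "search", and the third matches nothing and scores zero.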

Usage

from langchain_mongodb.retrievers.full_text_search import MongoDBAtlasFullTextSearchRetriever
from pymongo import MongoClient
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["<database-name>"]["<collection-name>"]
# Initialize the retriever
retriever = MongoDBAtlasFullTextSearchRetriever(
   collection = collection,           # MongoDB Collection in Atlas
   search_field = "<field-name>",     # Name of the field to search
   search_index_name = "<index-name>" # Name of the search index
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print(doc)

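Hybrid Search Retriever

MongoDBAtlasHybridSearchRetriever is a retriever that combines vector search and full-text search results by using the Reciprocal Rank Fusion (RRF) algorithm. To learn more, see How to Perform Hybrid Search.

This retriever requires an existing vector store, an Atlas Vector Search index, and an Atlas Search index.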
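
The idea behind Reciprocal Rank Fusion can be sketched in a few lines of plain Python. This is a minimal illustration rather than the retriever's actual code: the function name and document IDs are hypothetical, and it assumes that ranks start at 1 and that each result list has its own penalty constant, analogous to the penalty parameters of the retriever.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, penalties, top_k=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (rank + penalty) from every list in which
    it appears (rank starts at 1), and the contributions are summed.
    """
    scores = defaultdict(float)
    for ranking, penalty in zip(ranked_lists, penalties):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank + penalty)
    # Highest fused score first, truncated to top_k results
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_k]

# Hypothetical result lists, best match first
vector_results = ["doc_a", "doc_b", "doc_c"]
fulltext_results = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion(
    [vector_results, fulltext_results],
    penalties=[60.0, 60.0],
)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear near the top of both lists accumulate the highest scores. A larger penalty flattens the differences between ranks within a list, so raising the penalty for one search type reduces its influence on the final order.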

Usage

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever
# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
   vectorstore = <vector-store>,        # Vector store instance
   search_index_name = "<index-name>",  # Name of the Atlas Search index
   top_k = 5,                           # Number of documents to return
   fulltext_penalty = 60.0,             # Penalty for full-text search
   vector_penalty = 60.0                # Penalty for vector search
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print(doc)

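LLM Caches

Caches optimize LLM performance by storing the responses to similar or repeated queries so that they do not need to be recomputed. MongoDB provides the following caches for your LangChain applications.

MongoDB Cache

MongoDBCache lets you store a basic cache in Atlas.

Usage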

from langchain_mongodb import MongoDBCache
from langchain_core.globals import set_llm_cache
set_llm_cache(MongoDBCache(
   connection_string = "<connection-string>", # Atlas connection string
   database_name = "<database-name>",         # Database to store the cache
   collection_name = "<collection-name>"      # Collection to store the cache
))

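Semantic Cache

Semantic caching is a more advanced form of caching that retrieves cached prompts based on the semantic similarity between the user input and the cached results.

MongoDBAtlasSemanticCache is a semantic cache that uses Atlas Vector Search to retrieve cached prompts. This component requires an Atlas Vector Search index.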
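
The lookup logic of a semantic cache can be sketched with an in-memory toy. Everything here is hypothetical and for illustration only: the `ToySemanticCache` class, its `score_threshold` parameter, and the letter-frequency `toy_embed` function stand in for a real embedding model and for the vector search that Atlas performs.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def toy_embed(text):
    # Hypothetical embedding: counts of the letters a-z in the text
    text = text.lower()
    return [text.count(chr(ord("a") + i)) for i in range(26)]

class ToySemanticCache:
    """In-memory stand-in for a semantic cache (illustration only)."""

    def __init__(self, embed, score_threshold=0.9):
        self.embed = embed
        self.score_threshold = score_threshold
        self.entries = []  # List of (prompt embedding, cached response)

    def update(self, prompt, response):
        self.entries.append((self.embed(prompt), response))

    def lookup(self, prompt):
        # Return the response of the most similar cached prompt,
        # but only if it is similar enough to count as a hit
        query = self.embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.score_threshold else None

cache = ToySemanticCache(toy_embed)
cache.update("What is MongoDB Atlas?", "A managed database service.")
print(cache.lookup("what is mongodb atlas"))  # Near-identical wording: cache hit
print(cache.lookup("How do I bake bread?"))   # Unrelated prompt: cache miss
```

Unlike an exact-match cache, the lookup succeeds even when the prompt is worded slightly differently, at the cost of one embedding call and one similarity search per lookup.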

Usage

from langchain_mongodb import MongoDBAtlasSemanticCache
from langchain_core.globals import set_llm_cache
# Use some embedding model to generate embeddings; FakeEmbeddings is a placeholder for testing
from langchain_core.embeddings import FakeEmbeddings
set_llm_cache(MongoDBAtlasSemanticCache(
   embedding = FakeEmbeddings(size=1536),     # Embedding model to use
   connection_string = "<connection-string>", # Atlas connection string
   database_name = "<database-name>",         # Database to store the cache
   collection_name = "<collection-name>"      # Collection to store the cache
))

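Document Loader

Document loaders are tools that help you load data for your LangChain applications.

MongodbLoader is a document loader that returns a list of documents from a MongoDB database.

Usage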

from langchain_community.document_loaders.mongodb import MongodbLoader
loader = MongodbLoader(
   connection_string = "<connection-string>",  # Atlas cluster or local MongoDB instance URI
   db_name = "<database-name>",                # Database that contains the collection
   collection_name = "<collection-name>",      # Collection to load documents from
   filter_criteria = { <filter-document> },    # Optional document to specify a filter
   field_names = ["<field-name>", ... ]        # List of fields to return
)
docs = loader.load()

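Chat History

MongoDBChatMessageHistory is a component that lets you store and manage chat message histories in a MongoDB database. It can save both user and AI-generated messages associated with a unique session identifier. This is useful for applications that need to track interactions over time, such as chatbots.

Usage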

from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
chat_message_history = MongoDBChatMessageHistory(
   session_id = "<session-id>",               # Unique session identifier
   connection_string = "<connection-string>", # Atlas cluster or local MongoDB instance URI
   database_name = "<database-name>",         # Database to store the chat history
   collection_name = "<collection-name>"      # Collection to store the chat history
)
chat_message_history.add_user_message("Hello")
chat_message_history.add_ai_message("Hi")

chat_message_history.messages

[HumanMessage(content='Hello'), AIMessage(content='Hi')]

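Binary Storage

MongoDBByteStore is a custom datastore that uses MongoDB to store and manage binary data, specifically data represented as bytes. You can perform CRUD operations on key-value pairs, where the keys are strings and the values are byte sequences.

Usage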

from langchain_community.storage import MongoDBByteStore
# Instantiate the MongoDBByteStore
mongodb_store = MongoDBByteStore(
   connection_string = "<connection-string>",  # Atlas cluster or local MongoDB instance URI
   db_name = "<database-name>",                # Name of the database
   collection_name = "<collection-name>"       # Name of the collection
)
# Set values for keys
mongodb_store.mset([("key1", b"hello"), ("key2", b"world")])
# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
print(values)  # Output: [b'hello', b'world']
# Iterate over keys
for key in mongodb_store.yield_keys():
   print(key)  # Output: key1, key2
# Delete keys
mongodb_store.mdelete(["key1", "key2"])

Additional Resources

MongoDB also provides the following developer resources:
