Integrate Atlas Vector Search with LangChain

You can integrate Atlas Vector Search with LangChain to build generative AI and RAG applications. This page provides an overview of the MongoDB LangChain Python integration and the components that you can use in your applications.

Note

For a full list of components and methods, see the API reference.

For the JavaScript integration, see "Get Started with the LangChain JS/TS Integration".

Installation and Setup

To use Atlas Vector Search with LangChain, you must first install the langchain-mongodb package:

pip install langchain-mongodb

Some components also require the following LangChain base packages:

pip install langchain langchain_community

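Vector Store

MongoDBAtlasVectorSearch is a vector store that lets you store and retrieve vector embeddings from a collection in Atlas. Use this component to store embeddings of your data and to retrieve them with Atlas Vector Search.

This component requires an Atlas Vector Search index.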
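
As a rough sketch of what such an index can look like, the definition below describes a single vector field as a Python dictionary. The field path `embedding`, the dimension count `1536`, and the choice of `cosine` are assumptions for illustration only; match them to your embedding model, your collection schema, and the settings that you pass to the vector store.

```python
import json

# Hypothetical index definition. Adjust path, numDimensions, and similarity
# to match your embedding model and the documents in your collection.
vector_index_definition = {
    "fields": [
        {
            "type": "vector",        # marks the field as a vector embedding
            "path": "embedding",     # assumed field that stores the embedding
            "numDimensions": 1536,   # assumed size of each embedding
            "similarity": "cosine",  # or "euclidean" / "dotProduct"
        }
    ]
}

# Print the definition as JSON, the form in which index definitions are usually shown.
print(json.dumps(vector_index_definition, indent=2))
```

The index name is supplied separately when the index is created and must match the `index_name` that the vector store is given.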

Usage

from langchain_mongodb.vectorstores import MongoDBAtlasVectorSearch
from pymongo import MongoClient
# Use some embedding model to generate embeddings
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["<database-name>"]["<collection-name>"]
# Instantiate the vector store
vector_store = MongoDBAtlasVectorSearch(
   collection = collection,        # Collection to store embeddings
   embedding = FakeEmbeddings(),   # Embedding model to use
   index_name = "vector_index",    # Name of the vector search index
   relevance_score_fn = "cosine"   # Similarity score function, can also be "euclidean" or "dotProduct"
)
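
The three values accepted by `relevance_score_fn` name standard vector comparisons. The pure-Python sketch below computes the raw measures for two small vectors so that the difference between them is concrete. It is an illustration only: the helper functions and sample vectors are hypothetical, and the scores that Atlas returns are normalized, so they will not equal these raw values.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products; grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product of the vectors after scaling each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between the two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical two-dimensional embeddings.
query_vector = [1.0, 0.0]
document_vector = [3.0, 4.0]

print(dot_product(query_vector, document_vector))
print(cosine_similarity(query_vector, document_vector))
print(euclidean_distance(query_vector, document_vector))
```

Cosine similarity ignores vector length, dot product rewards it, and Euclidean distance measures separation, which is why the choice should follow how your embedding model was trained.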


Retrievers

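LangChain retrievers are components that retrieve relevant documents from your vector stores. You can use LangChain's built-in retrievers or the following MongoDB retrievers to query and retrieve data from Atlas.

Full-Text Retriever

MongoDBAtlasFullTextSearchRetriever is a retriever that performs full-text search by using Atlas Search. Specifically, it uses Lucene's standard BM25 algorithm.

This retriever requires an Atlas Search index.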
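
To give a feel for how BM25 ranks documents, here is a minimal sketch of the textbook scoring formula in pure Python. It assumes whitespace tokenization and the commonly cited defaults `k1 = 1.2` and `b = 0.75`; Lucene's implementation differs in details such as analysis and constant factors, so treat the numbers as illustrative rather than as what Atlas Search returns. The function name and sample documents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with the textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            doc_freq = sum(1 for other in tokenized if term in other)
            if doc_freq == 0:
                continue
            # Rare terms receive a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            tf = counts[term]
            # Term frequency saturates, and long documents are penalized.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

sample_documents = [
    "mongodb atlas vector search",
    "cooking pasta at home",
    "atlas search full text search",
]
sample_scores = bm25_scores("atlas search", sample_documents)
for text, score in zip(sample_documents, sample_scores):
    print(f"{score:.4f}  {text}")
```

The third document ranks first because it repeats a query term, while the second scores zero because it shares no terms with the query.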

Usage

from langchain_mongodb.retrievers.full_text_search import MongoDBAtlasFullTextSearchRetriever
from pymongo import MongoClient
# Connect to your Atlas cluster
client = MongoClient("<connection-string>")
collection = client["<database-name>"]["<collection-name>"]
# Initialize the retriever
retriever = MongoDBAtlasFullTextSearchRetriever(
   collection = collection,           # MongoDB Collection in Atlas
   search_field = "<field-name>",     # Name of the field to search
   search_index_name = "<index-name>" # Name of the search index
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print(doc)


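Hybrid Retriever

MongoDBAtlasHybridSearchRetriever is a retriever that combines vector search and full-text search results by using the Reciprocal Rank Fusion (RRF) algorithm. To learn more, see "How to Perform Hybrid Search".

This retriever requires an existing vector store, an Atlas Vector Search index, and an Atlas Search index.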
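
The idea behind Reciprocal Rank Fusion can be sketched in a few lines of pure Python. Each document earns `1 / (penalty + rank)` from every result list in which it appears, and the contributions are summed. The function below is a hypothetical illustration that assumes 1-based ranks and reuses the parameter names `vector_penalty`, `fulltext_penalty`, and `top_k`; it is not the aggregation pipeline that the retriever actually runs.

```python
def reciprocal_rank_fusion(vector_ids, fulltext_ids,
                           vector_penalty=60.0, fulltext_penalty=60.0, top_k=5):
    """Fuse two ranked lists of document IDs into one list of (id, score) pairs."""
    scores = {}
    # A document's contribution shrinks as its rank in a list gets worse.
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (vector_penalty + rank)
    for rank, doc_id in enumerate(fulltext_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (fulltext_penalty + rank)
    # Highest combined score first, truncated to the requested number of results.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Hypothetical result lists, best match first.
vector_results = ["a", "b", "c"]
fulltext_results = ["b", "d", "a"]

fused = reciprocal_rank_fusion(vector_results, fulltext_results)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents that appear in both lists ("a" and "b") rise above those found by only one search. Raising one penalty relative to the other lowers the influence of that search on the final order.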

Usage

from langchain_mongodb.retrievers.hybrid_search import MongoDBAtlasHybridSearchRetriever
# Initialize the retriever
retriever = MongoDBAtlasHybridSearchRetriever(
   vectorstore = <vector-store>,        # Vector store instance
   search_index_name = "<index-name>",  # Name of the Atlas Search index
   top_k = 5,                           # Number of documents to return
   fulltext_penalty = 60.0,             # Penalty for full-text search
   vector_penalty = 60.0                # Penalty for vector search
)
# Define your query
query = "some search query"
# Print results
documents = retriever.invoke(query)
for doc in documents:
   print(doc)


LLM Caches

Caches optimize LLM performance by storing responses to similar or repeated queries so that they do not have to be recomputed. MongoDB provides the following caches for your LangChain applications.

MongoDB Cache

MongoDBCache lets you store a basic cache in Atlas.
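
Conceptually, a basic cache of this kind is an exact-match lookup: the same prompt sent to the same model configuration returns the stored response instead of triggering a new call. The sketch below shows that pattern with an in-memory dictionary. `ExactMatchCache`, `fake_llm`, and `cached_generate` are hypothetical stand-ins written for this illustration, not part of langchain-mongodb.

```python
class ExactMatchCache:
    """Hypothetical in-memory cache keyed on the prompt and a model identifier."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, response):
        self._store[(prompt, llm_string)] = response

call_log = []  # records every prompt that actually reaches the "model"

def fake_llm(prompt):
    """Stand-in for an expensive LLM call."""
    call_log.append(prompt)
    return prompt.upper()

def cached_generate(cache, prompt, llm_string="fake-model"):
    cached = cache.lookup(prompt, llm_string)
    if cached is not None:
        return cached                      # cache hit: skip the model call
    response = fake_llm(prompt)
    cache.update(prompt, llm_string, response)
    return response

cache = ExactMatchCache()
first = cached_generate(cache, "hello")
second = cached_generate(cache, "hello")    # served from the cache
third = cached_generate(cache, "hello!")    # different text, so a cache miss
print(first, second, third, len(call_log))
```

Note that a one-character change in the prompt is a miss, which is the limitation that semantic caching addresses.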

Usage

from langchain_mongodb import MongoDBCache
from langchain_core.globals import set_llm_cache
set_llm_cache(MongoDBCache(
   connection_string = "<connection-string>", # Atlas connection string
   database_name = "<database-name>",         # Database to store the cache
   collection_name = "<collection-name>"      # Collection to store the cache
))


Semantic Cache

Semantic caching is a more advanced form of caching that retrieves cached prompts based on the semantic similarity between the user input and the cached results.

MongoDBAtlasSemanticCache is a semantic cache that uses Atlas Vector Search to retrieve the cached prompts. This component requires an Atlas Vector Search index.
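
The lookup logic of a semantic cache can be sketched as "embed the prompt, find the nearest cached prompt, and return its response if the similarity clears a threshold". The example below uses a toy word-count embedding and a linear scan in place of a real embedding model and Atlas Vector Search. `ToySemanticCache`, `toy_embed`, and the `0.7` threshold are assumptions made for illustration.

```python
import math
from collections import Counter

def toy_embed(text):
    """Toy embedding: a sparse vector of word counts (not a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class ToySemanticCache:
    """Hypothetical cache that matches prompts by similarity, not equality."""

    def __init__(self, score_threshold=0.7):
        self.score_threshold = score_threshold
        self._entries = []  # list of (embedding, response) pairs

    def update(self, prompt, response):
        self._entries.append((toy_embed(prompt), response))

    def lookup(self, prompt):
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Only return a hit when the closest entry is similar enough.
        return best_response if best_score >= self.score_threshold else None

semantic_cache = ToySemanticCache()
semantic_cache.update("what is atlas vector search", "cached answer")
near_hit = semantic_cache.lookup("what is atlas vector search exactly")
miss = semantic_cache.lookup("how do I bake bread")
print(near_hit, miss)
```

The threshold trades hit rate against the risk of returning an answer to a question that only looks similar.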

Usage

from langchain_mongodb import MongoDBAtlasSemanticCache
from langchain_core.globals import set_llm_cache
# Use some embedding model to generate embeddings
from tests.integration_tests.vectorstores.fake_embeddings import FakeEmbeddings
set_llm_cache(MongoDBAtlasSemanticCache(
   embedding = FakeEmbeddings(),              # Embedding model to use
   connection_string = "<connection-string>", # Atlas connection string
   database_name = "<database-name>",         # Database to store the cache
   collection_name = "<collection-name>"      # Collection to store the cache
))


Document Loaders

Document loaders are tools that help you load data into your LangChain applications.

MongodbLoader is a document loader that returns a list of documents from a MongoDB database.

Usage

from langchain_community.document_loaders.mongodb import MongodbLoader
loader = MongodbLoader(
   connection_string = "<connection-string>",  # Atlas cluster or local MongoDB instance URI
   db_name = "<database-name>",                # Database that contains the collection
   collection_name = "<collection-name>",      # Collection to load documents from
   filter_criteria = { <filter-document> },    # Optional document to specify a filter
   field_names = ["<field-name>", ... ]        # List of fields to return
)
docs = loader.load()


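Chat History

MongoDBChatMessageHistory is a component that lets you store and manage chat message histories in a MongoDB database. It can save both user and AI-generated messages, each associated with a unique session identifier. This is useful for applications that need to track interactions over time, such as chatbots.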
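
The role of the session identifier is easiest to see in a small model of the storage layout. The sketch below keeps messages in a shared in-memory dictionary keyed by session ID, standing in for a collection. `InMemoryChatHistory` is a hypothetical class written for this illustration and only mimics the method names used on this page.

```python
from collections import defaultdict

class InMemoryChatHistory:
    """Hypothetical stand-in that groups messages by session identifier."""

    _sessions = defaultdict(list)  # shared store, playing the role of a collection

    def __init__(self, session_id):
        self.session_id = session_id

    def add_user_message(self, content):
        self._sessions[self.session_id].append(("human", content))

    def add_ai_message(self, content):
        self._sessions[self.session_id].append(("ai", content))

    @property
    def messages(self):
        # Return a copy so callers cannot mutate the stored history.
        return list(self._sessions[self.session_id])

alice = InMemoryChatHistory("session-alice")
alice.add_user_message("Hello")
alice.add_ai_message("Hi")

bob = InMemoryChatHistory("session-bob")
bob.add_user_message("Good morning")

# Reopening a session by its identifier returns the earlier conversation.
alice_again = InMemoryChatHistory("session-alice")
print(alice_again.messages)
print(bob.messages)
```

Because every message is filed under its session ID, concurrent conversations stay separate and a returning user's history can be reloaded by ID alone.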

Usage

from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
chat_message_history = MongoDBChatMessageHistory(
   session_id = "<session-id>",               # Unique session identifier
   connection_string = "<connection-string>", # Atlas cluster or local MongoDB instance URI
   database_name = "<database-name>",         # Database to store the chat history
   collection_name = "<collection-name>"      # Collection to store the chat history
)
chat_message_history.add_user_message("Hello")
chat_message_history.add_ai_message("Hi")

chat_message_history.messages

[HumanMessage(content='Hello'), AIMessage(content='Hi')]


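Binary Storage

MongoDBByteStore is a custom datastore that uses MongoDB to store and manage binary data, specifically data represented as bytes. You can perform CRUD operations with key-value pairs in which the keys are strings and the values are byte sequences.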

Usage

from langchain_community.storage import MongoDBByteStore
# Instantiate the MongoDBByteStore
mongodb_store = MongoDBByteStore(
   connection_string = "<connection-string>",  # Atlas cluster or local MongoDB instance URI
   db_name = "<database-name>",                # Name of the database
   collection_name = "<collection-name>"       # Name of the collection
)
# Set values for keys
mongodb_store.mset([("key1", b"hello"), ("key2", b"world")])
# Get values for keys
values = mongodb_store.mget(["key1", "key2"])
print(values)  # Output: [b'hello', b'world']
# Iterate over keys
for key in mongodb_store.yield_keys():
   print(key)  # Output: key1, key2
# Delete keys
mongodb_store.mdelete(["key1", "key2"])


Additional Resources

MongoDB also provides the following developer resources.
