Get Started with the LangChain Integration

Note

This tutorial uses LangChain's Python library. For a tutorial that uses the JavaScript library, see "Get Started with the LangChain JavaScript/TypeScript Integration."

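You can integrate Atlas Vector Search with LangChain to build LLM applications and implement retrieval-augmented generation (RAG). This tutorial demonstrates how to use Atlas Vector Search with LangChain to perform semantic search on your data and build a RAG implementation. Specifically, you perform the following actions:

Set up the environment.
Store custom data on Atlas.
Create an Atlas Vector Search index on your data.
Run the following vector search queries:
- Semantic search.
- Semantic search with score.
- Semantic search with metadata pre-filtering.
Implement RAG by using Atlas Vector Search to answer questions on your data.

Work with a runnable version of this tutorial as a Python notebook.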

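Background

LangChain is an open-source framework that simplifies the creation of LLM applications through the use of "chains." Chains are LangChain-specific components that can be combined for a variety of AI use cases, including RAG.

By integrating Atlas Vector Search with LangChain, you can use Atlas as a vector database and use Atlas Vector Search to implement RAG by retrieving semantically similar documents from your data. To learn more about RAG, see "Retrieval-Augmented Generation (RAG) with Atlas Vector Search."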

Prerequisites

To complete this tutorial, you need the following:

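An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see "Create a Cluster."
An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.
An environment to run interactive Python notebooks, such as Colab.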

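Set Up the Environment

Set up the environment for this tutorial. Create an interactive Python notebook by saving a file with the .ipynb extension. This notebook lets you run Python code snippets individually, and you use it to run the code in this tutorial.

To set up your notebook environment, follow these steps:

Install and import dependencies.

Run the following command: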

pip install --quiet --upgrade langchain langchain-community langchain-core langchain-mongodb langchain-openai pymongo pypdf

Then, run the following code to import the required packages:

import os, pymongo, pprint
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

Define environment variables.

Run the following code, replacing the placeholders with the following values:

Your OpenAI API key.
Your Atlas cluster's SRV connection string.

os.environ["OPENAI_API_KEY"] = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
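
If your database username or password contains reserved characters such as @, : or /, the connection string must percent-encode them. The following is a minimal sketch of one way to assemble the string with Python's standard library. The helper name build_srv_uri and the sample credentials and host are hypothetical and used only for illustration.

```python
from urllib.parse import quote_plus


def build_srv_uri(username: str, password: str, host: str) -> str:
    """Return an SRV connection string with percent-encoded credentials."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"


# Hypothetical values for illustration only; substitute your own.
uri = build_srv_uri("app_user", "p@ss:word/1", "cluster0.abcde.mongodb.net")
print(uri)
```

You could then assign the result to ATLAS_CONNECTION_STRING instead of pasting the string by hand.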

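Use Atlas as a Vector Store

Next, load custom data into Atlas and instantiate Atlas as a vector database, also called a vector store. Copy and paste the following code snippets into your notebook.

Load the sample data.

For this tutorial, you use a publicly accessible PDF document about a recent MongoDB earnings report as the data source for your vector store.

To load the sample data, run the following code snippet. It does the following:

Retrieves the PDF from the specified URL and loads the raw text data.
Uses a text splitter to split the data into smaller documents.
Specifies chunk parameters, which determine the number of characters in each document and the number of characters that overlap between two consecutive documents.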

# Load the PDF
loader = PyPDFLoader("https://investors.mongodb.com/node/13176/pdf")
data = loader.load()
# Split PDF into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
docs = text_splitter.split_documents(data)
# Print the first document
docs[0]

Document(metadata={'producer': 'West Corporation using ABCpdf', 'creator': 'PyPDF', 'creationdate': '2025-03-05T21:06:26+00:00', 'title': 'MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results', 'source': 'https://investors.mongodb.com/node/13176/pdf', 'total_pages': 9, 'page': 0, 'page_label': '1'}, page_content='MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results\nMarch 5, 2025\nFourth Quarter Fiscal 2025 Total Revenue of $548.4 million, up 20% Year-over-Year')
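
To make the two chunk parameters concrete, the following sketch splits a string with a plain sliding window. It is a simplified illustration only: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and newlines, so its chunk boundaries differ from the ones below. The function name sliding_window_chunks is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# 500 characters of filler text, split with the tutorial's parameters
sample = "".join(str(i % 10) for i in range(500))
chunks = sliding_window_chunks(sample, chunk_size=200, chunk_overlap=20)
print([len(c) for c in chunks])
```

Each chunk begins with the final 20 characters of the previous one, which helps preserve context that would otherwise be cut at a chunk boundary.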

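Instantiate the vector store.

Run the following code to create a vector store instance named vector_store from the sample documents. This snippet specifies the following:

The connection string to your Atlas cluster.
langchain_db.test as the Atlas namespace in which to store the documents.
The text-embedding-3-large embedding model from OpenAI to convert the text into vector embeddings for the embedding field.
vector_index as the index to use for querying the vector store.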

# Instantiate the vector store using your MongoDB connection string
vector_store = MongoDBAtlasVectorSearch.from_connection_string(
  connection_string = ATLAS_CONNECTION_STRING,
  namespace = "langchain_db.test",
  embedding =  OpenAIEmbeddings(model="text-embedding-3-large"),
  index_name = "vector_index"
)
# Add documents to the vector store
vector_store.add_documents(documents=docs)

After running the sample code, you can view your vector embeddings in the Atlas UI by navigating to the langchain_db.test collection in your cluster.
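
If you prefer to check a stored document from the notebook rather than the Atlas UI, a small helper can summarize it without printing thousands of floats. This sketch assumes the stored document keeps its text in a text field, its vector in an embedding field, and its metadata as top-level fields. The helper name and the stand-in document are hypothetical.

```python
def summarize_stored_document(doc: dict, embedding_key: str = "embedding") -> dict:
    """Return the document with its embedding replaced by a dimension count."""
    summary = {key: value for key, value in doc.items() if key != embedding_key}
    embedding = doc.get(embedding_key)
    if embedding is not None:
        summary[f"{embedding_key}_dimensions"] = len(embedding)
    return summary


# Stand-in for a document fetched from the langchain_db.test collection
fake_doc = {
    "_id": "abc",
    "text": "SOURCE MongoDB, Inc.",
    "embedding": [0.0] * 3072,
    "page_label": "9",
}
print(summarize_stored_document(fake_doc))
```

With a live cluster, you could pass the result of MongoClient(ATLAS_CONNECTION_STRING)["langchain_db"]["test"].find_one() in place of fake_doc.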

Tip

MongoDBAtlasVectorSearch API Reference

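Create the Atlas Vector Search Index

Note

To create an Atlas Vector Search index, you must have Project Data Access Admin or higher access to the Atlas project.

To enable vector search queries on your vector store, create an Atlas Vector Search index on the langchain_db.test collection by using either the LangChain helper method or the PyMongo driver method.

Run one of the following snippets in your notebook, depending on your preferred method: the first uses the LangChain helper method, and the second uses the PyMongo driver method. The index definition specifies indexing the following fields:

embedding field as the vector type. The embedding field contains the embeddings created using OpenAI's text-embedding-3-large embedding model. The index definition specifies 3072 vector dimensions and measures similarity using cosine.
page_label field as the filter type for pre-filtering data by the page number in the PDF.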

# Use helper method to create the vector search index
vector_store.create_vector_search_index(
   dimensions = 3072, # The dimensions of the vector embeddings to be indexed
   filters = [ "page_label" ]
)

Tip

create_vector_search_index API Reference

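# Use the PyMongo driver method instead: connect to the collection,
# create your index model, then create the search index
client = MongoClient(ATLAS_CONNECTION_STRING)
atlas_collection = client["langchain_db"]["test"]
search_index_model = SearchIndexModel(
   definition={
      "fields": [
         {
         "type": "vector",
         "path": "embedding",
         "numDimensions": 3072,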
         "similarity": "cosine"
         },
         {
         "type": "filter",
         "path": "page_label"
         }
      ]
   },
   name="vector_index",
   type="vectorSearch"
)
atlas_collection.create_search_index(model=search_index_model)

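The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.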
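
Rather than waiting a fixed amount of time, you can poll until the index reports that it is queryable. The following is a minimal sketch that relies on PyMongo's list_search_indexes method and the queryable field in its results. The helper name wait_until_queryable and the default polling interval and timeout are assumptions chosen for illustration.

```python
import time


def wait_until_queryable(collection, index_name: str,
                         poll_seconds: float = 5.0,
                         timeout_seconds: float = 300.0) -> bool:
    """Poll the collection's search indexes until index_name is queryable.

    Returns True once the index is queryable, or False if the timeout elapses.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        indexes = list(collection.list_search_indexes(index_name))
        if indexes and indexes[0].get("queryable") is True:
            return True
        time.sleep(poll_seconds)
    return False
```

For example, you could call wait_until_queryable(client["langchain_db"]["test"], "vector_index") with a MongoClient named client before running the queries in the next section.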

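Run Vector Search Queries

Once Atlas builds your index, run vector search queries on your data. The following examples demonstrate various queries that you can run on your vectorized data.

The following query uses the similarity_search method to perform a basic semantic search for the string MongoDB acquisition. It returns a list of documents ranked by relevance.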

query = "MongoDB acquisition"
results = vector_store.similarity_search(query)
pprint.pprint(results)

[Document(id='67f0259b8bb2babc06924409', metadata={ ... }, page_content='SOURCE MongoDB, Inc.'),
 Document(id='67f0259b8bb2babc0692432f', metadata={ ... }, page_content='MongoDB  platform. In fiscal year 2026 we expect to see stable consumption growth in Atlas, our main growth driver," said Dev Ittycheria, President\nand Chief Executive Officer of MongoDB .'),
 Document(id='67f0259b8bb2babc06924355', metadata={ ... }, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
 Document(id='67f0259b8bb2babc069243a6', metadata={ ... }, page_content="MongoDB's unified, intelligent data platform was built to power the next generation of applications, and MongoDB  is the most widely available, globally")]

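The following query uses the similarity_search_with_score method to perform a semantic search for the string MongoDB acquisition and specifies the k parameter to limit the number of documents to return to 3.

Note

The k parameter in this example refers to the similarity_search_with_score method option and not the knnBeta operator option of the same name.

It returns the three most relevant documents and a relevance score between 0 and 1.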

query = "MongoDB acquisition"
results = vector_store.similarity_search_with_score(
   query = query, k = 3
)
pprint.pprint(results)

[(Document(id='67f0259b8bb2babc06924409', metadata={ ... }, page_content='SOURCE MongoDB, Inc.'),
  0.8193451166152954),
 (Document(id='67f0259b8bb2babc0692432f', metadata={ ... }, page_content='MongoDB  platform. In fiscal year 2026 we expect to see stable consumption growth in Atlas, our main growth driver," said Dev Ittycheria, President\nand Chief Executive Officer of MongoDB .'),
  0.7815237045288086),
 (Document(id='67f0259b8bb2babc06924355', metadata={ ... }, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
  0.7788857221603394)]
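
The scores fall between 0 and 1 because the raw cosine similarity, which ranges from -1 to 1, is rescaled. The following sketch assumes the normalization that the Atlas Vector Search documentation describes for cosine similarity, score = (1 + cosine) / 2, and applies it to small hand-written vectors. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_score(a: list[float], b: list[float]) -> float:
    """Rescale cosine similarity from [-1, 1] to [0, 1]."""
    return (1 + cosine_similarity(a, b)) / 2


print(normalized_score([1.0, 0.0], [1.0, 0.0]))   # same direction
print(normalized_score([1.0, 0.0], [0.0, 1.0]))   # orthogonal
print(normalized_score([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Under this assumption, a score of about 0.82 corresponds to a raw cosine similarity of about 0.64.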

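You can pre-filter your data by using an MQL match expression that compares the indexed field with another value in your collection. You must index any metadata fields that you want to filter by as the filter type. To learn more, see "How to Index Fields for Vector Search."

Note

You specified the page_label field as a filter when you created the index for this tutorial.

The following query uses the similarity_search_with_score method to perform a semantic search for the string MongoDB acquisition. It also specifies the following:

The k parameter to limit the number of documents to return to 3.
A pre-filter on the page_label field that uses the $eq operator to match documents appearing on page 2 only.

It returns the three most relevant documents from page 2 and a relevance score between 0 and 1.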

query = "MongoDB acquisition"
results = vector_store.similarity_search_with_score(
   query = query,
   k = 3,
   pre_filter = { "page_label": { "$eq": "2" } }  # page_label is stored as a string
)
pprint.pprint(results)

[(Document(id='67f0259b8bb2babc06924355', metadata={ ... 'page_label': '2'}, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
  0.7788857221603394),
 (Document(id='67f0259b8bb2babc06924351', metadata={ ... 'page_label': '2'}, page_content='Measures."\nFourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB  acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation'),
  0.7606035470962524),
 (Document(id='67f0259b8bb2babc06924354', metadata={ ... 'page_label': '2'}, page_content='data.\nMongoDB  completed the redemption of 2026 Convertible Notes, eliminating all debt from the balance sheet. Additionally, in'),
  0.7583936452865601)]
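
Pre-filters are ordinary dictionaries, so you can build them programmatically. The following sketch builds a page_label filter for one page or several pages by using the $eq and $in operators. The sample output above shows page_label values as strings ('2'), so the helper converts page numbers to strings. That conversion is an assumption about how your data is stored, and the helper name page_label_filter is hypothetical.

```python
def page_label_filter(pages) -> dict:
    """Build a pre_filter that matches documents on the given pages."""
    labels = [str(page) for page in pages]
    if len(labels) == 1:
        return {"page_label": {"$eq": labels[0]}}
    return {"page_label": {"$in": labels}}


print(page_label_filter([2]))
print(page_label_filter([1, 2]))
```

You could pass the returned dictionary as the pre_filter argument of similarity_search_with_score.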

Tip

For a full list of semantic search methods, see the API reference.

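Answer Questions on Your Data

This section demonstrates how to implement RAG in your application with Atlas Vector Search and LangChain. Now that you've used Atlas Vector Search to retrieve semantically similar documents, run the following code examples to prompt the LLM to answer questions based on those documents.

This example implements basic RAG. It does the following:

Instantiates Atlas Vector Search as a retriever to query for similar documents, including the optional k parameter to search for only the 10 most relevant documents.
Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.
Constructs a chain that specifies the following:
- Atlas Vector Search as the retriever to search for documents to use as context.
- The prompt template that you defined.
- OpenAI's gpt-4o chat model to generate a context-aware response.
Invokes the chain with a sample query.
Returns the LLM's response and the documents used as context. The generated response might vary.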

# Instantiate Atlas Vector Search as a retriever
retriever = vector_store.as_retriever(
   search_type = "similarity",
   search_kwargs = { "k": 10 }
)
# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: {question}
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI(model="gpt-4o")
# Construct a chain to answer questions on your data
chain = (
   { "context": retriever, "question": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
question = "What was MongoDB's latest acquisition?"
answer = chain.invoke(question)
print("Question: " + question)
print("Answer: " + answer)
# Return source documents
documents = retriever.invoke(question)
print("\nSource documents:")
pprint.pprint(documents)

Question: What was MongoDB's latest acquisition?
Answer: MongoDB's latest acquisition was Voyage AI, a pioneer in state-of-the-art embedding and reranking models.
Source documents:
[Document(id='67f0259b8bb2babc06924409', metadata={'_id': '67f0259b8bb2babc06924409', ... 'page_label': '9'}, page_content='SOURCE MongoDB, Inc.'),
 Document(id='67f0259b8bb2babc06924351', metadata={'_id': '67f0259b8bb2babc06924351', ... 'page_label': '2'}, page_content='Measures."\nFourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB  acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation'),
 Document(id='67f0259b8bb2babc0692432f', metadata={'_id': '67f0259b8bb2babc0692432f', ... 'page_label': '1'}, page_content='MongoDB  platform. In fiscal year 2026 we expect to see stable consumption growth in Atlas, our main growth driver," said Dev Ittycheria, President\nand Chief Executive Officer of MongoDB .'),
 Document(id='67f0259b8bb2babc06924355', metadata={'_id': '67f0259b8bb2babc06924355', ... 'page_label': '2'}, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
 Document(id='67f0259b8bb2babc069243a6', metadata={'_id': '67f0259b8bb2babc069243a6', ... 'page_label': '4'}, page_content="MongoDB's unified, intelligent data platform was built to power the next generation of applications, and MongoDB  is the most widely available, globally"),
 Document(id='67f0259b8bb2babc06924329', metadata={'_id': '67f0259b8bb2babc06924329', ... 'page_label': '1'}, page_content='MongoDB, Inc. Announces Fourth Quarter and Full Year Fiscal 2025 Financial Results\nMarch 5, 2025\nFourth Quarter Fiscal 2025 Total Revenue of $548.4 million, up 20% Year-over-Year'),
 Document(id='67f0259b8bb2babc069243a7', metadata={'_id': '67f0259b8bb2babc069243a7', ... 'page_label': '4'}, page_content='distributed database on the market. With integrated capabilities for operational data, search, real-time analytics, and AI-powered retrieval, MongoDB'),
 Document(id='67f0259b8bb2babc069243a5', metadata={'_id': '67f0259b8bb2babc069243a5', ... 'page_label': '4'}, page_content="Headquartered in New York, MongoDB's mission is to empower innovators to create, transform, and disrupt industries with software and data."),
 Document(id='67f0259b8bb2babc06924354', metadata={'_id': '67f0259b8bb2babc06924354', ... 'page_label': '2'}, page_content='data.\nMongoDB  completed the redemption of 2026 Convertible Notes, eliminating all debt from the balance sheet. Additionally, in'),
 Document(id='67f0259b8bb2babc069243a9', metadata={'_id': '67f0259b8bb2babc069243a9', ... 'page_label': '4'}, page_content='50,000 customers across almost every industry—including 70% of the Fortune 100—rely on MongoDB  for their most important applications. To learn\nmore, visit mongodb.com .\nInvestor Relations')]
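
The chain above passes the retriever's list of Document objects directly into the {context} variable, so the prompt receives their full string representation, metadata included. A common alternative is to join only the page text before it reaches the prompt. The following is a minimal sketch of such a helper. The name format_docs and the FakeDocument stand-in class are hypothetical and used only so the snippet runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def format_docs(docs) -> str:
    """Join the page text of each document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


docs_example = [
    FakeDocument("MongoDB acquired Voyage AI."),
    FakeDocument("SOURCE MongoDB, Inc."),
]
print(format_docs(docs_example))
```

In the chain, you could then write "context": retriever | format_docs, because LangChain accepts a plain function piped after a runnable.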

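This example implements RAG with filtering. It does the following:

Instantiates Atlas Vector Search as a retriever to query for similar documents, including the following optional parameters:
- k to search for only the 10 most relevant documents.
- score_threshold to use only documents with a relevance score above 0.75.
  Note
  This parameter refers to a relevance score that LangChain uses to normalize your results, and not the relevance score used in Atlas Search queries. To use Atlas Search scores in your RAG implementation, define a custom retriever that uses the similarity_search_with_score method and filters by the Atlas Search score.
- pre_filter to filter on the page_label field for documents that appear on page 2 only.
Defines a LangChain prompt template to instruct the LLM to use these documents as context for your query. LangChain passes these documents to the {context} input variable and your query to the {question} variable.
Constructs a chain that specifies the following:
- Atlas Vector Search as the retriever to search for documents to use as context.
- The prompt template that you defined.
- OpenAI's gpt-4o chat model to generate a context-aware response.
Invokes the chain with a sample query.
Returns the LLM's response and the documents used as context. The generated response might vary.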

# Instantiate Atlas Vector Search as a retriever
retriever = vector_store.as_retriever(
   search_type = "similarity",
   search_kwargs = {
      "k": 10,
      "score_threshold": 0.75,
      "pre_filter": { "page_label": { "$eq": "2" } }  # page_label is stored as a string
   }
)
# Define a prompt template
template = """
   Use the following pieces of context to answer the question at the end.
   {context}
   Question: {question}
"""
prompt = PromptTemplate.from_template(template)
model = ChatOpenAI(model="gpt-4o")
# Construct a chain to answer questions on your data
chain = (
   { "context": retriever, "question": RunnablePassthrough()}
   | prompt
   | model
   | StrOutputParser()
)
# Prompt the chain
question = "What was MongoDB's latest acquisition?"
answer = chain.invoke(question)
print("Question: " + question)
print("Answer: " + answer)
# Return source documents
documents = retriever.invoke(question)
print("\nSource documents:")
pprint.pprint(documents)

Question: What was MongoDB's latest acquisition?
Answer: MongoDB's latest acquisition was Voyage AI, a pioneer in state-of-the-art embedding and reranking models.
Source documents:
[Document(id='67f0259b8bb2babc06924351', metadata={'_id': '67f0259b8bb2babc06924351', ... 'page_label': '2'}, page_content='Measures."\nFourth Quarter Fiscal 2025 and Recent Business Highlights\nMongoDB  acquired Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation'),
 Document(id='67f0259b8bb2babc06924355', metadata={'_id': '67f0259b8bb2babc06924355', ... 'page_label': '2'}, page_content='conjunction with the acquisition of Voyage, MongoDB  is announcing a stock buyback program of $200 million, to offset the\ndilutive impact of the acquisition consideration.'),
 Document(id='67f0259b8bb2babc06924354', metadata={'_id': '67f0259b8bb2babc06924354', ... 'page_label': '2'}, page_content='data.\nMongoDB  completed the redemption of 2026 Convertible Notes, eliminating all debt from the balance sheet. Additionally, in'),
 Document(id='67f0259b8bb2babc06924358', metadata={'_id': '67f0259b8bb2babc06924358', ... 'page_label': '2'}, page_content='Lombard Odier, a Swiss private bank, partnered with MongoDB  to migrate and modernize its legacy banking technology'),
 Document(id='67f0259b8bb2babc06924352', metadata={'_id': '67f0259b8bb2babc06924352', ... 'page_label': '2'}, page_content="AI applications. Integrating Voyage AI's technology with MongoDB  will enable organizations to easily build trustworthy,"),
 Document(id='67f0259b8bb2babc0692435a', metadata={'_id': '67f0259b8bb2babc0692435a', ... 'page_label': '2'}, page_content='applications from a legacy relational database to MongoDB  20 times faster than previous migrations.\nFirst Quarter and Full Year Fiscal 2026 Guidance'),
 Document(id='67f0259b8bb2babc06924356', metadata={'_id': '67f0259b8bb2babc06924356', ... 'page_label': '2'}, page_content='For the third consecutive year, MongoDB  was named a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud'),
 Document(id='67f0259b8bb2babc0692434d', metadata={'_id': '67f0259b8bb2babc0692434d', ... 'page_label': '2'}, page_content='compared to $121.5 million of cash from operations in the year-ago period. MongoDB  used $29.6 million of cash in capital'),
 Document(id='67f0259b8bb2babc0692434c', metadata={'_id': '67f0259b8bb2babc0692434c', ... 'page_label': '2'}, page_content='Cash Flow: During the year ended January 31, 2025, MongoDB  generated $150.2 million of cash from operations,'),
 Document(id='67f0259b8bb2babc06924364', metadata={'_id': '67f0259b8bb2babc06924364', ... 'page_label': '2'}, page_content='MongoDB  will host a conference call today, March 5, 2025, at 5:00 p.m. (Eastern Time) to discuss its financial results and business outlook. A live')]
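
The note on score_threshold mentions defining a custom retriever that filters by the Atlas Search score. One possible shape for that idea is sketched below: a plain function that calls similarity_search_with_score and keeps only documents at or above a minimum score. The function name retrieve_with_min_score and its defaults are assumptions, not part of the LangChain or MongoDB APIs.

```python
def retrieve_with_min_score(vector_store, query: str, k: int = 10,
                            min_score: float = 0.75, pre_filter=None) -> list:
    """Return documents whose vector search score is at least min_score."""
    results = vector_store.similarity_search_with_score(
        query=query, k=k, pre_filter=pre_filter
    )
    # Each result is a (document, score) pair; keep the documents that qualify
    return [doc for doc, score in results if score >= min_score]
```

To use it in a chain, you could wrap it with RunnableLambda from langchain_core.runnables, for example RunnableLambda(lambda q: retrieve_with_min_score(vector_store, q)), and place the result where the retriever appears.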

Learn by Watching

Learn more about semantic search and RAG with LangChain and MongoDB in this video tutorial.

Duration: 8 minutes
