How to Create Vector Embeddings

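On this page

Get Started
Prerequisites
Use an Embedding Model
Create Embeddings from Data
Create Embeddings for Queries
Considerations
Choosing a Method to Create Embeddings
Choosing an Embedding Model
Vector Compression
Validating Your Embeddings
Next Steps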

You can store vector embeddings alongside your other data in Atlas. These embeddings capture meaningful relationships in your data, which lets you perform semantic search and implement RAG with Atlas Vector Search.
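To make "meaningful relationships" concrete, here is a minimal Python sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors and their names are hypothetical stand-ins for real model output, and this is an illustration only, not how Atlas Vector Search is implemented.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
ship = [0.9, 0.1, 0.0]
boat = [0.8, 0.2, 0.1]
lion = [0.0, 0.2, 0.9]

# Texts with related meaning should score higher than unrelated ones.
print(cosine_similarity(ship, boat) > cosine_similarity(ship, lion))
```

Real embeddings have hundreds or thousands of dimensions, but the comparison works the same way.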

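Get Started

Use the following tutorial to learn how to create vector embeddings and query them using Atlas Vector Search. Specifically, you perform the following actions:

Define a function that uses an embedding model to generate vector embeddings.
Create embeddings from your data and store them in Atlas.
Create embeddings from your search terms and run a vector search query.

For production applications, you typically write a script to generate vector embeddings. You can start with the sample code on this page and customize it for your use case.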

➤ Use the Select your language drop-down menu to set the language of the examples on this page.

You can also work with a runnable version of this tutorial as a Python notebook.

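Prerequisites

This tutorial uses a collection that contains movie data from the Atlas sample datasets.

An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
A terminal and code editor to run your C# project.
.NET 8.0 or later installed.
A Hugging Face access token or an OpenAI API key.

An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
A terminal and code editor to run your Go project.
Go installed.
A Hugging Face access token or an OpenAI API key.

An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.

Java Development Kit (JDK) version 8 or later.
An environment to set up and run a Java application. We recommend that you use an integrated development environment (IDE) such as IntelliJ IDEA or Eclipse IDE to configure Maven or Gradle to build and run your project.

One of the following:
- A Hugging Face access token with read access.
- An OpenAI API key. You must have an OpenAI account with credits available for API requests. To learn more about registering an OpenAI account, see the OpenAI API website.

An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
A terminal and code editor to run your Node.js project.
npm and Node.js installed.
If you are using OpenAI models, you must have an OpenAI API key.

An Atlas account with a cluster running MongoDB version 6.0.11, 7.0.2, or later (including RCs). Ensure that your IP address is included in your Atlas project's access list. To learn more, see Create a Cluster.
An environment to run interactive Python notebooks, such as VS Code or Colab.
If you are using OpenAI models, you must have an OpenAI API key.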

Use an Embedding Model

Initialize your .NET project.

In a terminal window, run the following commands to initialize your project:

dotnet new console -o MyCompany.Embeddings
cd MyCompany.Embeddings

Install and import dependencies.

In a terminal window, run the following command:

dotnet add package MongoDB.Driver

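Set your environment variables.

Export the following environment variables, set them in PowerShell, or use your IDE's environment variable manager to make the connection string and Hugging Face access token available to your project.

export HUGGINGFACE_ACCESS_TOKEN="<access-token>"
export ATLAS_CONNECTION_STRING="<connection-string>"

Replace the <access-token> placeholder value with your Hugging Face access token.

Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.

Your connection string should use the following format: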

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
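As a rough illustration of the format above, the following Python sketch assembles an SRV connection string from its parts. The helper name build_srv_uri, the credentials, and the host cluster0.example.mongodb.net are all hypothetical. The one point it is meant to show is that special characters in the username or password need to be percent-encoded before they go into the URI.

```python
from urllib.parse import quote_plus

def build_srv_uri(username, password, host):
    """Build a mongodb+srv URI, percent-encoding the credentials."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"

# Hypothetical credentials and host, for illustration only.
uri = build_srv_uri("app_user", "p@ss/word", "cluster0.example.mongodb.net")
print(uri)
```

In practice you would usually copy the connection string from the Atlas UI and keep it in an environment variable rather than building it in code.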

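Define a function to generate vector embeddings.

Create a new class in a same-named file named AIService.cs and paste the following code. This code defines an async task named GetEmbeddingsAsync that generates an array of embeddings for an array of given string inputs. The function uses the mxbai-embed-large-v1 embedding model.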

AIService.cs

namespace MyCompany.Embeddings;
using System;
using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using System.Net.Http.Headers;
public class AIService
{
    private static readonly string? HuggingFaceAccessToken = Environment.GetEnvironmentVariable("HUGGINGFACE_ACCESS_TOKEN");
    private static readonly HttpClient Client = new HttpClient();
    public async Task<Dictionary<string, float[]>> GetEmbeddingsAsync(string[] texts)
    {
        const string modelName = "mixedbread-ai/mxbai-embed-large-v1";
        const string url = $"https://api-inference.huggingface.co/models/{modelName}";
        Client.DefaultRequestHeaders.Authorization 
            = new AuthenticationHeaderValue("Bearer", HuggingFaceAccessToken);
        var data = new { inputs = texts };
        var dataJson = JsonSerializer.Serialize(data);
        var content = new StringContent(dataJson,null, "application/json");
        var response = await Client.PostAsync(url, content);
        response.EnsureSuccessStatusCode();
        var responseString = await response.Content.ReadAsStringAsync();
        var embeddings = JsonSerializer.Deserialize<float[][]>(responseString);
        if (embeddings is null)
        {
            throw new ApplicationException("Failed to deserialize embeddings response to an array of floats.");
        }
        Dictionary<string, float[]> documentData = new Dictionary<string, float[]>();
        var embeddingCount = embeddings.Length;
        foreach (var value in Enumerable.Range(0, embeddingCount))
        {
            // Pair each embedding with the text used to generate it.
            documentData[texts[value]] = embeddings[value];
        }
        return documentData;
    }
}

Note

503 when calling Hugging Face models

You might occasionally get 503 errors when calling Hugging Face model hub models. To resolve this issue, retry after a short delay.
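One way to apply the retry advice is exponential backoff. The Python sketch below is a generic pattern under stated assumptions: ServiceUnavailable is a hypothetical exception standing in for an HTTP 503 response, and flaky_request simulates a model that is still loading. Neither comes from the Hugging Face API or from this tutorial's code.

```python
import time

class ServiceUnavailable(Exception):
    """Hypothetical error type standing in for an HTTP 503 response."""

def call_with_retry(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a 503-style failure, wait and retry with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ServiceUnavailable:
            if attempt == max_attempts:
                raise  # Give up after the final attempt.
            sleep(base_delay * 2 ** (attempt - 1))

# Simulated request that fails twice before succeeding.
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ServiceUnavailable("model is loading")
    return [0.1, 0.2, 0.3]

delays = []  # Record the delays instead of actually sleeping.
result = call_with_retry(flaky_request, sleep=delays.append)
print(result, delays)
```

Passing sleep as a parameter keeps the sketch testable without real waiting. A production version would also cap the maximum delay.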

Initialize your .NET project.

In a terminal window, run the following commands to initialize your project:

dotnet new console -o MyCompany.Embeddings
cd MyCompany.Embeddings

Install and import dependencies.

In a terminal window, run the following commands:

dotnet add package MongoDB.Driver
dotnet add package OpenAI

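Set your environment variables.

export OPENAI_API_KEY="<api-key>"
export ATLAS_CONNECTION_STRING="<connection-string>"

Replace the <api-key> placeholder value with your OpenAI API key.

Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.

Your connection string should use the following format: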

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

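Define a function to generate vector embeddings.

Create a new class in a same-named file named AIService.cs and paste the following code. This code defines an async task named GetEmbeddingsAsync that generates an array of embeddings for an array of given string inputs. The function uses OpenAI's text-embedding-3-small model to generate an embedding for each input.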

AIService.cs

namespace MyCompany.Embeddings;
using OpenAI.Embeddings;
using System;
using System.Threading.Tasks;
public class AIService
{
    private static readonly string? OpenAIApiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
    private static readonly string EmbeddingModelName = "text-embedding-3-small";
    public async Task<Dictionary<string, float[]>> GetEmbeddingsAsync(string[] texts)
    {
        EmbeddingClient embeddingClient = new(model: EmbeddingModelName, apiKey: OpenAIApiKey);
        Dictionary<string, float[]> documentData = new Dictionary<string, float[]>();
        try
        {
            var result = await embeddingClient.GenerateEmbeddingsAsync(texts);
            var embeddingCount = result.Value.Count;
            foreach (var index in Enumerable.Range(0, embeddingCount))
            {
                // Pair each embedding with the text used to generate it.
                documentData[texts[index]] = result.Value[index].ToFloats().ToArray();
            }
        }
        catch (Exception e)
        {
            throw new ApplicationException(e.Message);
        }
        return documentData;
    }
}

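In this section, you define a function to generate vector embeddings by using an embedding model. Select a tab based on whether you want to use an open-source embedding model or a proprietary model such as one from OpenAI.

Note

Open-source embedding models are free to use and can be loaded locally from your application. Proprietary models require an API key to access the model.

Initialize your Go project.

In a terminal window, run the following commands to create a new directory named my-embeddings-project and initialize your project: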

mkdir my-embeddings-project
cd my-embeddings-project
go mod init my-embeddings-project

Install and import dependencies.

In a terminal window, run the following commands:

go get github.com/joho/godotenv
go get go.mongodb.org/mongo-driver/v2/mongo
go get github.com/tmc/langchaingo/llms

Create a `.env` file to manage secrets.

In your project, create a .env file to store your Atlas connection string and Hugging Face access token.

HUGGINGFACEHUB_API_TOKEN = "<access-token>"
ATLAS_CONNECTION_STRING = "<connection-string>"

Replace the <access-token> placeholder value with your Hugging Face access token.

Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

Define a function to generate vector embeddings.

Create a directory named common in your project to store common code that you use in later steps:
```
mkdir common && cd common
```

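Create a file named get-embeddings.go and paste the following code. This code defines a function named GetEmbeddings that generates embeddings for a given input. The function specifies the following:

The feature-extraction task, using the Go port of the LangChain library. To learn more, see the Tasks documentation in the LangChain JavaScript documentation.
The mxbai-embed-large-v1 embedding model.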

get-embeddings.go

package common
import (
	"context"
	"log"
	"github.com/tmc/langchaingo/embeddings/huggingface"
)
func GetEmbeddings(documents []string) [][]float32 {
	hf, err := huggingface.NewHuggingface(
		huggingface.WithModel("mixedbread-ai/mxbai-embed-large-v1"),
		huggingface.WithTask("feature-extraction"))
	if err != nil {
		log.Fatalf("failed to connect to Hugging Face: %v", err)
	}
	embs, err := hf.EmbedDocuments(context.Background(), documents)
	if err != nil {
		log.Fatalf("failed to generate embeddings: %v", err)
	}
	return embs
}

Note

503 when calling Hugging Face models

You might occasionally get 503 errors when calling Hugging Face model hub models. To resolve this issue, retry after a short delay.

Return to the main project root directory.
```
cd ../
```

Initialize your Go project.

In a terminal window, run the following commands to create a new directory named my-embeddings-project and initialize your project:

mkdir my-embeddings-project
cd my-embeddings-project
go mod init my-embeddings-project

Install and import dependencies.

In a terminal window, run the following commands:

go get github.com/joho/godotenv
go get go.mongodb.org/mongo-driver/v2/mongo
go get github.com/milosgajdos/go-embeddings/openai

Create a `.env` file to manage secrets.

In your project, create a .env file to store your connection string and OpenAI API key.

OPENAI_API_KEY = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"

Replace the <api-key> and <connection-string> placeholder values with your OpenAI API key and the SRV connection string for your Atlas cluster. Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

Define a function to generate vector embeddings.

Create a directory named common in your project to store code that you use in multiple steps:
```
mkdir common && cd common
```

Create a file named get-embeddings.go and paste the following code. This code defines a function named GetEmbeddings that uses OpenAI's text-embedding-3-small model to generate embeddings for a given input.

get-embeddings.go

package common
import (
	"context"
	"log"
	"github.com/milosgajdos/go-embeddings/openai"
)
func GetEmbeddings(docs []string) [][]float64 {
	c := openai.NewClient()
	embReq := &openai.EmbeddingRequest{
		Input:          docs,
		Model:          openai.TextSmallV3,
		EncodingFormat: openai.EncodingFloat,
	}
	embs, err := c.Embed(context.Background(), embReq)
	if err != nil {
		log.Fatalf("failed to connect to OpenAI: %v", err)
	}
	var vectors [][]float64
	for _, emb := range embs {
		vectors = append(vectors, emb.Vector)
	}
	return vectors
}

Return to the main project root directory.
```
cd ../
```

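Create your Java project and install dependencies.

From your IDE, create a Java project using Maven or Gradle.

Add the following dependencies, depending on your package manager:

If you are using Maven, add the following dependencies to the dependencies array in your project's pom.xml file: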

pom.xml

<dependencies>
   <!-- MongoDB Java Sync Driver v5.2.0 or later -->
   <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver-sync</artifactId>
      <version>[5.2.0,)</version>
   </dependency>
   <!-- Java library for working with Hugging Face models -->
   <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-hugging-face</artifactId>
      <version>0.35.0</version>
   </dependency>
</dependencies>

If you are using Gradle, add the following to the dependencies array in your project's build.gradle file:

build.gradle

dependencies {
   // MongoDB Java Sync Driver v5.2.0 or later
   implementation 'org.mongodb:mongodb-driver-sync:[5.2.0,)'
   // Java library for working with Hugging Face models
   implementation 'dev.langchain4j:langchain4j-hugging-face:0.35.0'
}

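Run your package manager to install the dependencies to your project.

Set your environment variables.

Note

This example sets the variables for the project in the IDE. Production applications might manage environment variables through a deployment configuration, CI/CD pipeline, or secrets manager, but you can adapt the provided code to fit your use case.

In your IDE, create a new configuration template and add the following variables to your project:

If you are using IntelliJ IDEA, create a new Application run configuration template, then add your variables as semicolon-separated values in the Environment variables field (for example, FOO=123;BAR=456). Apply the changes and click OK.
To learn more, see the "Create a run/debug configuration from a template" section of the IntelliJ IDEA documentation.
If you are using Eclipse, create a new Java Application launch configuration, then add each variable as a new key-value pair in the Environment tab. Apply the changes and click OK.
To learn more, see the "Creating a Java application launch configuration" section of the Eclipse IDE documentation.

Environment variables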

HUGGING_FACE_ACCESS_TOKEN=<access-token>
ATLAS_CONNECTION_STRING=<connection-string>

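Update the placeholders with the following values:

Replace the <access-token> placeholder value with your Hugging Face access token.
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.
Your connection string should use the following format: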
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```

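Define a method to generate vector embeddings.

Create a file named EmbeddingProvider.java and paste the following code.

This code defines two methods to generate embeddings for a given input using the mxbai-embed-large-v1 embedding model:

Multiple inputs: The getEmbeddings method accepts an array of text inputs (List<String>), which lets you create multiple embeddings in a single API call. The method converts the arrays of floats provided by the API to BSON arrays of doubles for storing in your Atlas cluster.
Single input: The getEmbedding method accepts a single String, which represents a query that you want to run against your vector data. The method converts the array of floats provided by the API to a BSON array of doubles to use when querying your collection.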

EmbeddingProvider.java

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.huggingface.HuggingFaceEmbeddingModel;
import dev.langchain4j.model.output.Response;
import org.bson.BsonArray;
import org.bson.BsonDouble;
import java.util.List;
import static java.time.Duration.ofSeconds;
public class EmbeddingProvider {
    private static HuggingFaceEmbeddingModel embeddingModel;
    private static HuggingFaceEmbeddingModel getEmbeddingModel() {
        if (embeddingModel == null) {
            String accessToken = System.getenv("HUGGING_FACE_ACCESS_TOKEN");
            if (accessToken == null || accessToken.isEmpty()) {
                throw new RuntimeException("HUGGING_FACE_ACCESS_TOKEN env variable is not set or is empty.");
            }
            embeddingModel = HuggingFaceEmbeddingModel.builder()
                    .accessToken(accessToken)
                    .modelId("mixedbread-ai/mxbai-embed-large-v1")
                    .waitForModel(true)
                    .timeout(ofSeconds(60))
                    .build();
        }
        return embeddingModel;
    }
    /**
     * Takes an array of strings and returns a BSON array of embeddings to
     * store in the database.
     */
    public List<BsonArray> getEmbeddings(List<String> texts) {
        List<TextSegment> textSegments = texts.stream()
                .map(TextSegment::from)
                .toList();
        Response<List<Embedding>> response = getEmbeddingModel().embedAll(textSegments);
        return response.content().stream()
                .map(e -> new BsonArray(
                        e.vectorAsList().stream()
                                .map(BsonDouble::new)
                                .toList()))
                .toList();
    }
    /**
     * Takes a single string and returns a BSON array embedding to
     * use in a vector query.
     */
    public BsonArray getEmbedding(String text) {
        Response<Embedding> response = getEmbeddingModel().embed(text);
        return new BsonArray(
                response.content().vectorAsList().stream()
                        .map(BsonDouble::new)
                        .toList());
    }
}

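Create your Java project and install dependencies.

From your IDE, create a Java project using Maven or Gradle.

Add the following dependencies, depending on your package manager:

If you are using Maven, add the following dependencies to the dependencies array in your project's pom.xml file: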

pom.xml

<dependencies>
   <!-- MongoDB Java Sync Driver v5.2.0 or later -->
   <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver-sync</artifactId>
      <version>[5.2.0,)</version>
   </dependency>
   <!-- Java library for working with OpenAI models -->
   <dependency>
      <groupId>dev.langchain4j</groupId>
      <artifactId>langchain4j-open-ai</artifactId>
      <version>0.35.0</version>
   </dependency>
</dependencies>

If you are using Gradle, add the following to the dependencies array in your project's build.gradle file:

build.gradle

dependencies {
   // MongoDB Java Sync Driver v5.2.0 or later
   implementation 'org.mongodb:mongodb-driver-sync:[5.2.0,)'
   // Java library for working with OpenAI models
   implementation 'dev.langchain4j:langchain4j-open-ai:0.35.0'
}

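Run your package manager to install the dependencies to your project.

Set your environment variables.

In your IDE, create a new configuration template and add the following variables to your project:

If you are using IntelliJ IDEA, create a new Application run configuration template, then add your variables as semicolon-separated values in the Environment variables field (for example, FOO=123;BAR=456). Apply the changes and click OK.
To learn more, see the "Create a run/debug configuration from a template" section of the IntelliJ IDEA documentation.
If you are using Eclipse, create a new Java Application launch configuration, then add each variable as a new key-value pair in the Environment tab. Apply the changes and click OK.
To learn more, see the "Creating a Java application launch configuration" section of the Eclipse IDE documentation.

Environment variables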

OPEN_AI_API_KEY=<api-key>
ATLAS_CONNECTION_STRING=<connection-string>

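Update the placeholders with the following values:

Replace the <api-key> placeholder value with your OpenAI API key.
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.
Your connection string should use the following format: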
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```

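Define a method to generate vector embeddings.

Create a file named EmbeddingProvider.java and paste the following code.

This code defines two methods to generate embeddings for a given input using the text-embedding-3-small OpenAI embedding model:

Multiple inputs: The getEmbeddings method accepts an array of text inputs (List<String>), which lets you create multiple embeddings in a single API call. The method converts the arrays of floats provided by the API to BSON arrays of doubles for storing in your Atlas cluster.
Single input: The getEmbedding method accepts a single String, which represents a query that you want to run against your vector data. The method converts the array of floats provided by the API to a BSON array of doubles to use when querying your collection.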

EmbeddingProvider.java

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.model.output.Response;
import org.bson.BsonArray;
import org.bson.BsonDouble;
import java.util.List;
import static java.time.Duration.ofSeconds;
public class EmbeddingProvider {
    private static OpenAiEmbeddingModel embeddingModel;
    private static OpenAiEmbeddingModel getEmbeddingModel() {
        if (embeddingModel == null) {
            String apiKey = System.getenv("OPEN_AI_API_KEY");
            if (apiKey == null || apiKey.isEmpty()) {
                throw new IllegalStateException("OPEN_AI_API_KEY env variable is not set or is empty.");
            }
            // Cache the model in the static field so it is built only once.
            embeddingModel = OpenAiEmbeddingModel.builder()
                    .apiKey(apiKey)
                    .modelName("text-embedding-3-small")
                    .timeout(ofSeconds(60))
                    .build();
        }
        return embeddingModel;
    }
    /**
     * Takes an array of strings and returns a BSON array of embeddings to
     * store in the database.
     */
    public List<BsonArray> getEmbeddings(List<String> texts) {
        List<TextSegment> textSegments = texts.stream()
                .map(TextSegment::from)
                .toList();
        Response<List<Embedding>> response = getEmbeddingModel().embedAll(textSegments);
        return response.content().stream()
                .map(e -> new BsonArray(
                        e.vectorAsList().stream()
                                .map(BsonDouble::new)
                                .toList()))
                .toList();
    }
    /**
     * Takes a single string and returns a BSON array embedding to
     * use in a vector query.
     */
    public BsonArray getEmbedding(String text) {
        Response<Embedding> response = getEmbeddingModel().embed(text);
        return new BsonArray(
                response.content().vectorAsList().stream()
                        .map(BsonDouble::new)
                        .toList());
    }
}

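In this section, you define a function to generate vector embeddings by using an embedding model. Select a tab based on whether you want to use an open-source embedding model or a proprietary model from OpenAI.

This section also includes an optional function that you can use to compress your embeddings for efficient storage and improved query performance. To learn more, see Vector Compression.

Initialize your Node.js project.

In a terminal window, run the following commands to create a new directory named my-embeddings-project and initialize your project: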

mkdir my-embeddings-project
cd my-embeddings-project
npm init -y

Update your `package.json` file.

Configure your project to use ES modules by adding "type": "module" to your package.json file, and then save the file.

{
  "type": "module",
  // other fields...
}

Install and import dependencies.

In a terminal window, run the following command:

npm install mongodb @xenova/transformers

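Create a `.env` file.

In your project, create a .env file to store your Atlas connection string.
```
ATLAS_CONNECTION_STRING="<connection-string>"
```
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster. Your connection string should use the following format: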
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
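Note
Minimum Node.js version requirements
Node.js v20.x introduced the --env-file option. If you use an earlier version of Node.js, add the dotenv package to your project or use a different method to manage your environment variables.
Save the file.

Define a function to generate vector embeddings.

Create a file named get-embeddings.js.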
```
touch get-embeddings.js
```

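Paste the following code into the file.

This code defines a function named getEmbedding that generates an embedding for a given input. The function specifies the following:

The feature-extraction task from Hugging Face's transformers.js library. To learn more, see Tasks.
The nomic-embed-text-v1 embedding model.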

get-embeddings.js

import { pipeline } from '@xenova/transformers';
// Function to generate embeddings for a given data source
export async function getEmbedding(data) {
    const embedder = await pipeline(
        'feature-extraction', 
        'Xenova/nomic-embed-text-v1');
    const results = await embedder(data, { pooling: 'mean', normalize: true });
    return Array.from(results.data);
}

Save the file.
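The getEmbedding function passes pooling: 'mean' and normalize: true to the pipeline. As a rough Python sketch of what those two options mean conceptually, the code below averages hypothetical per-token vectors into one vector and then scales it to unit length. The function names and sample values are illustrative, and real libraries typically also account for padding through an attention mask, which this sketch omits.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into one vector of the same dimension."""
    count = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector

# Two hypothetical 2-dimensional token vectors.
tokens = [[1.0, 2.0], [3.0, 6.0]]
pooled = mean_pool(tokens)
unit = l2_normalize(pooled)
print(pooled, unit)
```

Normalized vectors all have length 1, so their dot product equals their cosine similarity.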

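(Advanced) Compress your embeddings.

This optional step defines a function that converts your embeddings to BSON binary format.

Optionally, you can compress your embeddings by converting them to BSON binary format (also called binData vectors) for efficient storage and retrieval. To learn more, see Vector Compression.

Create a file named convert-embeddings.js.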
```
touch convert-embeddings.js
```

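Paste the following code into the file.

This code defines a function named convertEmbeddingsToBSON that converts float32 embeddings to binData vectors by using the Node.js driver's Binary utilities.

Note

After you convert your embeddings to binData vectors, they appear in binary format.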

convert-embeddings.js

import { Binary } from 'mongodb';
// Exported async function to convert provided embeddings to BSON format
export async function convertEmbeddingsToBSON(float32_embeddings) {
  try {
    // Validate input
    if (!Array.isArray(float32_embeddings) || float32_embeddings.length === 0) {
      throw new Error("Input must be a non-empty array of embeddings");
    }
    // Convert float32 embeddings to BSON binary representations
    const bsonFloat32Embeddings = float32_embeddings.map(embedding => {
      if (!(embedding instanceof Array)) {
        throw new Error("Each embedding must be an array of numbers");
      }
      return Binary.fromFloat32Array(new Float32Array(embedding));
    });
    // Return the BSON embedding
    return bsonFloat32Embeddings[0]; 
  } catch (error) {
    console.error('Error during BSON conversion:', error);
    throw error; // Re-throw the error for handling by the caller if needed
  }
}

Save the file.

Initialize your Node.js project.

In a terminal window, run the following commands to create a new directory named my-embeddings-project and initialize your project:

mkdir my-embeddings-project
cd my-embeddings-project
npm init -y

Update your `package.json` file.

Configure your project to use ES modules by adding "type": "module" to your package.json file, and then save the file.

{
  "type": "module",
  // other fields...
}

Install and import dependencies.

In a terminal window, run the following command:

npm install mongodb openai

Create a `.env` file.

In your project, create a .env file to store your Atlas connection string and OpenAI API key.
```
OPENAI_API_KEY = "<api-key>"
ATLAS_CONNECTION_STRING = "<connection-string>"
```
Replace the <api-key> and <connection-string> placeholder values with your OpenAI API key and the SRV connection string for your Atlas cluster. Your connection string should use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
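Note
Minimum Node.js version requirements
Node.js v20.x introduced the --env-file option. If you use an earlier version of Node.js, add the dotenv package to your project or use a different method to manage your environment variables.
Save the file.

Define a function to generate vector embeddings.

Create a file named get-embeddings.js.

Paste the following code. This code defines a function named getEmbedding that uses OpenAI's text-embedding-3-small model to generate an embedding for a given input.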

get-embeddings.js

import OpenAI from 'openai';
// Setup OpenAI configuration
const openai = new OpenAI({apiKey: process.env.OPENAI_API_KEY});
// Function to get the embeddings using the OpenAI API
export async function getEmbedding(text) {
    const results = await openai.embeddings.create({
        model: "text-embedding-3-small",
        input: text,
        encoding_format: "float",
    });
    return results.data[0].embedding;
}

Save the file.

(Advanced) Compress your embeddings.

This optional step defines a function that converts your embeddings to BSON binary format.

Create a file named convert-embeddings.js.
```
touch convert-embeddings.js
```

Paste the following code into the file.

Note

After you convert your embeddings to binData vectors, they appear in binary format.

convert-embeddings.js

import { Binary } from 'mongodb';
// Exported async function to convert provided embeddings to BSON format
export async function convertEmbeddingsToBSON(float32_embeddings) {
  try {
    // Validate input
    if (!Array.isArray(float32_embeddings) || float32_embeddings.length === 0) {
      throw new Error("Input must be a non-empty array of embeddings");
    }
    // Convert float32 embeddings to BSON binary representations
    const bsonFloat32Embeddings = float32_embeddings.map(embedding => {
      if (!(embedding instanceof Array)) {
        throw new Error("Each embedding must be an array of numbers");
      }
      return Binary.fromFloat32Array(new Float32Array(embedding));
    });
    // Return the BSON embedding
    return bsonFloat32Embeddings[0]; 
  } catch (error) {
    console.error('Error during BSON conversion:', error);
    throw error; // Re-throw the error for handling by the caller if needed
  }
}

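Save the file.

Set up your environment.

Create an interactive Python notebook by saving a file with the .ipynb extension, and then run the following command in the notebook to install the dependencies: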

pip install --quiet --upgrade sentence-transformers pymongo einops

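Define and test a function to generate vector embeddings.

Paste and run the following code in your notebook to create a function that generates vector embeddings by using an open-source embedding model from Nomic AI. This code does the following:

Loads the nomic-embed-text-v1 embedding model.
Creates a function named get_embedding that uses the model to generate an embedding for a given text input. The default precision is float32.
Generates an embedding for the string foo.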

from sentence_transformers import SentenceTransformer
# Load the embedding model
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
# Define a function to generate embeddings
def get_embedding(data, precision="float32"):
   return model.encode(data, precision=precision).tolist()
# Generate an embedding
embedding = get_embedding("foo")
print(embedding)

[-0.02980827  0.03841474 -0.02561123 ... -0.0532876 -0.0335409 -0.02591543]

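(Advanced) Compress your embeddings.

This optional step defines a function that converts your embeddings to BSON binary format.

This code does the following:

Defines a function named generate_bson_vector that converts embeddings to binData vectors by using the PyMongo driver's Binary utilities.
Converts the embedding that you generated for the string foo to a binData vector.

Note

After you convert your embeddings to binData vectors, they appear in binary format.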

from bson.binary import Binary 
from bson.binary import BinaryVectorDtype
# Define a function to generate BSON vectors
def generate_bson_vector(vector, vector_dtype):
   return Binary.from_vector(vector, vector_dtype)
# Generate BSON vector from the sample float32 embedding
bson_float32_embedding = generate_bson_vector(embedding, BinaryVectorDtype.FLOAT32)
# Print the converted embedding
print(f"The converted BSON embedding is: {bson_float32_embedding}")

The converted BSON embedding is: [Binary(b'\'\x00x0\xf4\ ... x9bL\xd4\xbc', 9), Binary(b'\'\x007 ... \x9e?\xe6<', 9)]
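To give a feel for why a float32 binary representation is more compact, the following sketch uses Python's standard struct module to pack the same values as 64-bit doubles and as 32-bit floats. This compares raw payload sizes only. It is not the BSON layout that Binary.from_vector produces, which also carries a small amount of metadata, and the sample values are hypothetical.

```python
import struct

# Hypothetical 4-dimensional embedding; these values are exact in float32.
embedding = [0.25, -0.5, 0.125, 1.0]

# Pack as little-endian 64-bit doubles and as 32-bit floats.
as_float64 = struct.pack(f"<{len(embedding)}d", *embedding)
as_float32 = struct.pack(f"<{len(embedding)}f", *embedding)
print(len(as_float64), len(as_float32))

# Unpack the float32 bytes to recover the vector.
restored = list(struct.unpack(f"<{len(embedding)}f", as_float32))
print(restored)
```

Most real embedding values are not exactly representable in float32, so a round trip through float32 usually introduces a small rounding difference.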

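Set up your environment.

Create an interactive Python notebook by saving a file with the .ipynb extension, and then run the following command in the notebook to install the dependencies: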

pip install --quiet --upgrade openai pymongo

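Define and test a function to generate vector embeddings.

Paste and run the following code in your notebook to create a function that generates vector embeddings by using a proprietary embedding model from OpenAI. Replace <api-key> with your OpenAI API key. This code does the following:

Specifies the text-embedding-3-small embedding model.
Creates a function named get_embedding that calls the model's API to generate an embedding for a given text input.
Tests the function by generating a single embedding for the string foo.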

import os
from openai import OpenAI
# Specify your OpenAI API key and embedding model
os.environ["OPENAI_API_KEY"] = "<api-key>"
model = "text-embedding-3-small"
openai_client = OpenAI()
# Define a function to generate embeddings
def get_embedding(text):
   """Generates vector embeddings for the given text."""
   embedding = openai_client.embeddings.create(input = [text], model=model).data[0].embedding
   return embedding
# Generate an embedding
embedding = get_embedding("foo")
print(embedding)

[-0.005843308754265308, -0.013111298903822899, -0.014585349708795547, 0.03580040484666824, 0.02671629749238491, ... ]
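Before storing a generated vector, it can help to confirm that it has the shape you expect. The following is a minimal sketch, assuming the default output size of text-embedding-3-small (1536 dimensions). The validate_embedding helper is hypothetical and not part of the OpenAI client, and the example call uses a short made-up vector so that it runs without an API key.

```python
import math

def validate_embedding(vector, expected_dim):
    """Raise ValueError unless vector is a list of expected_dim finite floats."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")
    for value in vector:
        if not isinstance(value, float) or not math.isfinite(value):
            raise ValueError(f"invalid component: {value!r}")
    return vector

# Hypothetical 4-dimensional vector; for text-embedding-3-small output,
# you would pass expected_dim=1536 instead.
checked = validate_embedding([0.1, -0.2, 0.3, 0.4], expected_dim=4)
print(len(checked))
```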

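See also:

For API details and a list of available models, see the OpenAI documentation.

(Advanced) Compress your embeddings.

This optional step defines a function that converts your embeddings to BSON binary format.

This code does the following:

Defines a function named generate_bson_vector that converts embeddings to binData vectors by using the PyMongo driver's Binary utilities.
Converts the embedding that you generated for the string foo to a binData vector.

Note

After you convert your embeddings to binData vectors, they appear in binary format.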

from bson.binary import Binary 
from bson.binary import BinaryVectorDtype
# Define a function to generate BSON vectors
def generate_bson_vector(vector, vector_dtype):
   return Binary.from_vector(vector, vector_dtype)
# Generate BSON vector from the sample float32 embedding
bson_float32_embedding = generate_bson_vector(embedding, BinaryVectorDtype.FLOAT32)
# Print the converted embedding
print(f"The converted BSON embedding is: {bson_float32_embedding}")

The converted BSON embedding is: b'\'\x00:y\xbf\...\xbb\xdaC\x9a\xbc'

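Create Embeddings from Data

In this section, you create vector embeddings from your data by using the function that you defined, and then you store these embeddings in a collection in Atlas.

Select a tab based on whether you want to create embeddings from new data or from existing data that you already have in Atlas.

Define a `DataService` class.

Create a new class in a same-named file named DataService.cs and paste the following code. This code defines an async task named AddDocumentsAsync that adds documents to Atlas. The function uses the Collection.InsertManyAsync() C# Driver method to insert a list of BsonDocument objects. Each document contains the following:

A text field that contains the movie summary.
An embedding field that contains the array of floats produced by generating the vector embedding.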

DataService.cs

namespace MyCompany.Embeddings;
using MongoDB.Driver;
using MongoDB.Bson;
public class DataService
{
    private static readonly string? ConnectionString = Environment.GetEnvironmentVariable("ATLAS_CONNECTION_STRING");
    private static readonly MongoClient Client = new MongoClient(ConnectionString);
    private static readonly IMongoDatabase Database = Client.GetDatabase("sample_db");
    private static readonly IMongoCollection<BsonDocument> Collection = Database.GetCollection<BsonDocument>("embeddings");
    
    public async Task AddDocumentsAsync(Dictionary<string, float[]> embeddings)
    {
        var documents = new List<BsonDocument>();
        foreach( KeyValuePair<string, float[]> var in embeddings )
        {
            var document = new BsonDocument
            {
                {
                    "text", var.Key
                },
                {
                    "embedding", new BsonArray(var.Value)
                }
            };
            documents.Add(document);
        }
        await Collection.InsertManyAsync(documents);
        Console.WriteLine($"Successfully inserted {embeddings.Count} documents into Atlas");
        documents.Clear();
    }
}
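The document shape that AddDocumentsAsync builds can be sketched in Python as a list of plain dictionaries. The build_documents helper and the sample vectors are hypothetical. The field names text and embedding follow the description above.

```python
def build_documents(embeddings_by_text):
    """Turn a {text: vector} mapping into a list of insertable documents."""
    return [
        {"text": text, "embedding": [float(x) for x in vector]}
        for text, vector in embeddings_by_text.items()
    ]

# Hypothetical, shortened vectors for illustration.
sample = {
    "Titanic: The story of the 1912 sinking": [0.1, 0.2],
    "Avatar: A marine is dispatched to Pandora": [0.3, 0.4],
}
docs = build_documents(sample)
print(docs[0])
```

Assuming a PyMongo collection object, a list like this could then be passed to its insert_many method.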

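Update the `Program.cs` in your project.

Use the following code to generate embeddings from new data and store them in Atlas.

Specifically, this code uses the GetEmbeddingsAsync function that you defined to generate embeddings from an array of sample texts and ingest them into the sample_db.embeddings collection in Atlas.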

Program.cs

using MyCompany.Embeddings;
var aiService = new AIService();
var texts = new string[]
{
    "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
    "The Lion King: Lion cub and future king Simba searches for his identity",
    "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
};
var embeddings = await aiService.GetEmbeddingsAsync(texts);
var dataService = new DataService();
await dataService.AddDocumentsAsync(embeddings);

Compile and run your project.

dotnet run MyCompany.Embeddings.csproj

Successfully inserted 3 documents into Atlas

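You can also view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

Note

This example uses the sample_airbnb.listingsAndReviews collection from the sample data, but you can adapt the code to work with any collection in your cluster.

Define a `DataService` class.

Create a new class in a same-named file named DataService.cs and paste the following code. This code connects to your Atlas cluster and defines two methods:

The GetDocuments method retrieves a subset of documents from the sample_airbnb.listingsAndReviews collection that have a non-empty summary field and do not yet have an embeddings field.
The AddEmbeddings async task creates a new embeddings field on documents in the sample_airbnb.listingsAndReviews collection whose summary matches one of the documents retrieved by the GetDocuments method.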

DataService.cs

namespace MyCompany.Embeddings;
using MongoDB.Driver;
using MongoDB.Bson;
public class DataService
{
    private static readonly string? ConnectionString = Environment.GetEnvironmentVariable("ATLAS_CONNECTION_STRING");
    private static readonly MongoClient Client = new MongoClient(ConnectionString);
    private static readonly IMongoDatabase Database = Client.GetDatabase("sample_airbnb");
    private static readonly IMongoCollection<BsonDocument> Collection = Database.GetCollection<BsonDocument>("listingsAndReviews");
    public List<BsonDocument>? GetDocuments()
    {
        var filter = Builders<BsonDocument>.Filter.And(
            Builders<BsonDocument>.Filter.And(
                Builders<BsonDocument>.Filter.Exists("summary", true),
                Builders<BsonDocument>.Filter.Ne("summary", "")
            ),
            Builders<BsonDocument>.Filter.Exists("embeddings", false)
        );
        return Collection.Find(filter).Limit(50).ToList(); 
    }
    public async Task<long> AddEmbeddings(Dictionary<string, float[]> embeddings)
    {
        var listWrites = new List<WriteModel<BsonDocument>>();
        foreach( var kvp in embeddings )
        {
            var filterForUpdate = Builders<BsonDocument>.Filter.Eq("summary", kvp.Key);
            var updateDefinition = Builders<BsonDocument>.Update.Set("embeddings", kvp.Value);
            listWrites.Add(new UpdateOneModel<BsonDocument>(filterForUpdate, updateDefinition));
        }
        var result = await Collection.BulkWriteAsync(listWrites);
        listWrites.Clear();
        return result.ModifiedCount;
    }
}
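To make the selection logic in GetDocuments concrete, the following minimal sketch expresses the same three conditions as a plain MongoDB query document and mirrors them with an in-memory predicate. It is written in Python purely for illustration and is not part of the C# project. The names `QUERY`, `needs_embedding`, and `select_batch`, as well as the sample documents, are hypothetical, and the predicate only approximates how the server evaluates these conditions.

```python
# Hypothetical illustration of the GetDocuments filter; not part of the C# project.

# The same conditions written as one MongoDB query document:
# "summary" exists, "summary" is not the empty string, and "embeddings" is absent.
QUERY = {
    "summary": {"$exists": True, "$ne": ""},
    "embeddings": {"$exists": False},
}


def needs_embedding(doc: dict) -> bool:
    """In-memory approximation of QUERY for a single document."""
    return "summary" in doc and doc["summary"] != "" and "embeddings" not in doc


def select_batch(docs: list[dict], limit: int = 50) -> list[dict]:
    """Return at most `limit` documents that still need an embedding."""
    return [d for d in docs if needs_embedding(d)][:limit]


# Made-up sample documents.
docs = [
    {"_id": "1", "summary": "Cozy loft near the harbor"},
    {"_id": "2", "summary": ""},                                   # empty summary: skipped
    {"_id": "3", "summary": "Garden flat", "embeddings": [0.1]},   # already embedded: skipped
    {"_id": "4"},                                                  # no summary field: skipped
]

print([d["_id"] for d in select_batch(docs)])
```

Because documents that already have an embeddings field are excluded, running the program again should pick up a different batch of documents rather than reprocessing the same ones.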

Update the `Program.cs` file in your project.

Use the following code to generate embeddings from an existing collection in Atlas.

Program.cs

using MyCompany.Embeddings;
var dataService = new DataService();
var documents = dataService.GetDocuments();
if (documents != null)
{
    Console.WriteLine("Generating embeddings.");
    var aiService = new AIService();
    var summaries = new List<string>();
    foreach (var document in documents)
    {
        var summary = document.GetValue("summary").ToString();
        if (summary != null)
        {
            summaries.Add(summary);
        }
    }
    
    try
    {
        if (summaries.Count > 0)
        {
            var embeddings = await aiService.GetEmbeddingsAsync(summaries.ToArray());
        
            try
            {
                var updatedCount = await dataService.AddEmbeddings(embeddings);
                Console.WriteLine($"{updatedCount} documents updated successfully.");
            } catch (Exception e)
            {
                Console.WriteLine($"Error adding embeddings to MongoDB: {e.Message}");
            }
        }
    }
    catch (Exception e)
    {
        Console.WriteLine($"Error creating embeddings for summaries: {e.Message}");
    }
}
else
{
    Console.WriteLine("No documents found");
}

Compile and run the project.

dotnet run MyCompany.Embeddings.csproj

Generating embeddings.
50 documents updated successfully.

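Create a file named `create-embeddings.go` and paste in the following code.

Use the following code to generate embeddings from sample data and store them in Atlas.

Specifically, this code uses the GetEmbeddings function that you defined and the MongoDB Go Driver to generate embeddings from an array of sample texts and ingest them into the sample_db.embeddings collection in Atlas.

create-embeddings.go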

package main
import (
	"context"
	"fmt"
	"log"
	"my-embeddings-project/common"
	"os"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
var data = []string{
	"Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
	"The Lion King: Lion cub and future king Simba searches for his identity",
	"Avatar: A marine is dispatched to the moon Pandora on a unique mission",
}
type TextWithEmbedding struct {
	Text      string
	Embedding []float32
}
func main() {
	ctx := context.Background()
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_db").Collection("embeddings")
	embeddings := common.GetEmbeddings(data)
	docsToInsert := make([]interface{}, len(embeddings))
	for i, text := range data {
		docsToInsert[i] = TextWithEmbedding{
			Text:      text,
			Embedding: embeddings[i],
		}
	}
	result, err := coll.InsertMany(ctx, docsToInsert)
	if err != nil {
		log.Fatalf("failed to insert documents: %v", err)
	}
	fmt.Printf("Successfully inserted %v documents into Atlas\n", len(result.InsertedIDs))
}

If your GetEmbeddings function returns 64-bit vectors ([][]float64) rather than 32-bit vectors, use the same create-embeddings.go program with the Embedding field of the TextWithEmbedding struct declared as []float64:

type TextWithEmbedding struct {
	Text      string
	Embedding []float64
}

Save and run the file.

go run create-embeddings.go

Successfully inserted 3 documents into Atlas

You can also view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

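Note

This example uses the sample_airbnb.listingsAndReviews collection from the sample data, but you can adapt the code to work with any collection in your cluster.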

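Create a file named `create-embeddings.go` and paste in the following code.

Use the following code to generate embeddings from an existing collection in Atlas. Specifically, this code does the following:

Connects to your Atlas cluster.
Retrieves a subset of documents from the sample_airbnb.listingsAndReviews collection that have a non-empty summary field.
Generates an embedding from each document's summary field by using the GetEmbeddings function that you defined.
Updates each document with a new embeddings field that contains the embedding value by using the MongoDB Go Driver.

create-embeddings.go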

package main
import (
	"context"
	"log"
	"my-embeddings-project/common"
	"os"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
func main() {
	ctx := context.Background()
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_airbnb").Collection("listingsAndReviews")
	filter := bson.D{
		{"$and",
			bson.A{
				bson.D{
					{"$and",
						bson.A{
							bson.D{{"summary", bson.D{{"$exists", true}}}},
							bson.D{{"summary", bson.D{{"$ne", ""}}}},
						},
					}},
				bson.D{{"embeddings", bson.D{{"$exists", false}}}},
			}},
	}
	opts := options.Find().SetLimit(50)
	cursor, err := coll.Find(ctx, filter, opts)
	if err != nil {
		log.Fatalf("failed to retrieve documents: %v", err)
	}
	var listings []common.Listing
	if err = cursor.All(ctx, &listings); err != nil {
		log.Fatalf("failed to unmarshal retrieved documents to Listing object: %v", err)
	}
	var summaries []string
	for _, listing := range listings {
		summaries = append(summaries, listing.Summary)
	}
	log.Println("Generating embeddings.")
	embeddings := common.GetEmbeddings(summaries)
	docsToUpdate := make([]mongo.WriteModel, len(listings))
	for i := range listings {
		docsToUpdate[i] = mongo.NewUpdateOneModel().
			SetFilter(bson.D{{"_id", listings[i].ID}}).
			SetUpdate(bson.D{{"$set", bson.D{{"embeddings", embeddings[i]}}}})
	}
	bulkWriteOptions := options.BulkWrite().SetOrdered(false)
	result, err := coll.BulkWrite(ctx, docsToUpdate, bulkWriteOptions)
	if err != nil {
		log.Fatalf("failed to write embeddings to existing documents: %v", err)
	}
	log.Printf("Successfully added embeddings to %v documents", result.ModifiedCount)
}
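The program above relies on embeddings[i] lining up with listings[i]. The following minimal sketch, written in Python for illustration only, makes that pairing explicit and fails early if the two lists differ in length. It assumes the embedding function returns exactly one vector per input, in input order. The helper name `build_update_ops`, the sample IDs, and the tiny vectors are all hypothetical.

```python
# Hypothetical helper; the operation shape mirrors the update models built above.

def build_update_ops(ids: list[str], embeddings: list[list[float]]) -> list[dict]:
    """Pair each document _id with its embedding as an update specification."""
    if len(ids) != len(embeddings):
        raise ValueError(f"got {len(ids)} ids but {len(embeddings)} embeddings")
    return [
        {"filter": {"_id": doc_id}, "update": {"$set": {"embeddings": vector}}}
        for doc_id, vector in zip(ids, embeddings)
    ]


# Made-up IDs and two-dimensional vectors, just to show the resulting shape.
ops = build_update_ops(["id-1", "id-2"], [[0.1, 0.2], [0.3, 0.4]])
print(ops[0])
```

Each entry carries its own _id filter, so the operations can be sent as an unordered bulk write: the order in which the server applies them does not affect which document receives which vector.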

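Create a file that contains Go models for the collection.

To simplify marshalling and unmarshalling between Go objects and BSON, create a file that contains models for the documents in this collection.

Move into the common directory.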
```
cd common
```

Create a file named models.go and paste the following code into it.

models.go

package common
import (
	"time"
	"go.mongodb.org/mongo-driver/v2/bson"
)
type Image struct {
	ThumbnailURL string `bson:"thumbnail_url"`
	MediumURL    string `bson:"medium_url"`
	PictureURL   string `bson:"picture_url"`
	XLPictureURL string `bson:"xl_picture_url"`
}
type Host struct {
	ID                 string   `bson:"host_id"`
	URL                string   `bson:"host_url"`
	Name               string   `bson:"host_name"`
	Location           string   `bson:"host_location"`
	About              string   `bson:"host_about"`
	ThumbnailURL       string   `bson:"host_thumbnail_url"`
	PictureURL         string   `bson:"host_picture_url"`
	Neighborhood       string   `bson:"host_neighborhood"`
	IsSuperhost        bool     `bson:"host_is_superhost"`
	HasProfilePic      bool     `bson:"host_has_profile_pic"`
	IdentityVerified   bool     `bson:"host_identity_verified"`
	ListingsCount      int32    `bson:"host_listings_count"`
	TotalListingsCount int32    `bson:"host_total_listings_count"`
	Verifications      []string `bson:"host_verifications"`
}
type Location struct {
	Type            string    `bson:"type"`
	Coordinates     []float64 `bson:"coordinates"`
	IsLocationExact bool      `bson:"is_location_exact"`
}
type Address struct {
	Street         string   `bson:"street"`
	Suburb         string   `bson:"suburb"`
	GovernmentArea string   `bson:"government_area"`
	Market         string   `bson:"market"`
	Country        string   `bson:"country"`
	CountryCode    string   `bson:"country_code"`
	Location       Location `bson:"location"`
}
type Availability struct {
	Thirty         int32 `bson:"availability_30"`
	Sixty          int32 `bson:"availability_60"`
	Ninety         int32 `bson:"availability_90"`
	ThreeSixtyFive int32 `bson:"availability_365"`
}
type ReviewScores struct {
	Accuracy      int32 `bson:"review_scores_accuracy"`
	Cleanliness   int32 `bson:"review_scores_cleanliness"`
	CheckIn       int32 `bson:"review_scores_checkin"`
	Communication int32 `bson:"review_scores_communication"`
	Location      int32 `bson:"review_scores_location"`
	Value         int32 `bson:"review_scores_value"`
	Rating        int32 `bson:"review_scores_rating"`
}
type Review struct {
	ID           string    `bson:"_id"`
	Date         time.Time `bson:"date,omitempty"`
	ListingId    string    `bson:"listing_id"`
	ReviewerId   string    `bson:"reviewer_id"`
	ReviewerName string    `bson:"reviewer_name"`
	Comments     string    `bson:"comments"`
}
type Listing struct {
	ID                   string          `bson:"_id"`
	ListingURL           string          `bson:"listing_url"`
	Name                 string          `bson:"name"`
	Summary              string          `bson:"summary"`
	Space                string          `bson:"space"`
	Description          string          `bson:"description"`
	NeighborhoodOverview string          `bson:"neighborhood_overview"`
	Notes                string          `bson:"notes"`
	Transit              string          `bson:"transit"`
	Access               string          `bson:"access"`
	Interaction          string          `bson:"interaction"`
	HouseRules           string          `bson:"house_rules"`
	PropertyType         string          `bson:"property_type"`
	RoomType             string          `bson:"room_type"`
	BedType              string          `bson:"bed_type"`
	MinimumNights        string          `bson:"minimum_nights"`
	MaximumNights        string          `bson:"maximum_nights"`
	CancellationPolicy   string          `bson:"cancellation_policy"`
	LastScraped          time.Time       `bson:"last_scraped,omitempty"`
	CalendarLastScraped  time.Time       `bson:"calendar_last_scraped,omitempty"`
	FirstReview          time.Time       `bson:"first_review,omitempty"`
	LastReview           time.Time       `bson:"last_review,omitempty"`
	Accommodates         int32           `bson:"accommodates"`
	Bedrooms             int32           `bson:"bedrooms"`
	Beds                 int32           `bson:"beds"`
	NumberOfReviews      int32           `bson:"number_of_reviews"`
	Bathrooms            bson.Decimal128 `bson:"bathrooms"`
	Amenities            []string        `bson:"amenities"`
	Price                bson.Decimal128 `bson:"price"`
	WeeklyPrice          bson.Decimal128 `bson:"weekly_price"`
	MonthlyPrice         bson.Decimal128 `bson:"monthly_price"`
	CleaningFee          bson.Decimal128 `bson:"cleaning_fee"`
	ExtraPeople          bson.Decimal128 `bson:"extra_people"`
	GuestsIncluded       bson.Decimal128 `bson:"guests_included"`
	Image                Image           `bson:"images"`
	Host                 Host            `bson:"host"`
	Address              Address         `bson:"address"`
	Availability         Availability    `bson:"availability"`
	ReviewScores         ReviewScores    `bson:"review_scores"`
	Reviews              []Review        `bson:"reviews"`
	Embeddings           []float32       `bson:"embeddings,omitempty"`
}

If your embeddings are []float64 values rather than []float32 values, use the same models.go file with the last field of the Listing struct declared as follows:

	Embeddings           []float64       `bson:"embeddings,omitempty"`

Move back into the project root directory.
```
cd ../
```

Generate the embeddings.

go run create-embeddings.go

2024/10/10 09:58:03 Generating embeddings.
2024/10/10 09:58:12 Successfully added embeddings to 50 documents

To view your vector embeddings in the Atlas UI, navigate to the sample_airbnb.listingsAndReviews collection in your cluster and expand the fields in a document.

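Define the code to generate embeddings from sample data and store them in Atlas.

Create a file named CreateEmbeddings.java and paste in the following code.

This code uses the getEmbeddings method and the MongoDB Java Sync Driver to do the following:

Connects to your Atlas cluster.
Gets the array of sample texts.
Generates an embedding from each text by using the getEmbeddings method that you defined previously.
Ingests the embeddings into the sample_db.embeddings collection in Atlas.

CreateEmbeddings.java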

import com.mongodb.MongoException;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.result.InsertManyResult;
import org.bson.BsonArray;
import org.bson.Document;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class CreateEmbeddings {
    static List<String> data = Arrays.asList(
            "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
            "The Lion King: Lion cub and future king Simba searches for his identity",
            "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
    );
    public static void main(String[] args){
        String uri = System.getenv("ATLAS_CONNECTION_STRING");
        if (uri == null || uri.isEmpty()) {
            throw new RuntimeException("ATLAS_CONNECTION_STRING env variable is not set or is empty.");
        }
        // establish connection and set namespace
        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("sample_db");
            MongoCollection<Document> collection = database.getCollection("embeddings");
            System.out.println("Creating embeddings for " + data.size() + " documents");
            EmbeddingProvider embeddingProvider = new EmbeddingProvider();
            // generate embeddings for new inputted data
            List<BsonArray> embeddings = embeddingProvider.getEmbeddings(data);
            List<Document> documents = new ArrayList<>();
            int i = 0;
            for (String text : data) {
                Document doc = new Document("text", text).append("embedding", embeddings.get(i));
                documents.add(doc);
                i++;
            }
            // insert the embeddings into the Atlas collection
            List<String> insertedIds = new ArrayList<>();
            try {
                InsertManyResult result = collection.insertMany(documents);
                result.getInsertedIds().values()
                        .forEach(doc -> insertedIds.add(doc.toString()));
                System.out.println("Inserted " + insertedIds.size() + " documents with the following ids to " + collection.getNamespace() + " collection: \n " + insertedIds);
            } catch (MongoException me) {
                throw new RuntimeException("Failed to insert documents", me);
            }
        } catch (MongoException me) {
            throw new RuntimeException("Failed to connect to MongoDB ", me);
        } catch (Exception e) {
            throw new RuntimeException("Operation failed: ", e);
        }
    }
}
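The loop above pairs each text with the embedding at the same index. As a minimal sketch of that step, written in Python for illustration only, the helper below builds the same {text, embedding} documents and adds two sanity checks before anything is inserted: one embedding per text, and one shared vector length. The second check is an assumption on our part, based on the fact that a vector search index is defined for a fixed number of dimensions. The name `build_documents` and the tiny vectors are hypothetical.

```python
# Hypothetical helper mirroring the {text, embedding} documents built above.

def build_documents(texts: list[str], embeddings: list[list[float]]) -> list[dict]:
    """Combine texts and vectors into documents, checking that they line up."""
    if len(texts) != len(embeddings):
        raise ValueError("each text needs exactly one embedding")
    dimensions = {len(vector) for vector in embeddings}
    if len(dimensions) > 1:
        raise ValueError(f"embeddings have mixed dimensions: {sorted(dimensions)}")
    return [
        {"text": text, "embedding": vector}
        for text, vector in zip(texts, embeddings)
    ]


# Tiny made-up vectors; real embedding models return far more dimensions.
documents = build_documents(
    ["Titanic", "The Lion King"],
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
)
print(documents[0])
```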

Generate the embeddings.

Save and run the file. The output resembles the following:

Creating embeddings for 3 documents
Inserted 3 documents with the following ids to sample_db.embeddings collection:
 [BsonObjectId{value=6735ff620d88451041f6dd40}, BsonObjectId{value=6735ff620d88451041f6dd41}, BsonObjectId{value=6735ff620d88451041f6dd42}]

You can also view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

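Note

This example uses the sample_airbnb.listingsAndReviews collection from the sample data, but you can adapt the code to work with any collection in your cluster.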

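Define the code to generate embeddings from an existing collection in Atlas.

Create a file named CreateEmbeddings.java and paste in the following code.

This code uses the getEmbeddings method and the MongoDB Java Sync Driver to do the following:

Connects to your Atlas cluster.
Retrieves a subset of documents from the sample_airbnb.listingsAndReviews collection that have a non-empty summary field.
Generates an embedding from each document's summary field by using the getEmbeddings method that you defined previously.
Updates each document with a new embeddings field that contains the embedding value.

CreateEmbeddings.java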

import com.mongodb.MongoException;
import com.mongodb.bulk.BulkWriteResult;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.BulkWriteOptions;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.UpdateOneModel;
import com.mongodb.client.model.Updates;
import com.mongodb.client.model.WriteModel;
import org.bson.BsonArray;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.ArrayList;
import java.util.List;
public class CreateEmbeddings {
    public static void main(String[] args){
        String uri = System.getenv("ATLAS_CONNECTION_STRING");
        if (uri == null || uri.isEmpty()) {
            throw new RuntimeException("ATLAS_CONNECTION_STRING env variable is not set or is empty.");
        }
        // establish connection and set namespace
        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("sample_airbnb");
            MongoCollection<Document> collection = database.getCollection("listingsAndReviews");
            Bson filterCriteria = Filters.and(
                    Filters.and(Filters.exists("summary"),
                            Filters.ne("summary", null),
                            Filters.ne("summary", "")),
                    Filters.exists("embeddings", false));
            try (MongoCursor<Document> cursor = collection.find(filterCriteria).limit(50).iterator()) {
                List<String> summaries = new ArrayList<>();
                List<String> documentIds = new ArrayList<>();
                int i = 0;
                while (cursor.hasNext()) {
                    Document document = cursor.next();
                    String summary = document.getString("summary");
                    String id = document.get("_id").toString();
                    summaries.add(summary);
                    documentIds.add(id);
                    i++;
                }
                System.out.println("Generating embeddings for " + summaries.size() + " documents.");
                System.out.println("This operation may take up to several minutes.");
                EmbeddingProvider embeddingProvider = new EmbeddingProvider();
                List<BsonArray> embeddings = embeddingProvider.getEmbeddings(summaries);
                List<WriteModel<Document>> updateDocuments = new ArrayList<>();
                for (int j = 0; j < summaries.size(); j++) {
                    UpdateOneModel<Document> updateDoc = new UpdateOneModel<>(
                            Filters.eq("_id", documentIds.get(j)),
                            Updates.set("embeddings", embeddings.get(j)));
                    updateDocuments.add(updateDoc);
                }
                int updatedDocsCount = 0;
                try {
                    BulkWriteOptions options = new BulkWriteOptions().ordered(false);
                    BulkWriteResult result = collection.bulkWrite(updateDocuments, options);
                    updatedDocsCount = result.getModifiedCount();
                } catch (MongoException me) {
                    throw new RuntimeException("Failed to update documents", me);
                }
                System.out.println("Added embeddings successfully to " + updatedDocsCount + " documents.");
            }
        } catch (MongoException me) {
            throw new RuntimeException("Failed to connect to MongoDB", me);
        } catch (Exception e) {
            throw new RuntimeException("Operation failed: ", e);
        }
    }
}

Generate the embeddings.

Save and run the file. The output resembles the following:

Generating embeddings for 50 documents.
This operation may take up to several minutes.
Added embeddings successfully to 50 documents.

You can also view your vector embeddings in the Atlas UI by navigating to the sample_airbnb.listingsAndReviews collection in your cluster.

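Create a file named `create-embeddings.js` and paste in the following code.

Use the following code to generate embeddings from sample data and store them in Atlas. This code uses the getEmbedding function that you defined and the MongoDB Node.js Driver to generate embeddings from an array of sample texts and ingest them into the sample_db.embeddings collection in Atlas.

If you defined the convertEmbeddingsToBSON function, uncomment its import near the top of the file and the two commented-out lines that call it to convert your embeddings to BSON binData vectors.

create-embeddings.js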

import { MongoClient } from 'mongodb';
import { getEmbedding } from './get-embeddings.js';
// import { convertEmbeddingsToBSON } from './convert-embeddings.js';
// Data to embed
const texts = [ 
    "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
    "The Lion King: Lion cub and future king Simba searches for his identity",
    "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
]
async function run() {
    // Connect to your Atlas cluster
    const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
    
    try {
        await client.connect();
        const db = client.db("sample_db");
        const collection = db.collection("embeddings");
        console.log("Generating embeddings and inserting documents...");
        const insertDocuments = [];
        await Promise.all(texts.map(async text => {
            // Check if the document already exists
            const existingDoc = await collection.findOne({ text: text });
            // Generate an embedding using the function that you defined
            var embedding = await getEmbedding(text);
            
            // Uncomment the following lines to convert the generated embedding into BSON format
            // const bsonEmbedding = await convertEmbeddingsToBSON([embedding]); // Since convertEmbeddingsToBSON is designed to handle arrays
            // embedding = bsonEmbedding; // Use BSON embedding instead of the original float32 embedding
                      
            // Add the document with the embedding to array of documents for bulk insert
            if (!existingDoc) {
                insertDocuments.push({
                    text: text,
                    embedding: embedding
                })
            }
        }));
        // Continue processing documents if an error occurs during an operation
        const options = { ordered: false };
        // Insert documents with embeddings into Atlas
        const result = await collection.insertMany(insertDocuments, options);  
        console.log("Count of documents inserted: " + result.insertedCount); 
    } catch (err) {
        console.log(err.stack);
    }
    finally {
        await client.close();
    }
}
run().catch(console.dir);
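As a rough illustration of why converting embeddings to a binary float32 form saves space, the following Python sketch packs a vector as consecutive 32-bit floats and compares the size with the same values stored as 64-bit doubles. It shows only the size arithmetic and does not produce the actual BSON vector encoding, which is the job of the convertEmbeddingsToBSON function defined elsewhere in this tutorial. The helper names `pack_float32` and `unpack_float32` and the sample values are hypothetical.

```python
import struct


def pack_float32(vector: list[float]) -> bytes:
    """Pack floats as consecutive little-endian 32-bit values."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(data: bytes) -> list[float]:
    """Inverse of pack_float32: read back 4 bytes per value."""
    return list(struct.unpack(f"<{len(data) // 4}f", data))


# Made-up values chosen to be exactly representable in float32.
vector = [0.5, -0.25, 0.125]
packed = pack_float32(vector)

print(len(packed))                                     # 4 bytes per dimension
print(len(struct.pack(f"<{len(vector)}d", *vector)))   # 8 bytes per dimension as doubles
print(unpack_float32(packed) == vector)                # round trip for these sample values
```

Values that are not exactly representable in 32 bits lose a little precision when packed this way, which is the usual trade-off for the smaller footprint.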

Save and run the file.

node --env-file=.env create-embeddings.js

Generating embeddings and inserting documents...
Count of documents inserted: 3

You can also view your vector embeddings in the Atlas UI by navigating to the sample_db.embeddings collection in your cluster.

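Note

This example uses the sample_airbnb.listingsAndReviews collection from the sample data, but you can adapt the code to work with any collection in your cluster.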

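Create a file named `create-embeddings.js` and paste in the following code.

Use the following code to generate embeddings from an existing collection in Atlas. Specifically, this code does the following:

Connects to your Atlas cluster.
Retrieves a subset of documents from the sample_airbnb.listingsAndReviews collection that have a non-empty summary field.
Generates an embedding from each document's summary field by using the getEmbedding function that you defined.
Updates each document with a new embedding field that contains the embedding value by using the MongoDB Node.js Driver.

If you defined the convertEmbeddingsToBSON function, uncomment its import near the top of the file and the two commented-out lines that call it to convert your embeddings to BSON binData vectors.

create-embeddings.js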

import { MongoClient } from 'mongodb';
import { getEmbedding } from './get-embeddings.js';
// import { convertEmbeddingsToBSON } from './convert-embeddings.js';
async function run() {
    // Connect to your Atlas cluster
    const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
    try {
        await client.connect();
        const db = client.db("sample_airbnb");
        const collection = db.collection("listingsAndReviews");
        // Filter to exclude null or empty summary fields
        const filter = { "summary": { "$nin": [ null, "" ] } };
        // Get a subset of documents from the collection
        const documents = await collection.find(filter).limit(50).toArray();
        console.log("Generating embeddings and updating documents...");
        const updateDocuments = [];
        await Promise.all(documents.map(async doc => {
            // Generate an embedding using the function that you defined
            var embedding = await getEmbedding(doc.summary);
            // Uncomment the following lines to convert the generated embedding into BSON format
            // const bsonEmbedding = await convertEmbeddingsToBSON([embedding]); // Since convertEmbeddingsToBSON is designed to handle arrays
            // embedding = bsonEmbedding; // Use BSON embedding instead of the original float32 embedding
             
            // Add the embedding to an array of update operations
            updateDocuments.push(
                {
                    updateOne: { 
                        filter: { "_id": doc._id },
                        update: { $set: { "embedding": embedding } }
                    }
                }
            );
        }));
        // Continue processing documents if an error occurs during an operation
        const options = { ordered: false };
        // Update documents with the new embedding field
        const result = await collection.bulkWrite(updateDocuments, options);
        console.log("Count of documents updated: " + result.modifiedCount);
    } catch (err) {
        console.log(err.stack);
    }
    finally {
        await client.close();
    }
}
run().catch(console.dir);

Save and run the file.

node --env-file=.env create-embeddings.js

Generating embeddings and updating documents...
Count of documents updated: 50

Load the data.

The following code defines a list of sample texts.

# Sample data
texts = [
  "Titanic: The story of the 1912 sinking of the largest luxury liner ever built",
  "The Lion King: Lion cub and future king Simba searches for his identity",
  "Avatar: A marine is dispatched to the moon Pandora on a unique mission"
]

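Generate embeddings from your data.

To generate embeddings from your data, use the get_embedding function. Use the following code to generate embeddings from the sample texts.

If you defined the generate_bson_vector function, uncomment the line that calls it to compress the embeddings into binData vectors. The embeddings then appear in binary format.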

# Generate embeddings from the sample data
embeddings = []
for text in texts:
    embedding = get_embedding(text)
    # Uncomment the following line to convert to BSON
    # embedding = generate_bson_vector(embedding, BinaryVectorDtype.FLOAT32)
    embeddings.append(embedding)
    # Print the embeddings
    print(f"\nText: {text}")
    print(f"Embedding: {embedding[:3]}... (truncated)")

Generated embeddings:
Text: Titanic: The story of the 1912 sinking of the largest luxury liner ever built
Embedding: [-0.01089042  0.05926645 -0.00291325]... (truncated)
Text: The Lion King: Lion cub and future king Simba searches for his identity
Embedding: [-0.05607051 -0.01360618  0.00523855]... (truncated)
Text: Avatar: A marine is dispatched to the moon Pandora on a unique mission
Embedding: [-0.0275258   0.01144342 -0.02360895]... (truncated)
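
The optional binData conversion mentioned above stores each value as a 32-bit float. As a rough illustration of why that is more compact than Python's native 64-bit floats, the following sketch packs a vector with the standard struct module. It does not reproduce the BSON binData layout, and the helper names and the sample vector are made up for this example.

```python
import struct
from typing import List, Sequence


def pack_float32(vector: Sequence[float]) -> bytes:
    """Pack the values as little-endian 32-bit floats (4 bytes per value)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_float32(data: bytes) -> List[float]:
    """Reverse pack_float32 and return the values as Python floats."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}f", data))


# Hypothetical 3-dimensional vector; real embeddings have far more dimensions
sample_vector = [0.5, -0.25, 0.125]
packed = pack_float32(sample_vector)
print(f"{len(sample_vector)} values packed into {len(packed)} bytes")
```

Values that are not exactly representable as 32-bit floats lose a little precision in the round trip, which is the usual trade-off for the smaller size.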

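Ingest the embeddings into Atlas.

To create documents that contain the embeddings and ingest them into your Atlas cluster, complete the following steps:

Define a function to create the documents.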

def create_docs_with_embeddings(embeddings, data):
    docs = []
    for i, (embedding, text) in enumerate(zip(embeddings, data)):
        doc = {
            "_id": i,
            "text": text,
            "embedding": embedding,
        }
        docs.append(doc)
    return docs

Create the documents with embeddings.
```
# Create documents with embeddings and sample data
docs = create_docs_with_embeddings(embeddings, texts)
```

Ingest the documents into Atlas.

Paste and run the following code in your notebook, replacing <connection-string> with your Atlas cluster's SRV connection string.

Note

Your connection string should use the following format:

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
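
If your database username or password contains reserved characters such as @, : or /, they need to be percent-encoded before they go into the connection string. The following is a minimal sketch of one way to assemble the string with the standard library. The helper name build_srv_uri and the sample values are hypothetical.

```python
from urllib.parse import quote_plus


def build_srv_uri(username: str, password: str, host: str) -> str:
    """Assemble an SRV connection string with percent-encoded credentials.

    `host` is assumed to look like "<clusterName>.<hostname>.mongodb.net".
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}"


# Hypothetical values for illustration only
uri = build_srv_uri("app_user", "p@ss:word/1", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string can then be passed to pymongo.MongoClient in place of the <connection-string> placeholder.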

This code does the following:

Connects to your Atlas cluster.
Inserts the documents into the specified database and collection.

import pymongo
# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<connection-string>")
db = mongo_client["sample_db"]
collection = db["embeddings"]
# Ingest data into Atlas
collection.insert_many(docs)

InsertManyResult([0, 1, 2], acknowledged=True)

To verify your vector embeddings, view them in the Atlas UI in the sample_db.embeddings namespace of your cluster.

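Load the existing data.

Load data from your Atlas cluster. The following code gets a subset of 50 documents from the sample_airbnb.listingsAndReviews collection.

Replace <connection-string> with your Atlas cluster's SRV connection string.

Note

Your connection string should use the following format: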

mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net

import pymongo
# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<connection-string>")
db = mongo_client["sample_airbnb"]
collection = db["listingsAndReviews"]
# Define a filter to exclude documents with null or empty 'summary' fields
filter = { 'summary': { '$exists': True, "$nin": [ None, "" ] } }
# Get a subset of documents in the collection
documents = collection.find(filter, {'_id': 1, 'summary': 1}).limit(50)

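Generate embeddings and update the documents in Atlas.

Generate embeddings from the documents that you loaded in the previous step. This code does the following:

Generates an embedding from each document's summary field by using the get_embedding function that you defined.
Updates each document with a new embedding field that contains the embedding value.

If you defined the generate_bson_vector function to convert your vector embeddings to BSON binData vectors, uncomment the line that calls it before you run the code.

Note

This operation might take a few minutes to complete.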

from pymongo import UpdateOne
# Generate the list of bulk write operations
operations = []
for doc in documents:
    summary = doc["summary"]
    # Generate embeddings for this document
    embedding = get_embedding(summary)
    # Uncomment the following line to convert to BSON vectors
    # embedding = generate_bson_vector(embedding, BinaryVectorDtype.FLOAT32)
    # Add the update operation to the list
    operations.append(UpdateOne(
        {"_id": doc["_id"]},
        {"$set": {
            "embedding": embedding
        }}
    ))
# Execute the bulk write operation
updated_doc_count = 0  # Stays 0 when there is nothing to update
if operations:
    result = collection.bulk_write(operations)
    updated_doc_count = result.modified_count
print(f"Updated {updated_doc_count} documents.")

Updated 50 documents.
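
The example above sends all 50 updates in a single bulk write. For larger collections, one option is to split the operations into smaller batches so that no single request grows too large. The following is a minimal sketch of such a splitter. The helper name chunked and the batch size are assumptions for illustration, and the strings stand in for the UpdateOne operations.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Stand-in for the list of UpdateOne operations built earlier
sample_operations = [f"op-{i}" for i in range(5)]
batches = list(chunked(sample_operations, 2))
print([len(batch) for batch in batches])
```

With a real collection, each batch could be passed to collection.bulk_write(batch, ordered=False) and the modified counts summed across batches.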

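Create Embeddings for Queries

In this section, you index the vector embeddings in your collection and create an embedding that you use to run a sample vector search query.

When you run the query, Atlas Vector Search returns the documents whose embeddings are closest in distance to the embedding in your vector search query. This indicates that they are similar in meaning.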
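
To make the idea of "closest" concrete, the following sketch ranks a few made-up three-dimensional vectors against a query vector by their raw dot product. It only illustrates the principle: the vectors and names are hypothetical, real embeddings have hundreds or thousands of dimensions, and Atlas Vector Search applies its own score normalization.

```python
from typing import Dict, List, Sequence, Tuple


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the dot product of two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, dot_product(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings, not produced by any real model
toy_embeddings = {
    "ship": [0.9, 0.1, 0.0],
    "lion": [0.1, 0.9, 0.0],
    "space": [0.2, 0.2, 0.9],
}
ranking = rank_by_similarity([1.0, 0.0, 0.0], toy_embeddings)
for name, score in ranking:
    print(name, score)
```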

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Complete the following steps to create an index on the sample_db.embeddings collection that specifies the embedding field as the vector type and dotProduct as the similarity measure.

Paste the following code to add a CreateVectorIndex function to the DataService class in DataService.cs.

DataService.cs

namespace MyCompany.Embeddings;
using MongoDB.Driver;
using MongoDB.Bson;
public class DataService
{
    private static readonly string? ConnectionString = Environment.GetEnvironmentVariable("ATLAS_CONNECTION_STRING");
    private static readonly MongoClient Client = new MongoClient(ConnectionString);
    private static readonly IMongoDatabase Database = Client.GetDatabase("sample_db");
    private static readonly IMongoCollection<BsonDocument> Collection = Database.GetCollection<BsonDocument>("embeddings");
    
    public async Task AddDocumentsAsync(Dictionary<string, float[]> embeddings)
    {
        // Method details...
    }
    
    public void CreateVectorIndex()
    {
        try
        {
            var searchIndexView = Collection.SearchIndexes;
            var name = "vector_index";
            var type = SearchIndexType.VectorSearch;
            var definition = new BsonDocument
            {
                { "fields", new BsonArray
                    {
                        new BsonDocument
                        {
                            { "type", "vector" },
                            { "path", "embedding" },
                            { "numDimensions", <dimensions> },
                            { "similarity", "dotProduct" }
                        }
                    }
                }
            };
            var model = new CreateSearchIndexModel(name, type, definition);
            searchIndexView.CreateOne(model);
            Console.WriteLine($"New search index named {name} is building.");
            // Polling for index status
            Console.WriteLine("Polling to check if the index is ready. This may take up to a minute.");
            bool queryable = false;
            while (!queryable)
            {
                var indexes = searchIndexView.List();
                foreach (var index in indexes.ToEnumerable())
                {
                    if (index["name"] == name)
                    {
                        queryable = index["queryable"].AsBoolean;
                    }
                }
                if (!queryable)
                {
                    Thread.Sleep(5000);
                }
            }
            Console.WriteLine($"{name} is ready for querying.");
        }
        catch (Exception e)
        {
            Console.WriteLine($"Exception: {e.Message}");
        }
    }
}

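Replace the <dimensions> placeholder value with 1024 if you used the open-source model or with 1536 if you used the model from OpenAI.
Update the code in your Program.cs.
Remove the code that populated the initial documents and replace it with the following code to create the index:
Program.cs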
```
using MyCompany.Embeddings;
var dataService = new DataService();
dataService.CreateVectorIndex();
```
Save the file, then compile and run your project to create the index:
dotnet run MyCompany.Embeddings
New search index named vector_index is building.
Polling to check if the index is ready. This may take up to a minute.
vector_index is ready for querying.

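Note

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

To learn more, see Create an Atlas Vector Search Index.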
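
The polling loop in the example above waits indefinitely. The following Python sketch shows the same wait-until-queryable idea with a timeout added. The function name, its parameters and the fake index listings are assumptions for illustration. list_indexes is any callable that returns the current index documents.

```python
import time
from typing import Any, Callable, Iterable, Mapping


def wait_until_queryable(
    list_indexes: Callable[[], Iterable[Mapping[str, Any]]],
    index_name: str,
    interval_seconds: float = 5.0,
    timeout_seconds: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the named index reports queryable, or give up at the timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        for index in list_indexes():
            if index.get("name") == index_name and index.get("queryable"):
                return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_seconds)


# Fake listings that simulate an index becoming ready on the second poll
_responses = iter([
    [{"name": "vector_index", "queryable": False}],
    [{"name": "vector_index", "queryable": True}],
])
ready = wait_until_queryable(
    lambda: next(_responses), "vector_index", sleep=lambda seconds: None
)
print(ready)
```

With a recent PyMongo version, list_indexes could be something like lambda: collection.list_search_indexes("vector_index"). Returning False on timeout lets the caller decide whether to retry or report an error.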

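Create embeddings for vector search queries and run the queries.

Paste the following code to add a PerformVectorQuery function to the DataService class in DataService.cs.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Passes this embedding into the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents in order of relevance, along with their vector search scores.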
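
The stages described above can also be sketched in Python as plain dictionaries, which makes the shape of the pipeline easy to inspect before it is sent to the server. The function name, its defaults and the sample query vector are assumptions for this illustration. The stage fields mirror the ones used in the examples on this page.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str = "vector_index",
    path: str = "embedding",
    limit: int = 5,
    projected_field: str = "text",
) -> List[Dict[str, Any]]:
    """Build an exact (ENN) $vectorSearch stage followed by a $project stage."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "exact": True,  # ENN search instead of approximate search
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                projected_field: 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# Hypothetical 3-dimensional query vector; use your model's output in practice
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3])
print(pipeline)
```

With PyMongo, the resulting list could be passed to collection.aggregate(pipeline).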

DataService.cs

namespace MyCompany.Embeddings;
using MongoDB.Driver;
using MongoDB.Bson;
public class DataService
{
    private static readonly string? ConnectionString = Environment.GetEnvironmentVariable("ATLAS_CONNECTION_STRING");
    private static readonly MongoClient Client = new MongoClient(ConnectionString);
    private static readonly IMongoDatabase Database = Client.GetDatabase("sample_db");
    private static readonly IMongoCollection<BsonDocument> Collection = Database.GetCollection<BsonDocument>("embeddings");
    
    public async Task AddDocumentsAsync(Dictionary<string, float[]> embeddings)
    {
        // Method details...
    }
    
    public void CreateVectorIndex()
    {
        // Method details...
    }
    
    public List<BsonDocument>? PerformVectorQuery(float[] vector)
    {
        var vectorSearchStage = new BsonDocument
        {
            {
                "$vectorSearch",
                new BsonDocument
                {
                    { "index", "vector_index" },
                    { "path", "embedding" },
                    { "queryVector", new BsonArray(vector) },
                    { "exact", true },
                    { "limit", 5 }
                }
            }
        };
        var projectStage = new BsonDocument
        {
            {
                "$project",
                new BsonDocument
                {
                    { "_id", 0 },
                    { "text", 1 },
                    { "score", 
                        new BsonDocument
                        {
                            { "$meta", "vectorSearchScore"}
                        }
                    }
                }
            }
        };
        var pipeline = new[] { vectorSearchStage, projectStage };
        return Collection.Aggregate<BsonDocument>(pipeline).ToList();
    }
}

Update the code in your Program.cs.

Remove the code that created the vector index and add code to run the query:

Program.cs

using MongoDB.Bson;
using MyCompany.Embeddings;
var aiService = new AIService();
var queryString = "ocean tragedy";
var queryEmbedding = await aiService.GetEmbeddingsAsync([queryString]);
if (!queryEmbedding.Any())
{
    Console.WriteLine("No embeddings found.");
}
else
{
    var dataService = new DataService();
    var matchingDocuments = dataService.PerformVectorQuery(queryEmbedding[queryString]);
    if (matchingDocuments == null)
    {
        Console.WriteLine("No documents matched the query.");
    }
    else
    {
        foreach (var document in matchingDocuments)
        {
            Console.WriteLine(document.ToJson());
        }
    }
}

Save the file, then compile and run your project to run the query:

dotnet run MyCompany.Embeddings.csproj

{ "text" : "Titanic: The story of the 1912 sinking of the largest luxury liner ever built", "score" : 100.17414855957031 }
{ "text" : "Avatar: A marine is dispatched to the moon Pandora on a unique mission", "score" : 65.705635070800781 }
{ "text" : "The Lion King: Lion cub and future king Simba searches for his identity", "score" : 52.486415863037109 }

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Complete the following steps to create an index on the sample_airbnb.listingsAndReviews collection that specifies the embeddings field as the vector type and dotProduct as the similarity measure.

Paste the following code to add a CreateVectorIndex function to the DataService class in DataService.cs.

DataService.cs

namespace MyCompany.Embeddings;
using MongoDB.Driver;
using MongoDB.Bson;
public class DataService
{
    private static readonly string? ConnectionString = Environment.GetEnvironmentVariable("ATLAS_CONNECTION_STRING");
    private static readonly MongoClient Client = new MongoClient(ConnectionString);
    private static readonly IMongoDatabase Database = Client.GetDatabase("sample_airbnb");
    private static readonly IMongoCollection<BsonDocument> Collection = Database.GetCollection<BsonDocument>("listingsAndReviews");
    public List<BsonDocument>? GetDocuments()
    {
        // Method details...
    }
    public async Task<long> AddEmbeddings(Dictionary<string, float[]> embeddings)
    {
        // Method details...
    }
    public void CreateVectorIndex()
    {
        try
        {
            var searchIndexView = Collection.SearchIndexes;
            var name = "vector_index";
            var type = SearchIndexType.VectorSearch;
            var definition = new BsonDocument
            {
                { "fields", new BsonArray
                    {
                        new BsonDocument
                        {
                            { "type", "vector" },
                            { "path", "embeddings" },
                            { "numDimensions", <dimensions> },
                            { "similarity", "dotProduct" }
                        }
                    }
                }
            };
            var model = new CreateSearchIndexModel(name, type, definition);
            searchIndexView.CreateOne(model);
            Console.WriteLine($"New search index named {name} is building.");
            // Polling for index status
            Console.WriteLine("Polling to check if the index is ready. This may take up to a minute.");
            bool queryable = false;
            while (!queryable)
            {
                var indexes = searchIndexView.List();
                foreach (var index in indexes.ToEnumerable())
                {
                    if (index["name"] == name)
                    {
                        queryable = index["queryable"].AsBoolean;
                    }
                }
                if (!queryable)
                {
                    Thread.Sleep(5000);
                }
            }
            Console.WriteLine($"{name} is ready for querying.");
        }
        catch (Exception e)
        {
            Console.WriteLine($"Exception: {e.Message}");
        }
    }
}

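Replace the <dimensions> placeholder value with 1024 if you used the open-source model or with 1536 if you used the model from OpenAI.
Update the code in your Program.cs.
Remove the code that added embeddings to the existing documents and replace it with the following code to create the index:
Program.cs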
```
using MyCompany.Embeddings;
var dataService = new DataService();
dataService.CreateVectorIndex();
```
Save the file, then compile and run your project to create the index:
dotnet run MyCompany.Embeddings
New search index named vector_index is building.
Polling to check if the index is ready. This may take up to a minute.
vector_index is ready for querying.

Note

To learn more, see Create an Atlas Vector Search Index.
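
The numDimensions value in the index definition has to match the length of the vectors that your embedding model produces (1024 or 1536 in this tutorial). The following sketch builds the same definition as a Python dictionary and checks a vector against it before it is stored. Both helper names are hypothetical, and the sample vector is a placeholder.

```python
from typing import Any, Dict, Sequence

ALLOWED_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def vector_index_definition(
    path: str, num_dimensions: int, similarity: str = "dotProduct"
) -> Dict[str, Any]:
    """Return a vector search index definition for a single vector field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def matches_index(embedding: Sequence[float], definition: Dict[str, Any]) -> bool:
    """Check that the embedding length equals the index's numDimensions."""
    return len(embedding) == definition["fields"][0]["numDimensions"]


definition = vector_index_definition("embeddings", 1024)
placeholder_embedding = [0.0] * 1024  # stands in for a real model output
print(matches_index(placeholder_embedding, definition))
```

Running a check like this before writing embeddings can surface a model or placeholder mismatch early, instead of at query time.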

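Create embeddings for vector search queries and run the queries.

Paste the following code to add a PerformVectorQuery function to the DataService class in DataService.cs.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Passes this embedding into the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents in order of relevance, along with their vector search scores.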

DataService.cs

namespace MyCompany.Embeddings;
using MongoDB.Driver;
using MongoDB.Bson;
public class DataService
{
    private static readonly string? ConnectionString = Environment.GetEnvironmentVariable("ATLAS_CONNECTION_STRING");
    private static readonly MongoClient Client = new MongoClient(ConnectionString);
    private static readonly IMongoDatabase Database = Client.GetDatabase("sample_airbnb");
    private static readonly IMongoCollection<BsonDocument> Collection = Database.GetCollection<BsonDocument>("listingsAndReviews");
    public List<BsonDocument>? GetDocuments()
    {
        // Method details...
    }
    public async Task<long> AddEmbeddings(Dictionary<string, float[]> embeddings)
    {
        // Method details...
    }
    public void CreateVectorIndex()
    {
        // Method details...
    }
    public List<BsonDocument>? PerformVectorQuery(float[] vector)
    {
        var vectorSearchStage = new BsonDocument
        {
            {
                "$vectorSearch",
                new BsonDocument
                {
                    { "index", "vector_index" },
                    { "path", "embeddings" },
                    { "queryVector", new BsonArray(vector) },
                    { "exact", true },
                    { "limit", 5 }
                }
            }
        };
        var projectStage = new BsonDocument
        {
            {
                "$project",
                new BsonDocument
                {
                    { "_id", 0 },
                    { "summary", 1 },
                    { "score", 
                        new BsonDocument
                        {
                            { "$meta", "vectorSearchScore"}
                        }
                    }
                }
            }
        };
        var pipeline = new[] { vectorSearchStage, projectStage };
        return Collection.Aggregate<BsonDocument>(pipeline).ToList();
    }
}

Update the code in your Program.cs.

Remove the code that created the vector index and add code to run the query:

Program.cs

using MongoDB.Bson;
using MyCompany.Embeddings;
var aiService = new AIService();
var queryString = "beach house";
var queryEmbedding = await aiService.GetEmbeddingsAsync([queryString]);
if (!queryEmbedding.Any())
{
    Console.WriteLine("No embeddings found.");
}
else
{
    var dataService = new DataService();
    var matchingDocuments = dataService.PerformVectorQuery(queryEmbedding[queryString]);
    if (matchingDocuments == null)
    {
        Console.WriteLine("No documents matched the query.");
    }
    else
    {
        foreach (var document in matchingDocuments)
        {
            Console.WriteLine(document.ToJson());
        }
    }
}

Save the file, then compile and run your project to run the query:

dotnet run MyCompany.Embeddings.csproj

{ "summary" : "Near to underground metro station. Walking distance to seaside. 2 floors 1 entry. Husband, wife, girl and boy is living.", "score" : 88.884147644042969 }
{ "summary" : "A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.", "score" : 86.136398315429688 }
{ "summary" : "Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.", "score" : 86.087783813476562 }
{ "summary" : "Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.", "score" : 85.689559936523438 }
{ "summary" : "Fully furnished 3+1 flat decorated with vintage style.  Located at the heart of Moda/Kadıköy, close to seaside and also to the public transportation (tram, metro, ferry, bus stations) 10 minutes walk.", "score" : 85.614166259765625 }

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Create a file named create-index.go and paste the following code.

create-index.go

package main
import (
	"context"
	"fmt"
	"log"
	"os"
	"time"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
func main() {
	ctx := context.Background()
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_db").Collection("embeddings")
	indexName := "vector_index"
	opts := options.SearchIndexes().SetName(indexName).SetType("vectorSearch")
	type vectorDefinitionField struct {
		Type          string `bson:"type"`
		Path          string `bson:"path"`
		NumDimensions int    `bson:"numDimensions"`
		Similarity    string `bson:"similarity"`
	}
	type vectorDefinition struct {
		Fields []vectorDefinitionField `bson:"fields"`
	}
	indexModel := mongo.SearchIndexModel{
		Definition: vectorDefinition{
			Fields: []vectorDefinitionField{{
				Type:          "vector",
				Path:          "embedding",
				NumDimensions: <dimensions>,
				Similarity:    "dotProduct"}},
		},
		Options: opts,
	}
	log.Println("Creating the index.")
	searchIndexName, err := coll.SearchIndexes().CreateOne(ctx, indexModel)
	if err != nil {
		log.Fatalf("failed to create the search index: %v", err)
	}
	// Await the creation of the index.
	log.Println("Polling to confirm successful index creation.")
	searchIndexes := coll.SearchIndexes()
	var doc bson.Raw
	for doc == nil {
		cursor, err := searchIndexes.List(ctx, options.SearchIndexes().SetName(searchIndexName))
		if err != nil {
			// Stop instead of silently discarding the error
			log.Fatal(fmt.Errorf("failed to list search indexes: %w", err))
		}
		if !cursor.Next(ctx) {
			break
		}
		name := cursor.Current.Lookup("name").StringValue()
		queryable := cursor.Current.Lookup("queryable").Boolean()
		if name == searchIndexName && queryable {
			doc = cursor.Current
		} else {
			time.Sleep(5 * time.Second)
		}
	}
	log.Println("Name of Index Created: " + searchIndexName)
}

Replace the <dimensions> placeholder value with 1024 if you used the open-source model or with 1536 if you used the model from OpenAI.
Save the file and run the following command:
go run create-index.go
2024/10/09 17:38:51 Creating the index.
2024/10/09 17:38:52 Polling to confirm successful index creation.
2024/10/09 17:39:22 Name of Index Created: vector_index

Note

To learn more, see Create an Atlas Vector Search Index.

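Create embeddings for vector search queries and run the queries.

Create a file named vector-query.go and paste the following code.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Passes this embedding into the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents in order of relevance, along with their vector search scores.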

vector-query.go

package main
import (
	"context"
	"fmt"
	"log"
	"my-embeddings-project/common"
	"os"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
type TextAndScore struct {
	Text  string  `bson:"text"`
	Score float32 `bson:"score"`
}
func main() {
	ctx := context.Background()
	// Load environment variables from a .env file, if one exists
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_db").Collection("embeddings")
	query := "ocean tragedy"
	queryEmbedding := common.GetEmbeddings([]string{query})
	pipeline := mongo.Pipeline{
		bson.D{
			{"$vectorSearch", bson.D{
				{"queryVector", queryEmbedding[0]},
				{"index", "vector_index"},
				{"path", "embedding"},
				{"exact", true},
				{"limit", 5},
			}},
		},
		bson.D{
			{"$project", bson.D{
				{"_id", 0},
				{"text", 1},
				{"score", bson.D{
					{"$meta", "vectorSearchScore"},
				}},
			}},
		},
	}
	// Run the pipeline
	cursor, err := coll.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatalf("failed to run aggregation: %v", err)
	}
	defer func() { _ = cursor.Close(ctx) }()
	var matchingDocs []TextAndScore
	if err = cursor.All(ctx, &matchingDocs); err != nil {
		log.Fatalf("failed to unmarshal results to TextAndScore objects: %v", err)
	}
	for _, doc := range matchingDocs {
		fmt.Printf("Text: %v\nScore: %v\n", doc.Text, doc.Score)
	}
}

vector-query.go

package main
import (
	"context"
	"fmt"
	"log"
	"my-embeddings-project/common"
	"os"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/v2/bson"
	"go.mongodb.org/mongo-driver/v2/mongo"
	"go.mongodb.org/mongo-driver/v2/mongo/options"
)
type TextAndScore struct {
	Text  string  `bson:"text"`
	Score float64 `bson:"score"`
}
func main() {
	ctx := context.Background()
	// Load environment variables from a .env file, if one exists
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_db").Collection("embeddings")
	query := "ocean tragedy"
	queryEmbedding := common.GetEmbeddings([]string{query})
	pipeline := mongo.Pipeline{
		bson.D{
			{"$vectorSearch", bson.D{
				{"queryVector", queryEmbedding[0]},
				{"index", "vector_index"},
				{"path", "embedding"},
				{"exact", true},
				{"limit", 5},
			}},
		},
		bson.D{
			{"$project", bson.D{
				{"_id", 0},
				{"text", 1},
				{"score", bson.D{
					{"$meta", "vectorSearchScore"},
				}},
			}},
		},
	}
	// Run the pipeline
	cursor, err := coll.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatalf("failed to run aggregation: %v", err)
	}
	defer func() { _ = cursor.Close(ctx) }()
	var matchingDocs []TextAndScore
	if err = cursor.All(ctx, &matchingDocs); err != nil {
		log.Fatalf("failed to unmarshal results to TextAndScore objects: %v", err)
	}
	for _, doc := range matchingDocs {
		fmt.Printf("Text: %v\nScore: %v\n", doc.Text, doc.Score)
	}
}

Save the file and run the following command:

go run vector-query.go

Text: Titanic: The story of the 1912 sinking of the largest luxury liner ever built
Score: 0.0042472864
Text: Avatar: A marine is dispatched to the moon Pandora on a unique mission
Score: 0.0031167597
Text: The Lion King: Lion cub and future king Simba searches for his identity
Score: 0.0024476869

go run vector-query.go

Text: Titanic: The story of the 1912 sinking of the largest luxury liner ever built
Score: 0.4552372694015503
Text: Avatar: A marine is dispatched to the moon Pandora on a unique mission
Score: 0.4050072133541107
Text: The Lion King: Lion cub and future king Simba searches for his identity
Score: 0.35942140221595764

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Create a file named create-index.go and paste the following code.

create-index.go

package main
import (
	"context"
	"fmt"
	"log"
	"os"
	"time"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)
func main() {
	ctx := context.Background()
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(ctx, clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_airbnb").Collection("listingsAndReviews")
	indexName := "vector_index"
	opts := options.SearchIndexes().SetName(indexName).SetType("vectorSearch")
	type vectorDefinitionField struct {
		Type          string `bson:"type"`
		Path          string `bson:"path"`
		NumDimensions int    `bson:"numDimensions"`
		Similarity    string `bson:"similarity"`
	}
	type vectorDefinition struct {
		Fields []vectorDefinitionField `bson:"fields"`
	}
	indexModel := mongo.SearchIndexModel{
		Definition: vectorDefinition{
			Fields: []vectorDefinitionField{{
				Type:          "vector",
				Path:          "embeddings",
				NumDimensions: <dimensions>,
				Similarity:    "dotProduct"}},
		},
		Options: opts,
	}
	log.Println("Creating the index.")
	searchIndexName, err := coll.SearchIndexes().CreateOne(ctx, indexModel)
	if err != nil {
		log.Fatalf("failed to create the search index: %v", err)
	}
	// Await the creation of the index.
	log.Println("Polling to confirm successful index creation.")
	searchIndexes := coll.SearchIndexes()
	var doc bson.Raw
	for doc == nil {
		cursor, err := searchIndexes.List(ctx, options.SearchIndexes().SetName(searchIndexName))
		if err != nil {
			// Stop instead of silently discarding the error
			log.Fatal(fmt.Errorf("failed to list search indexes: %w", err))
		}
		if !cursor.Next(ctx) {
			break
		}
		name := cursor.Current.Lookup("name").StringValue()
		queryable := cursor.Current.Lookup("queryable").Boolean()
		if name == searchIndexName && queryable {
			doc = cursor.Current
		} else {
			time.Sleep(5 * time.Second)
		}
	}
	log.Println("Name of Index Created: " + searchIndexName)
}

Replace the <dimensions> placeholder value with 1024 if you used the open-source model or with 1536 if you used the model from OpenAI.
Save the file and run the following command:
go run create-index.go
2024/10/10 10:03:12 Creating the index.
2024/10/10 10:03:13 Polling to confirm successful index creation.
2024/10/10 10:03:44 Name of Index Created: vector_index

Note

To learn more, see Create an Atlas Vector Search Index.

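Create embeddings for vector search queries and run the queries.

Create a file named vector-query.go and paste the following code.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Passes this embedding into the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents in order of relevance, along with their vector search scores.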

vector-query.go

package main
import (
	"context"
	"fmt"
	"log"
	"my-embeddings-project/common"
	"os"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)
type SummaryAndScore struct {
	Summary string  `bson:"summary"`
	Score   float32 `bson:"score"`
}
func main() {
	ctx := context.Background()
	// Load environment variables from a .env file, if one exists
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(ctx, clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_airbnb").Collection("listingsAndReviews")
	query := "beach house"
	queryEmbedding := common.GetEmbeddings([]string{query})
	pipeline := mongo.Pipeline{
		bson.D{
			{"$vectorSearch", bson.D{
				{"queryVector", queryEmbedding[0]},
				{"index", "vector_index"},
				{"path", "embeddings"},
				{"exact", true},
				{"limit", 5},
			}},
		},
		bson.D{
			{"$project", bson.D{
				{"_id", 0},
				{"summary", 1},
				{"score", bson.D{
					{"$meta", "vectorSearchScore"},
				}},
			}},
		},
	}
	// Run the pipeline
	cursor, err := coll.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatalf("failed to run aggregation: %v", err)
	}
	defer func() { _ = cursor.Close(ctx) }()
	var matchingDocs []SummaryAndScore
	if err = cursor.All(ctx, &matchingDocs); err != nil {
		log.Fatalf("failed to unmarshal results to SummaryAndScore objects: %v", err)
	}
	for _, doc := range matchingDocs {
		fmt.Printf("Summary: %v\nScore: %v\n", doc.Summary, doc.Score)
	}
}

vector-query.go

package main
import (
	"context"
	"fmt"
	"log"
	"my-embeddings-project/common"
	"os"
	"github.com/joho/godotenv"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)
type SummaryAndScore struct {
	Summary string  `bson:"summary"`
	Score   float64 `bson:"score"`
}
func main() {
	ctx := context.Background()
	// Load environment variables from a .env file, if one exists
	if err := godotenv.Load(); err != nil {
		log.Println("no .env file found")
	}
	// Connect to your Atlas cluster
	uri := os.Getenv("ATLAS_CONNECTION_STRING")
	if uri == "" {
		log.Fatal("set your 'ATLAS_CONNECTION_STRING' environment variable.")
	}
	clientOptions := options.Client().ApplyURI(uri)
	client, err := mongo.Connect(ctx, clientOptions)
	if err != nil {
		log.Fatalf("failed to connect to the server: %v", err)
	}
	defer func() { _ = client.Disconnect(ctx) }()
	// Set the namespace
	coll := client.Database("sample_airbnb").Collection("listingsAndReviews")
	query := "beach house"
	queryEmbedding := common.GetEmbeddings([]string{query})
	pipeline := mongo.Pipeline{
		bson.D{
			{"$vectorSearch", bson.D{
				{"queryVector", queryEmbedding[0]},
				{"index", "vector_index"},
				{"path", "embeddings"},
				{"exact", true},
				{"limit", 5},
			}},
		},
		bson.D{
			{"$project", bson.D{
				{"_id", 0},
				{"summary", 1},
				{"score", bson.D{
					{"$meta", "vectorSearchScore"},
				}},
			}},
		},
	}
	// Run the pipeline
	cursor, err := coll.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatalf("failed to run aggregation: %v", err)
	}
	defer func() { _ = cursor.Close(ctx) }()
	var matchingDocs []SummaryAndScore
	if err = cursor.All(ctx, &matchingDocs); err != nil {
		log.Fatalf("failed to unmarshal results to SummaryAndScore objects: %v", err)
	}
	for _, doc := range matchingDocs {
		fmt.Printf("Summary: %v\nScore: %v\n", doc.Summary, doc.Score)
	}
}

Save the file, then run the following command:

go run vector-query.go

Summary: Near to underground metro station. Walking distance to seaside. 2 floors 1 entry. Husband, wife, girl and boy is living.
Score: 0.0045180833
Summary: A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.
Score: 0.004480799
Summary: Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.
Score: 0.0042421296
Summary: Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.
Score: 0.004227752
Summary: A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.
Score: 0.0042201905

go run vector-query.go

Summary: A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.
Score: 0.4832950830459595
Summary: Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.
Score: 0.48093676567077637
Summary: THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!
Score: 0.4629695415496826
Summary: A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.
Score: 0.45800843834877014
Summary: The Apartment has a living room, toilet, bedroom (suite) and American kitchen. Well located, on the Copacabana beach block a 05 Min. walk from Ipanema beach (Arpoador). Internet wifi, cable tv, air conditioning in the bedroom, ceiling fans in the bedroom and living room, kitchen with microwave, cooker, Blender, dishes, cutlery and service area with fridge, washing machine, clothesline for drying clothes and closet with several utensils for use.  The property boasts 45 m2.
Score: 0.45398443937301636

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Create a file named CreateIndex.java and paste the following code.

CreateIndex.java

import com.mongodb.MongoException;
import com.mongodb.client.ListSearchIndexesIterable;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.Collections;
import java.util.List;
public class CreateIndex {
    public static void main(String[] args) {
        String uri = System.getenv("ATLAS_CONNECTION_STRING");
        if (uri == null || uri.isEmpty()) {
            throw new IllegalStateException("ATLAS_CONNECTION_STRING env variable is not set or is empty.");
        }
        // establish connection and set namespace
        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("sample_db");
            MongoCollection<Document> collection = database.getCollection("embeddings");
            // define the index details
            String indexName = "vector_index";
            int dimensionsHuggingFaceModel = 1024;
            int dimensionsOpenAiModel = 1536;
            Bson definition = new Document(
                    "fields",
                    Collections.singletonList(
                            new Document("type", "vector")
                                    .append("path", "embedding")
                                    .append("numDimensions", <dimensions>) // replace with var for the model used
                                    .append("similarity", "dotProduct")));
            // define the index model using the specified details
            SearchIndexModel indexModel = new SearchIndexModel(
                    indexName,
                    definition,
                    SearchIndexType.vectorSearch());
            // Create the index using the model
            try {
                List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
                System.out.println("Successfully created a vector index named: " + result);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            // Wait for Atlas to build the index and make it queryable
            System.out.println("Polling to confirm the index has completed building.");
            System.out.println("It may take up to a minute for the index to build before you can query using it.");
            ListSearchIndexesIterable<Document> searchIndexes = collection.listSearchIndexes();
            Document doc = null;
            while (doc == null) {
                try (MongoCursor<Document> cursor = searchIndexes.iterator()) {
                    // check every search index on the collection, not only the first one
                    while (cursor.hasNext()) {
                        Document current = cursor.next();
                        if (indexName.equals(current.getString("name"))
                                && Boolean.TRUE.equals(current.getBoolean("queryable"))) {
                            doc = current;
                            break;
                        }
                    }
                    if (doc == null) {
                        // not queryable yet: wait briefly before polling again
                        Thread.sleep(500);
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            System.out.println(indexName + " index is ready to query");
        } catch (MongoException me) {
            throw new RuntimeException("Failed to connect to MongoDB ", me);
        } catch (Exception e) {
            throw new RuntimeException("Operation failed: ", e);
        }
    }
}

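Replace the <dimensions> placeholder value with the variable that matches the model you used:
- dimensionsHuggingFaceModel: 1024 dimensions ("mixedbread-ai/mxbai-embed-large-v1" model)
- dimensionsOpenAiModel: 1536 dimensions ("text-embedding-3-small" model)
Note
The number of dimensions is determined by the model used to generate the embeddings. If you adapt this code to use a different model, make sure to pass the correct value to numDimensions. See also the Choosing an Embedding Model section.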
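
One way to catch a dimension mismatch early is to compare the length of a sample embedding with the value you plan to pass to numDimensions before you create the index. The following Python sketch illustrates the idea. The `check_dimensions` helper and the zero-filled placeholder vector are hypothetical and stand in for a real embedding returned by your model.

```python
def check_dimensions(embedding, num_dimensions):
    """Raise ValueError if the embedding length differs from the index setting."""
    if len(embedding) != num_dimensions:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, "
            f"but the index expects {num_dimensions}"
        )
    return True


# Values taken from this tutorial's two model options.
DIMENSIONS_HUGGING_FACE_MODEL = 1024
DIMENSIONS_OPENAI_MODEL = 1536

# Placeholder vector standing in for a real 1024-dimension embedding.
sample_embedding = [0.0] * 1024

check_dimensions(sample_embedding, DIMENSIONS_HUGGING_FACE_MODEL)  # passes

try:
    check_dimensions(sample_embedding, DIMENSIONS_OPENAI_MODEL)
except ValueError as err:
    mismatch_message = str(err)
    print(mismatch_message)
```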

Save and run the file. The output resembles the following:

Successfully created a vector index named: [vector_index]
Polling to confirm the index has completed building.
It may take up to a minute for the index to build before you can query using it.
vector_index index is ready to query
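
The polling messages in this output come from a loop that repeatedly lists the collection's search indexes until the new one is queryable. As a rough sketch of that pattern with an added timeout, the following Python example takes the listing operation as a function argument. The `wait_until_queryable` helper, its parameters, and the `fake_list_indexes` stub are hypothetical and exist only for illustration. In a real script you would pass a function that calls your driver's method for listing search indexes.

```python
import time


def wait_until_queryable(list_indexes, index_name, timeout_s=120.0, interval_s=0.5):
    """Poll `list_indexes()` until `index_name` reports queryable, or time out.

    `list_indexes` is any callable that returns an iterable of dicts with
    "name" and "queryable" keys.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        # Inspect every index that is returned, not only the first one.
        for index in list_indexes():
            if index.get("name") == index_name and index.get("queryable") is True:
                return index
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{index_name} was not queryable after {timeout_s} seconds")
        time.sleep(interval_s)


# Stub that reports the index as queryable on the third call.
calls = {"count": 0}


def fake_list_indexes():
    calls["count"] += 1
    return [
        {"name": "other_index", "queryable": True},
        {"name": "vector_index", "queryable": calls["count"] >= 3},
    ]


ready = wait_until_queryable(fake_list_indexes, "vector_index", interval_s=0.0)
print(ready["name"], "is ready after", calls["count"], "checks")
```

The timeout keeps the script from waiting indefinitely if the index never finishes building.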

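Note

To learn more, see Create an Atlas Vector Search Index.

Create embeddings for vector search queries and run a query.

Create a file named VectorQuery.java and paste the following code.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates an embedding for the query by using the embedding function that you defined.
Passes this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an exact nearest neighbor (ENN) search.
Returns semantically similar documents, ranked by relevance, along with their search scores.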

VectorQuery.java

import com.mongodb.MongoException;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.search.FieldSearchPath;
import org.bson.BsonArray;
import org.bson.BsonValue;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.ArrayList;
import java.util.List;
import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Aggregates.vectorSearch;
import static com.mongodb.client.model.Projections.exclude;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.metaVectorSearchScore;
import static com.mongodb.client.model.search.SearchPath.fieldPath;
import static com.mongodb.client.model.search.VectorSearchOptions.exactVectorSearchOptions;
import static java.util.Arrays.asList;
public class VectorQuery {
    public static void main(String[] args) {
        String uri = System.getenv("ATLAS_CONNECTION_STRING");
        if (uri == null || uri.isEmpty()) {
            throw new IllegalStateException("ATLAS_CONNECTION_STRING env variable is not set or is empty.");
        }
        // establish connection and set namespace
        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("sample_db");
            MongoCollection<Document> collection = database.getCollection("embeddings");
            // define the query and get its embedding
            String query = "ocean tragedy";
            EmbeddingProvider embeddingProvider = new EmbeddingProvider();
            BsonArray embeddingBsonArray = embeddingProvider.getEmbedding(query);
            List<Double> embedding = new ArrayList<>();
            // BsonArray is itself a List<BsonValue>, so iterate it directly (works on JDK 8 and later)
            for (BsonValue value : embeddingBsonArray) {
                embedding.add(value.asDouble().getValue());
            }
            // define $vectorSearch pipeline
            String indexName = "vector_index";
            FieldSearchPath fieldSearchPath = fieldPath("embedding");
            int limit = 5;
            List<Bson> pipeline = asList(
                    vectorSearch(
                            fieldSearchPath,
                            embedding,
                            indexName,
                            limit,
                            exactVectorSearchOptions()
                    ),
                    project(
                            fields(exclude("_id"), include("text"),
                                    metaVectorSearchScore("score"))));
            // run query and print results
            List<Document> results = collection.aggregate(pipeline).into(new ArrayList<>());
            if (results.isEmpty()) {
                System.out.println("No results found.");
            } else {
                results.forEach(doc -> {
                    System.out.println("Text: " + doc.getString("text"));
                    System.out.println("Score: " + doc.getDouble("score"));
                });
            }
        } catch (MongoException me) {
            throw new RuntimeException("Failed to connect to MongoDB ", me);
        } catch (Exception e) {
            throw new RuntimeException("Operation failed: ", e);
        }
    }
}

Save and run the file. Depending on the model you used, the output resembles one of the following:

Text: Titanic: The story of the 1912 sinking of the largest luxury liner ever built
Score: 0.004247286356985569
Text: Avatar: A marine is dispatched to the moon Pandora on a unique mission
Score: 0.003116759704425931
Text: The Lion King: Lion cub and future king Simba searches for his identity
Score: 0.002447686856612563

Text: Titanic: The story of the 1912 sinking of the largest luxury liner ever built
Score: 0.45522359013557434
Text: Avatar: A marine is dispatched to the moon Pandora on a unique mission
Score: 0.4049977660179138
Text: The Lion King: Lion cub and future king Simba searches for his identity
Score: 0.35942474007606506

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Create a file named CreateIndex.java and paste the following code.

CreateIndex.java

import com.mongodb.MongoException;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.ListSearchIndexesIterable;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.Collections;
import java.util.List;
public class CreateIndex {
    public static void main(String[] args) {
        String uri = System.getenv("ATLAS_CONNECTION_STRING");
        if (uri == null || uri.isEmpty()) {
            throw new IllegalStateException("ATLAS_CONNECTION_STRING env variable is not set or is empty.");
        }
        // establish connection and set namespace
        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("sample_airbnb");
            MongoCollection<Document> collection = database.getCollection("listingsAndReviews");
            // define the index details
            String indexName = "vector_index";
            int dimensionsHuggingFaceModel = 1024;
            int dimensionsOpenAiModel = 1536;
            Bson definition = new Document(
                    "fields",
                    Collections.singletonList(
                            new Document("type", "vector")
                                    .append("path", "embeddings")
                                    .append("numDimensions", <dimensions>) // replace with var for the model used
                                    .append("similarity", "dotProduct")));
            // define the index model using the specified details
            SearchIndexModel indexModel = new SearchIndexModel(
                    indexName,
                    definition,
                    SearchIndexType.vectorSearch());
            // create the index using the model
            try {
                List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
                System.out.println("Successfully created a vector index named: " + result);
                System.out.println("It may take up to a minute for the index to build before you can query using it.");
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            // wait for Atlas to build the index and make it queryable
            System.out.println("Polling to confirm the index has completed building.");
            ListSearchIndexesIterable<Document> searchIndexes = collection.listSearchIndexes();
            Document doc = null;
            while (doc == null) {
                try (MongoCursor<Document> cursor = searchIndexes.iterator()) {
                    // check every search index on the collection, not only the first one
                    while (cursor.hasNext()) {
                        Document current = cursor.next();
                        if (indexName.equals(current.getString("name"))
                                && Boolean.TRUE.equals(current.getBoolean("queryable"))) {
                            doc = current;
                            break;
                        }
                    }
                    if (doc == null) {
                        // not queryable yet: wait briefly before polling again
                        Thread.sleep(500);
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            System.out.println(indexName + " index is ready to query");
        } catch (MongoException me) {
            throw new RuntimeException("Failed to connect to MongoDB ", me);
        } catch (Exception e) {
            throw new RuntimeException("Operation failed: ", e);
        }
    }
}

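Replace the <dimensions> placeholder value with the variable that matches the model you used:
- dimensionsHuggingFaceModel: 1024 dimensions (open-source model)
- dimensionsOpenAiModel: 1536 dimensions
Note
The number of dimensions is determined by the model used to generate the embeddings. If you use a different model, make sure to pass the correct value to numDimensions. See also the Choosing an Embedding Model section.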

Save and run the file. The output resembles the following:

Successfully created a vector index named: [vector_index]
Polling to confirm the index has completed building.
It may take up to a minute for the index to build before you can query using it.
vector_index index is ready to query

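Note

To learn more, see Create an Atlas Vector Search Index.

Create embeddings for vector search queries and run a query.

Create a file named VectorQuery.java and paste the following code.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates an embedding for the query by using the embedding function that you defined.
Passes this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an exact nearest neighbor (ENN) search.
Returns semantically similar documents, ranked by relevance, along with their search scores.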

VectorQuery.java

import com.mongodb.MongoException;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.search.FieldSearchPath;
import org.bson.BsonArray;
import org.bson.BsonValue;
import org.bson.Document;
import org.bson.conversions.Bson;
import java.util.ArrayList;
import java.util.List;
import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Aggregates.vectorSearch;
import static com.mongodb.client.model.Projections.exclude;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.metaVectorSearchScore;
import static com.mongodb.client.model.search.SearchPath.fieldPath;
import static com.mongodb.client.model.search.VectorSearchOptions.exactVectorSearchOptions;
import static java.util.Arrays.asList;
public class VectorQuery {
    public static void main(String[] args) {
        String uri = System.getenv("ATLAS_CONNECTION_STRING");
        if (uri == null || uri.isEmpty()) {
            throw new IllegalStateException("ATLAS_CONNECTION_STRING env variable is not set or is empty.");
        }
        // establish connection and set namespace
        try (MongoClient mongoClient = MongoClients.create(uri)) {
            MongoDatabase database = mongoClient.getDatabase("sample_airbnb");
            MongoCollection<Document> collection = database.getCollection("listingsAndReviews");
            // define the query and get the embedding
            String query = "beach house";
            EmbeddingProvider embeddingProvider = new EmbeddingProvider();
            BsonArray embeddingBsonArray = embeddingProvider.getEmbedding(query);
            List<Double> embedding = new ArrayList<>();
            for (BsonValue value : embeddingBsonArray) { // BsonArray is a List<BsonValue>; direct iteration works on JDK 8 and later
                embedding.add(value.asDouble().getValue());
            }
            // define $vectorSearch pipeline
            String indexName = "vector_index";
            FieldSearchPath fieldSearchPath = fieldPath("embeddings");
            int limit = 5;
            List<Bson> pipeline = asList(
                    vectorSearch(
                            fieldSearchPath,
                            embedding,
                            indexName,
                            limit,
                            exactVectorSearchOptions()),
                    project(
                            fields(exclude("_id"), include("summary"),
                                    metaVectorSearchScore("score"))));
            // run query and print results
            List<Document> results = collection.aggregate(pipeline).into(new ArrayList<>());
            if (results.isEmpty()) {
                System.out.println("No results found.");
            } else {
                results.forEach(doc -> {
                    System.out.println("Summary: " + doc.getString("summary"));
                    System.out.println("Score: " + doc.getDouble("score"));
                });
            }
        } catch (MongoException me) {
            throw new RuntimeException("Failed to connect to MongoDB ", me);
        } catch (Exception e) {
            throw new RuntimeException("Operation failed: ", e);
        }
    }
}

Save and run the file. Depending on the model you used, the output resembles one of the following:

Summary: Near to underground metro station. Walking distance to seaside. 2 floors 1 entry. Husband, wife, girl and boy is living.
Score: 0.004518083296716213
Summary: A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.
Score: 0.0044807991944253445
Summary: Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.
Score: 0.004242129623889923
Summary: Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.
Score: 0.004227751865983009
Summary: A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.
Score: 0.004220190457999706

Summary: A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.
Score: 0.4832950830459595
Summary: Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.
Score: 0.48092085123062134
Summary: THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!
Score: 0.4629460275173187
Summary: A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.
Score: 0.4581468403339386
Summary: The Apartment has a living room, toilet, bedroom (suite) and American kitchen. Well located, on the Copacabana beach block a 05 Min. walk from Ipanema beach (Arpoador). Internet wifi, cable tv, air conditioning in the bedroom, ceiling fans in the bedroom and living room, kitchen with microwave, cooker, Blender, dishes, cutlery and service area with fridge, washing machine, clothesline for drying clothes and closet with several utensils for use.  The property boasts 45 m2.
Score: 0.45398443937301636

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Create a file named create-index.js and paste the following code.

create-index.js

import { MongoClient } from 'mongodb';
// connect to your Atlas deployment
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
async function run() {
   try {
     const database = client.db("sample_db");
     const collection = database.collection("embeddings");
    
     // define your Atlas Vector Search index
     const index = {
         name: "vector_index",
         type: "vectorSearch",
         definition: {
           "fields": [
             {
               "type": "vector",
               "path": "embedding",
               "similarity": "dotProduct",
               "numDimensions": <dimensions>
             }
           ]
         }
     }
     // run the helper method
     const result = await collection.createSearchIndex(index);
     console.log(result);
   } finally {
     await client.close();
   }
}
run().catch(console.dir);

If you used the open-source model, replace the <dimensions> placeholder value with 768. If you used the OpenAI model, replace it with 1536.
Save the file, then run the following command:
```
node --env-file=.env create-index.js
```

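Note

To learn more, see Create an Atlas Vector Search Index.

Create embeddings for vector search queries and run a query.

Create a file named vector-query.js and paste the following code.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates an embedding for the query by using the embedding function that you defined.
Passes this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an exact nearest neighbor (ENN) search.
Returns semantically similar documents, ranked by relevance, along with their search scores.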

vector-query.js

import { MongoClient } from 'mongodb';
import { getEmbedding } from './get-embeddings.js';
// MongoDB connection URI and options
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
async function run() {
    try {
        // Connect to the MongoDB client
        await client.connect();
        // Specify the database and collection
        const database = client.db("sample_db"); 
        const collection = database.collection("embeddings"); 
        // Generate embedding for the search query
        const queryEmbedding = await getEmbedding("ocean tragedy");
        // Define the sample vector search pipeline
        const pipeline = [
            {
                $vectorSearch: {
                    index: "vector_index",
                    queryVector: queryEmbedding,
                    path: "embedding",
                    exact: true,
                    limit: 5
                }
            },
            {
                $project: {
                    _id: 0,
                    text: 1,
                    score: {
                        $meta: "vectorSearchScore"
                    }
                }
            }
        ];
        // run pipeline
        const result = collection.aggregate(pipeline);
        // print results
        for await (const doc of result) {
            console.dir(JSON.stringify(doc));
        }
    } finally {
        await client.close();
    }
}
run().catch(console.dir);

Save the file, then run the following command:

node --env-file=.env vector-query.js

'{"text":"Titanic: The story of the 1912 sinking of the largest luxury liner ever built","score":0.5103757977485657}'
'{"text":"Avatar: A marine is dispatched to the moon Pandora on a unique mission","score":0.4616812467575073}'
'{"text":"The Lion King: Lion cub and future king Simba searches for his identity","score":0.4115804433822632}'

node --env-file=.env vector-query.js

{"text":"Titanic: The story of the 1912 sinking of the largest luxury liner ever built","score":0.7007871866226196}
{"text":"Avatar: A marine is dispatched to the moon Pandora on a unique mission","score":0.6327334046363831}
{"text":"The Lion King: Lion cub and future king Simba searches for his identity","score":0.5544710159301758}

Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

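Complete the following steps to create an index on the sample_airbnb.listingsAndReviews collection that specifies the embedding field as the vector type and dotProduct as the similarity measure, matching the index definition in the code.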

Create a file named create-index.js and paste the following code.

create-index.js

import { MongoClient } from 'mongodb';
// connect to your Atlas deployment
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
async function run() {
  try {
    const database = client.db("sample_airbnb");
    const collection = database.collection("listingsAndReviews");
   
    // Define your Atlas Vector Search index
    const index = {
        name: "vector_index",
        type: "vectorSearch",
        definition: {
          "fields": [
            {
              "type": "vector",
              "path": "embedding",
              "similarity": "dotProduct",
              "numDimensions": <dimensions>
            }
          ]
        }
    }
    // Call the method to create the index
    const result = await collection.createSearchIndex(index);
    console.log(result);
  } finally {
    await client.close();
  }
}
run().catch(console.dir);

If you used the open-source model, replace the <dimensions> placeholder value with 768. If you used the OpenAI model, replace it with 1536.
Save the file, then run the following command:
```
node --env-file=.env create-index.js
```

Note

To learn more, see Create an Atlas Vector Search Index.
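The choice of dimensions described above can be captured in a small helper. The following Python sketch is illustrative only: MODEL_DIMENSIONS, its keys, and build_index_definition are hypothetical names and not part of any MongoDB driver. The two dimension values (768 and 1536) and the index fields are the ones used in this tutorial.

```python
# Hypothetical mapping from the kind of model used to its embedding size.
# The values come from this tutorial; the keys are illustrative labels.
MODEL_DIMENSIONS = {"open-source": 768, "openai": 1536}


def build_index_definition(model_kind: str, path: str = "embedding",
                           similarity: str = "dotProduct") -> dict:
    """Return a vectorSearch index definition for the given model kind."""
    if model_kind not in MODEL_DIMENSIONS:
        raise ValueError(f"unknown model kind: {model_kind!r}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "similarity": similarity,
                "numDimensions": MODEL_DIMENSIONS[model_kind],
            }
        ]
    }


definition = build_index_definition("openai")
print(definition["fields"][0]["numDimensions"])  # 1536
```

Deriving numDimensions from a single lookup keeps the index definition and the embedding model from drifting apart when you switch models.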

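Create embeddings for vector search queries and run a query.

Create a file named vector-query.js and paste the following code.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Uses this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents, ranked by relevance, along with their search scores.

vector-query.js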

import { MongoClient } from 'mongodb';
import { getEmbedding } from './get-embeddings.js';
// MongoDB connection URI and options
const client = new MongoClient(process.env.ATLAS_CONNECTION_STRING);
async function run() {
    try {
        // Connect to the MongoDB client
        await client.connect();
        // Specify the database and collection
        const database = client.db("sample_airbnb"); 
        const collection = database.collection("listingsAndReviews"); 
        // Generate embedding for the search query
        const queryEmbedding = await getEmbedding("beach house");
        // Define the sample vector search pipeline
        const pipeline = [
            {
                $vectorSearch: {
                    index: "vector_index",
                    queryVector: queryEmbedding,
                    path: "embedding",
                    exact: true,
                    limit: 5
                }
            },
            {
                $project: {
                    _id: 0,
                    summary: 1,
                    score: {
                        $meta: "vectorSearchScore"
                    }
                }
            }
        ];
        // run pipeline
        const result = collection.aggregate(pipeline);
        // print results
        for await (const doc of result) {
            console.dir(JSON.stringify(doc));
        }
    } finally {
        await client.close();
    }
}
run().catch(console.dir);

Save the file, then run the following command:

node --env-file=.env vector-query.js

'{"summary":"Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.","score":0.5334879159927368}'
'{"summary":"A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.","score":0.5240535736083984}'
'{"summary":"The Apartment has a living room, toilet, bedroom (suite) and American kitchen. Well located, on the Copacabana beach block a 05 Min. walk from Ipanema beach (Arpoador). Internet wifi, cable tv, air conditioning in the bedroom, ceiling fans in the bedroom and living room, kitchen with microwave, cooker, Blender, dishes, cutlery and service area with fridge, washing machine, clothesline for drying clothes and closet with several utensils for use.  The property boasts 45 m2.","score":0.5232879519462585}'
'{"summary":"Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.","score":0.5186381340026855}'
'{"summary":"A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.","score":0.5078228116035461}'

node --env-file=.env vector-query.js

{"summary": "A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.", "score": 0.483333021402359}
{"summary": "Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.", "score": 0.48092877864837646}
{"summary": "THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!", "score": 0.46294474601745605}
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "score": 0.4580020606517792}
{"summary": "The Apartment has a living room, toilet, bedroom (suite) and American kitchen. Well located, on the Copacabana beach block a 05 Min. walk from Ipanema beach (Arpoador). Internet wifi, cable tv, air conditioning in the bedroom, ceiling fans in the bedroom and living room, kitchen with microwave, cooker, Blender, dishes, cutlery and service area with fridge, washing machine, clothesline for drying clothes and closet with several utensils for use.  The property boasts 45 m2.", "score": 0.45400717854499817}

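Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Paste the following code in your notebook.

This code creates an index on your collection that specifies the following:

The embedding field as the vector type field.
dotProduct as the similarity type for float32 embeddings.
768 as the number of dimensions in the embeddings.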

from pymongo.operations import SearchIndexModel
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "dotProduct",
        "numDimensions": 768
      }
    ]
  },
  name="vector_index",
  type="vectorSearch"
)
collection.create_search_index(model=search_index_model)

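Run the code.
The index takes about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

To learn more, see Create an Atlas Vector Search Index.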
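Because the index is not usable until the build completes, a script might poll for readiness before it runs a query. The sketch below is a minimal example under stated assumptions: wait_until_queryable is a hypothetical helper, and it assumes that each index document returned by the listing call carries a boolean queryable field. The listing call is passed in as a function so that the logic does not depend on a live cluster.

```python
import time
from typing import Callable, Iterable, Mapping


def wait_until_queryable(
    list_indexes: Callable[[], Iterable[Mapping]],
    timeout_s: float = 120.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every index returned by list_indexes() reports queryable.

    Returns True once all listed indexes are queryable, or False if the
    timeout elapses first.
    """
    waited = 0.0
    while True:
        indexes = list(list_indexes())
        # Assumption: each index document exposes a boolean "queryable" field.
        if indexes and all(ix.get("queryable") for ix in indexes):
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Assumed usage with the collection object from this tutorial:
# ready = wait_until_queryable(
#     lambda: collection.list_search_indexes("vector_index"))
```

If the function returns False, the build is probably still in progress; wait longer instead of querying an index that is not ready.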

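To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Paste the following code in your notebook.

This code creates an index on your collection that specifies the embedding field as the vector type, dotProduct as the similarity function, and 1536 as the number of dimensions.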

from pymongo.operations import SearchIndexModel
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "dotProduct",
        "numDimensions": 1536
      }
    ]
  },
  name="vector_index",
  type="vectorSearch"
)
collection.create_search_index(model=search_index_model)

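Run the code.
The index takes about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

To learn more, see Create an Atlas Vector Search Index.

Create embeddings for vector search queries and run a query.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Uses this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents, ranked by relevance, along with their search scores.

Note

The query might take some time to complete.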

# Generate embedding for the search query
query_embedding = get_embedding("ocean tragedy")
# Sample vector search pipeline
pipeline = [
   {
      "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
      }
   }, 
   {
      "$project": {
         "_id": 0, 
         "text": 1,
         "score": {
            "$meta": "vectorSearchScore"
         }
      }
   }
]
# Execute the search
results = collection.aggregate(pipeline)
# Print results
for i in results:
   print(i)

{'text': 'Titanic: The story of the 1912 sinking of the largest luxury liner ever built', 'score': 0.7661112546920776}
{'text': 'Avatar: A marine is dispatched to the moon Pandora on a unique mission', 'score': 0.7050272822380066}
{'text': 'The Shawshank Redemption: A banker is sentenced to life in Shawshank State Penitentiary for the murders of his wife and her lover.', 'score': 0.7024770379066467}
{'text': 'Jurassic Park: Scientists clone dinosaurs to populate an island theme park, which soon goes awry.', 'score': 0.7011005282402039}
{'text': 'E.T. the Extra-Terrestrial: A young boy befriends an alien stranded on Earth and helps him return home.', 'score': 0.6877288222312927}

# Generate embedding for the search query
query_embedding = get_embedding("ocean tragedy")
# Sample vector search pipeline
pipeline = [
   {
      "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
      }
   }, 
   {
      "$project": {
         "_id": 0, 
         "text": 1,
         "score": {
            "$meta": "vectorSearchScore"
         }
      }
   }
]
# Execute the search
results = collection.aggregate(pipeline)
# Print results
for i in results:
   print(i)

{"text":"Titanic: The story of the 1912 sinking of the largest luxury liner ever built","score":0.7007871866226196}
{"text":"Avatar: A marine is dispatched to the moon Pandora on a unique mission","score":0.6327334046363831}
{"text":"The Lion King: Lion cub and future king Simba searches for his identity","score":0.5544710159301758}

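Create the Atlas Vector Search index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Paste the following code in your notebook.

This code creates an index on your collection that specifies the following:

The embedding field as the vector type field.
dotProduct as the similarity type for float32 embeddings.
768 as the number of dimensions in the embeddings.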

from pymongo.operations import SearchIndexModel
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "dotProduct",
        "numDimensions": 768
      }
    ]
  },
  name="vector_index",
  type="vectorSearch"
)
collection.create_search_index(model=search_index_model)

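Run the code.
The index takes about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

To learn more, see Create an Atlas Vector Search Index.

To enable vector search queries on your data, you must create an Atlas Vector Search index on your collection.

Paste the following code in your notebook.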

from pymongo.operations import SearchIndexModel
# Create your index model, then create the search index
search_index_model = SearchIndexModel(
  definition = {
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "dotProduct",
        "numDimensions": 1536
      }
    ]
  },
  name="vector_index",
  type="vectorSearch"
)
collection.create_search_index(model=search_index_model)

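Run the code.
The index takes about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

To learn more, see Create an Atlas Vector Search Index.

Create embeddings for vector search queries and run a query.

To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Uses this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents, ranked by relevance, along with their search scores.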

# Generate embedding for the search query
query_embedding = get_embedding("beach house")
# Sample vector search pipeline
pipeline = [
   {
      "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
      }
   }, 
   {
      "$project": {
         "_id": 0, 
         "summary": 1,
         "score": {
            "$meta": "vectorSearchScore"
         }
      }
   }
]
# Execute the search
results = collection.aggregate(pipeline)
# Print results
for i in results:
   print(i)

{'summary': 'Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.', 'score': 0.7847104072570801}
{'summary': 'The Apartment has a living room, toilet, bedroom (suite) and American kitchen. Well located, on the Copacabana beach block a 05 Min. walk from Ipanema beach (Arpoador). Internet wifi, cable tv, air conditioning in the bedroom, ceiling fans in the bedroom and living room, kitchen with microwave, cooker, Blender, dishes, cutlery and service area with fridge, washing machine, clothesline for drying clothes and closet with several utensils for use.  The property boasts 45 m2.', 'score': 0.7780507802963257}
{'summary': "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", 'score': 0.7723637223243713}
{'summary': 'Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.', 'score': 0.7665778398513794}
{'summary': 'A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.', 'score': 0.7593404650688171}

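To run a vector search query, generate a query vector to pass into your aggregation pipeline.

For example, this code does the following:

Creates a query embedding by using the embedding function that you defined.
Uses this embedding in the queryVector field and specifies the path to search.
Uses $vectorSearch to run an ENN search.
Returns semantically similar documents, ranked by relevance, along with their search scores.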

# Generate embedding for the search query
query_embedding = get_embedding("beach house")
# Sample vector search pipeline
pipeline = [
   {
      "$vectorSearch": {
            "index": "vector_index",
            "queryVector": query_embedding,
            "path": "embedding",
            "exact": True,
            "limit": 5
      }
   }, 
   {
      "$project": {
         "_id": 0, 
         "summary": 1,
         "score": {
            "$meta": "vectorSearchScore"
         }
      }
   }
]
# Execute the search
results = collection.aggregate(pipeline)
# Print results
for i in results:
   print(i)

{"summary": "A friendly apartment block where everyone knows each other and there is a strong communal vibe. Property has a huge backyard with vege garden and skate ramp. 7min walk to the beach and 2min to buses.", "score": 0.483333021402359}
{"summary": "Room 2  Private room in charming recently renovated federation guest house at Coogee Beach. Prices are per room for 2 People only. A queen  and a single bed. Not suitable for group booking All rooms have  TV, desk, wardrobe, beds, unlimited wifi 2 mins from the beach, cafes and transport. This is not a party house but a safe and clean place to stay. Share bathrooms and kitchen... All common areas are cleaned daily.", "score": 0.48092877864837646}
{"summary": "THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!", "score": 0.46294474601745605}
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "score": 0.4580020606517792}
{"summary": "The Apartment has a living room, toilet, bedroom (suite) and American kitchen. Well located, on the Copacabana beach block a 05 Min. walk from Ipanema beach (Arpoador). Internet wifi, cable tv, air conditioning in the bedroom, ceiling fans in the bedroom and living room, kitchen with microwave, cooker, Blender, dishes, cutlery and service area with fridge, washing machine, clothesline for drying clothes and closet with several utensils for use.  The property boasts 45 m2.", "score": 0.45400717854499817}
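The queries above set exact to true, which requests an ENN (exact nearest neighbor) search: every indexed vector is scored against the query vector and the best matches are returned. The following pure-Python sketch illustrates that idea on toy data. It is not how Atlas implements the search. The exact_search and dot names are hypothetical, and the score formula (1 + dot product) / 2 is an assumption about how dotProduct scores are normalized to the 0 to 1 range.

```python
from typing import Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors that have the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def exact_search(query: Sequence[float], docs: List[Dict], limit: int = 5) -> List[Dict]:
    """Score every document against the query and return the top `limit`."""
    scored = [
        # Assumed normalization: maps a dot product in [-1, 1] to [0, 1].
        {"summary": d["summary"], "score": (1 + dot(query, d["embedding"])) / 2}
        for d in docs
    ]
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:limit]


# Toy two-dimensional "embeddings"; real embeddings have 768 or 1536 dimensions.
listings = [
    {"summary": "Cabin in the woods", "embedding": [0.0, 1.0]},
    {"summary": "House on the beach", "embedding": [1.0, 0.0]},
]
for result in exact_search([1.0, 0.0], listings, limit=1):
    print(result)  # the beach listing, with score 1.0
```

Because every vector is compared, the cost of an exact search grows with the number of indexed documents. This may be why a query can take some time to complete.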

Considerations

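Consider the following factors when you create vector embeddings.

Choose a Method to Create Embeddings

To create vector embeddings, you must use an embedding model. Embedding models are algorithms that you use to convert data into embeddings. You can choose one of the following methods to connect to an embedding model and create vector embeddings:

Method	Description
Load an open-source model	If you don't have an API key for a proprietary embedding model, load an open-source embedding model locally from your application.
Use a proprietary model	Most AI providers offer APIs for their proprietary embedding models that you can use to create vector embeddings.
Leverage an integration	You can integrate Atlas Vector Search with open-source frameworks and AI services to quickly connect to both open-source and proprietary embedding models and generate vector embeddings for Atlas Vector Search. To learn more, see Integrate Vector Search with AI Technologies.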

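Choose an Embedding Model

The embedding model that you choose affects your query results and determines the number of dimensions that you specify in your Atlas Vector Search index. Each model offers different advantages depending on your data and use case.

For a list of popular embedding models, see Massive Text Embedding Benchmark (MTEB). This list provides insights into various open-source and proprietary text embedding models and lets you filter models by use case, model type, and specific model metrics.

When you choose an embedding model for Atlas Vector Search, consider the following metrics:

Embedding dimensions: the length of the vector embedding.
Smaller embeddings are more storage efficient, while larger embeddings can capture more nuanced relationships in your data. The model that you choose should strike a balance between efficiency and complexity.
Max tokens: the number of tokens that can be compressed into a single embedding.
Model size: the size of the model in gigabytes.
Larger models perform better, but they require more computational resources as you scale Atlas Vector Search to production.
Retrieval average: a score that measures the performance of retrieval systems.
A higher score indicates that the model is better at ranking relevant documents higher in the list of retrieved results. This score is important when you choose a model for RAG applications.

Tip

How to Choose the Right Embedding Model for Your Application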

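Vector Compression

If you have a large number of float vectors and want to reduce the storage and WiredTiger footprint (such as disk and memory usage) in mongod, compress your embeddings by converting them to binData vectors.

BinData is a BSON data type that stores binary data. The default type for vector embeddings is an array of 32-bit floats (float32). Binary data is more storage efficient than the default array format, and therefore requires about three times less disk space.

Storing binData vectors improves query performance because fewer resources are needed to load a document into the working set. This can significantly improve query speed for vector queries that return more than 20 documents. If you compress your float32 embeddings, you can query them with either float32 or binData vectors.

The tutorial on this page includes an example function that you can use to convert your float32 vectors to binData vectors.

Supported Drivers

BSON BinData vectors are supported by the following drivers:

PyMongo Driver v4.10 or later
Node.js Driver v6.11 or later

Background

Float vectors are typically difficult to compress because each element in the array has its own type, even though most vectors are uniformly typed. For this reason, converting the float vector output of an embedding model to a binData vector with subtype float32 is a more efficient serialization scheme. binData vectors store a single type descriptor for the entire vector, which reduces storage overhead.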
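To make the per-element overhead concrete, the following sketch estimates the encoded size of one embedding in each format. It is a back-of-the-envelope calculation, not a measurement. It assumes that each array element is encoded as a BSON double (a type byte, the element's index as a string key, and an 8-byte value) and that a binData vector carries a 2-byte header (data type and padding) ahead of 4 bytes per float32 value. Both function names are hypothetical.

```python
def bson_double_array_size(n: int) -> int:
    """Estimated bytes for the value of a BSON array holding n doubles."""
    # Each element: 1 type byte + index key as a null-terminated string
    # ("0", "1", ...) + 8-byte double.
    elements = sum(1 + len(str(i)) + 1 + 8 for i in range(n))
    # Array framing: 4-byte length prefix + 1 terminating null byte.
    return 4 + elements + 1


def bindata_float32_size(n: int) -> int:
    """Estimated bytes for the value of a binData vector of n float32 values."""
    # 4-byte length + 1 subtype byte + assumed 2-byte vector header
    # (data type, padding) + 4 bytes per float32 element.
    return 4 + 1 + 2 + 4 * n


for dims in (768, 1536):
    array_bytes = bson_double_array_size(dims)
    binary_bytes = bindata_float32_size(dims)
    print(dims, array_bytes, binary_bytes, round(array_bytes / binary_bytes, 2))
```

Under these assumptions, the array form of a 768-dimension embedding is a little more than three times the size of the binData form, which is consistent with the savings described above.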

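Validate Your Embeddings

Consider the following strategies to ensure that your embeddings are correct and optimal.

Best Practices

Consider the following best practices when you generate and query embeddings:

Test your functions and scripts.
Generating embeddings takes time and computational resources. Before you create embeddings from large datasets or collections, test that your embedding functions or scripts work as expected on a small subset of your data.
Create embeddings in batches.
If you want to generate embeddings from a large dataset or a collection with many documents, create them in batches to avoid memory issues and optimize performance.
Evaluate performance.
Run test queries to check whether your search results are relevant and accurately ranked.
To learn more about how to evaluate your results and fine-tune the performance of your indexes and queries, see How to Measure the Accuracy of Your Query Results and Improve Vector Search Performance.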
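The batching practice above can be sketched as follows. This is a minimal illustration, not the tutorial's own code: batched and embed_in_batches are hypothetical helpers, and embed_batch stands in for whatever function you use to embed a list of texts in one call.

```python
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def embed_in_batches(
    texts: Iterable[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    """Embed texts one batch at a time, preserving input order."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        embeddings.extend(embed_batch(batch))
    return embeddings
```

Running this on a small subset first, with a small batch_size, also covers the first practice in the list: it confirms that the embedding function behaves as expected before you process a whole collection.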

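Troubleshooting

If you experience problems with your embeddings, consider the following strategies:

Verify your environment.
Check that the necessary dependencies are installed and up to date. Conflicting library versions can cause unexpected behavior. Create a new environment and install only the required packages to ensure that no conflicts exist.
Note
If you use Colab, ensure that your notebook session's IP address is included in your Atlas project's access list.
Monitor memory usage.
If you experience performance issues, check your RAM, CPU, and disk usage to identify potential bottlenecks. For hosted environments such as Colab or Jupyter Notebooks, ensure that your instance is provisioned with sufficient resources, and upgrade the instance if necessary.
Ensure dimensional consistency.
Verify that the Atlas Vector Search index definition matches the dimensions of the embeddings stored in Atlas, and that your query embeddings match the dimensions of the indexed embeddings. Otherwise, you might encounter errors when you run vector search queries.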
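A dimension mismatch is cheap to catch before a query is sent. The following sketch compares the length of a vector with the numDimensions value in an index definition shaped like the ones in this tutorial. check_dimensions is a hypothetical helper and not part of any driver.

```python
from typing import Mapping, Sequence


def check_dimensions(vector: Sequence[float], index_definition: Mapping,
                     path: str = "embedding") -> None:
    """Raise ValueError if the vector length differs from the indexed size."""
    for field in index_definition["fields"]:
        if field.get("type") == "vector" and field.get("path") == path:
            expected = field["numDimensions"]
            if len(vector) != expected:
                raise ValueError(
                    f"vector has {len(vector)} dimensions, "
                    f"but the index expects {expected}"
                )
            return
    raise KeyError(f"no vector field with path {path!r} in the index definition")


# Index definition in the same shape as the ones used in this tutorial.
definition = {
    "fields": [
        {"type": "vector", "path": "embedding",
         "similarity": "dotProduct", "numDimensions": 768}
    ]
}
check_dimensions([0.0] * 768, definition)  # passes silently
```

You can run the same check on document embeddings before you insert them and on query embeddings before you pass them to $vectorSearch.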

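To troubleshoot specific issues, see Troubleshooting.

Next Steps

Now that you've learned how to create embeddings and query them with Atlas Vector Search, start building generative AI applications by implementing retrieval-augmented generation (RAG).

You can also quantize your 32-bit float vector embeddings into fewer bits to further reduce resource consumption and improve query speed. To learn more, see Vector Quantization.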
