Vector Quantization

Note

Atlas Vector Search support for the following is available as a Preview feature:

Ingestion of BSON BinData vectors with the int1 subtype.
Automatic scalar quantization.
Automatic binary quantization.

Atlas Vector Search supports automatic quantization of your float vector embeddings (both 32-bit and 64-bit). It also supports ingesting and indexing your pre-quantized scalar and binary vectors from certain embedding models.

About Quantization

Quantization is the process of shrinking full-fidelity vectors into fewer bits. It reduces the amount of main memory required to store each vector in an Atlas Vector Search index by indexing the reduced representation vectors instead. This allows you to store more vectors or vectors with higher dimensions. Therefore, quantization reduces resource consumption and improves speed. We recommend quantization for applications with a large number of vectors, such as more than 10M.

Scalar Quantization

Scalar quantization first identifies the minimum and maximum values for each dimension of the indexed vectors to establish a range of values for that dimension. Then, the range is divided into equally sized intervals or bins. Finally, each float value is mapped to a bin to convert the continuous float values into discrete integers. In Atlas Vector Search, this quantization reduces the RAM cost of the vector embedding to about one fourth (1/4) of the pre-quantization cost.
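The three steps above can be sketched in a few lines of Java. This is only an illustration of the general technique, not the implementation that Atlas Vector Search uses internally; the class name, the choice of 256 bins, and the shift into the signed byte range are assumptions made for the example.

```java
import java.util.Arrays;

// Illustrative sketch only: per-dimension min/max scalar quantization to int8.
// The names and the 256-bin layout are assumptions, not the Atlas internals.
class ScalarQuantizationSketch {

    // Returns the minimum (row 0) and maximum (row 1) value of each dimension.
    static float[][] minMaxPerDimension(float[][] vectors) {
        int dims = vectors[0].length;
        float[] min = new float[dims];
        float[] max = new float[dims];
        Arrays.fill(min, Float.POSITIVE_INFINITY);
        Arrays.fill(max, Float.NEGATIVE_INFINITY);
        for (float[] v : vectors) {
            for (int d = 0; d < dims; d++) {
                min[d] = Math.min(min[d], v[d]);
                max[d] = Math.max(max[d], v[d]);
            }
        }
        return new float[][] {min, max};
    }

    // Maps each float to one of 256 equally sized bins and stores the bin
    // index shifted into the signed byte range [-128, 127].
    static byte[] quantize(float[] vector, float[] min, float[] max) {
        byte[] out = new byte[vector.length];
        for (int d = 0; d < vector.length; d++) {
            float range = max[d] - min[d];
            // A constant dimension has a single bin; avoid dividing by zero.
            float scaled = range == 0f ? 0f : (vector[d] - min[d]) / range;
            int bin = Math.min(255, Math.max(0, (int) (scaled * 256f)));
            out[d] = (byte) (bin - 128);
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {-1.0f, 0.0f},
            {0.0f, 5.0f},
            {1.0f, 10.0f}
        };
        float[][] minMax = minMaxPerDimension(vectors);
        // Prints [-128, -128], then [0, 0], then [127, 127].
        for (float[] v : vectors) {
            System.out.println(Arrays.toString(quantize(v, minMax[0], minMax[1])));
        }
    }
}
```

Each 4-byte float becomes a single byte, which is where the roughly fourfold reduction in vector value size comes from.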

Binary Quantization

Binary quantization assumes a midpoint of 0 for each dimension, which is typically appropriate for embeddings normalized to length 1, such as OpenAI's text-embedding-3-large. Then, each value in the vector is compared to the midpoint and is assigned a binary value of 1 if it is greater than the midpoint and a binary value of 0 if it is less than or equal to the midpoint. In Atlas Vector Search, this quantization reduces the RAM cost of the vector embedding to one twenty-fourth (1/24) of the pre-quantization cost. The reason it is not 1/32 is that the data structure containing the Hierarchical Navigable Small Worlds graph itself, separate from the vector values, isn't compressed.
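As a rough illustration of the comparison against the midpoint, the following Java sketch thresholds each dimension at 0 and packs the resulting bits eight per byte. The class name and the most-significant-bit-first packing order are assumptions for the example and are not taken from the Atlas implementation.

```java
// Illustrative sketch only: binary quantization with a midpoint of 0.
// The packing order (most significant bit first) is an assumption.
class BinaryQuantizationSketch {

    // A dimension becomes 1 if its value is greater than 0, otherwise 0.
    static byte[] quantize(float[] vector) {
        if (vector.length % 8 != 0) {
            throw new IllegalArgumentException("dimension count must be a multiple of 8");
        }
        byte[] packed = new byte[vector.length / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }

    public static void main(String[] args) {
        float[] vector = {0.5f, -0.2f, 0.0f, 0.1f, -1.0f, 2.0f, -3.0f, 4.0f};
        byte[] packed = quantize(vector);
        // The bits are 1,0,0,1,0,1,0,1, so this prints 10010101.
        System.out.println(Integer.toBinaryString(packed[0] & 0xFF));
    }
}
```

Eight 4-byte floats collapse into one byte, which corresponds to the 32x reduction in vector value size.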

When you run a query, Atlas Vector Search converts the float values in the query vector into a binary vector using the same midpoint, for efficient comparison between the query vector and the indexed binary vectors. It then rescores the candidates identified in the binary comparison by using the original float values associated with those results from the binary index, to further refine the results. The full-fidelity vectors are stored in their own data structure on disk and are referenced only during rescoring when you configure binary quantization, or when you perform exact search against either binary or scalar quantized vectors.
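The two-stage flow described above (a cheap binary comparison followed by rescoring with full-fidelity values) can be sketched as follows. This is a brute-force illustration under stated assumptions: it scans every vector instead of traversing a Hierarchical Navigable Small Worlds graph, it uses Hamming distance for the binary stage and dot product for rescoring, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: shortlist by Hamming distance on binary vectors,
// then rescore the shortlist with the original float values.
class RescoringSketch {

    // Thresholds each dimension at 0 and packs the bits 8 per byte.
    static byte[] toBits(float[] v) {
        byte[] packed = new byte[(v.length + 7) / 8];
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }

    // Counts the bit positions in which the two packed vectors differ.
    static int hammingDistance(byte[] a, byte[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the positions of the best `limit` vectors after rescoring the
    // `numCandidates` vectors that are closest in Hamming distance.
    static List<Integer> search(float[][] fullVectors, float[] query, int numCandidates, int limit) {
        byte[] queryBits = toBits(query);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < fullVectors.length; i++) {
            ids.add(i);
        }
        // Stage 1: binary comparison (smaller Hamming distance first).
        ids.sort(Comparator.comparingInt((Integer i) -> hammingDistance(toBits(fullVectors[i]), queryBits)));
        List<Integer> candidates = new ArrayList<>(ids.subList(0, Math.min(numCandidates, ids.size())));
        // Stage 2: rescoring with full-fidelity values (larger dot product first).
        candidates.sort(Comparator.comparingDouble((Integer i) -> -dot(fullVectors[i], query)));
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {0.9f, 0.9f, 0.9f, 0.9f, 0.9f, 0.9f, 0.9f, 0.9f},
            {0.1f, 0.1f, 0.1f, 0.1f, 0.1f, 0.1f, 0.1f, 0.1f},
            {-1f, -1f, -1f, -1f, -1f, -1f, -1f, -1f},
            {2f, 2f, 2f, 2f, 2f, 2f, 2f, -0.1f}
        };
        float[] query = {1f, 1f, 1f, 1f, 1f, 1f, 1f, 1f};
        // Prints [3, 0]: vector 3 is only third in the binary stage but wins after rescoring.
        System.out.println(search(vectors, query, 3, 2));
    }
}
```

The example shows why rescoring matters: the binary stage cannot distinguish vectors 0 and 1, and it ranks vector 3 behind both, yet the full-fidelity scores put vector 3 first.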

See also:

What is Vector Quantization?

Requirements

The following table shows the requirements for automatically quantizing vectors and for ingesting quantized vectors.

Note

Atlas stores all floating-point values as the double data type internally; therefore, both 32-bit and 64-bit embeddings are compatible with automatic quantization without conversion.

Requirement	For `int1` Ingestion	For `int8` Ingestion	For Automatic Scalar Quantization	For Automatic Binary Quantization
Requires index definition settings	No	No	Yes	Yes
Requires BSON `binData` format	Yes	Yes	No	No
Storage on mongod	`binData(int1)`	`binData(int8)`	`binData(float32)` `array(double)`	`binData(float32)` `array(double)`
Supported similarity method	`euclidean`	`cosine` `euclidean` `dotProduct`	`cosine` `euclidean` `dotProduct`	`cosine` `euclidean` `dotProduct`
Supported number of dimensions	Multiple of 8	1 to 8192	1 to 8192	Multiple of 8
Supports ENN search	ENN on `int1`	ENN on `int8`	ENN on `float32`	ENN on `float32`

How to Enable Automatic Quantization of Vectors

You can configure Atlas Vector Search to automatically quantize float vector embeddings in your collection to reduced representation types, such as int8 (scalar) and binary, in your vector indexes.

To set or change the quantization type, specify a quantization field value of either scalar or binary in your index definition. This triggers an index rebuild, similar to any other index definition change. The specified quantization type applies to all indexed vectors and to query vectors at query time.

For most embedding models, we recommend binary quantization with rescoring. If you want to use lower-dimension models that are not quantization-aware trained (QAT), use scalar quantization because it has less representational loss.

Benefits

Atlas Vector Search provides native capabilities for scalar quantization as well as binary quantization with rescoring. Automatic quantization increases the scalability and cost savings of your applications by reducing the computational resources needed to process your vectors efficiently. Automatic quantization reduces the RAM for mongot by 3.75x for scalar and by 24x for binary; the vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds graph itself does not shrink. This improves performance, even at the highest volume and scale.
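The 4x and 32x figures for the vector values follow directly from the size of each element: 4 bytes for a float32, 1 byte for an int8, and 1 bit for an int1. The following Java sketch works through that arithmetic for a hypothetical workload of 10M vectors with 1024 dimensions. It covers only the raw vector values; the Hierarchical Navigable Small Worlds graph is not modeled, which is why the overall RAM savings quoted above are lower.

```java
// Illustrative arithmetic only: raw size of the vector values, excluding
// the HNSW graph. The workload numbers below are hypothetical.
class VectorValueMemorySketch {

    // float32 stores 4 bytes per dimension.
    static long float32Bytes(long vectorCount, int dimensions) {
        return vectorCount * dimensions * 4L;
    }

    // int8 (scalar quantization) stores 1 byte per dimension.
    static long int8Bytes(long vectorCount, int dimensions) {
        return vectorCount * dimensions;
    }

    // int1 (binary quantization) stores 1 bit per dimension, 8 per byte.
    static long int1Bytes(long vectorCount, int dimensions) {
        return vectorCount * (dimensions / 8);
    }

    public static void main(String[] args) {
        long vectorCount = 10_000_000L;
        int dimensions = 1024;
        System.out.println("float32: " + float32Bytes(vectorCount, dimensions) + " bytes"); // 40960000000
        System.out.println("int8:    " + int8Bytes(vectorCount, dimensions) + " bytes");    // 10240000000
        System.out.println("int1:    " + int1Bytes(vectorCount, dimensions) + " bytes");    // 1280000000
    }
}
```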

Use Cases

We recommend automatic quantization if you have a large number of full-fidelity vectors, typically more than 10M vectors. After quantization, you index reduced representation vectors without compromising accuracy when retrieving vectors.

Procedure

To enable automatic quantization:

Specify the quantization type you want in your Atlas Vector Search index.

In a new or existing Atlas Vector Search index, specify one of the following quantization types in the fields.quantization field of your index definition:

scalar: to produce byte vectors from float input vectors.
binary: to produce bit vectors from float input vectors.

If you specify automatic quantization on data that is not an array of float values, Atlas Vector Search silently skips that vector instead of indexing it. Because Atlas stores float values (both 32-bit and 64-bit) as the double type internally, embeddings from models that output either precision work with automatic quantization.

Create or update the index.

The index should take about one minute to build. While it builds, the index is in an initial sync state. When it finishes building, you can start querying the data in your collection.

The specified quantization type applies to all indexed vectors and to query vectors at query time.
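As a rough sketch of what such an index definition might look like, the following Java example builds one as plain nested maps, using only the standard library. The field path `embedding`, the 1024 dimensions, and the `dotProduct` similarity are hypothetical values chosen for illustration. If you use the MongoDB Java driver, one option is to wrap the map in an `org.bson.Document` and pass it to a search index model.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: a vector index definition that enables automatic
// quantization. The path, dimensions, and similarity are hypothetical.
class QuantizedIndexDefinitionSketch {

    // Builds one entry of the "fields" array with the quantization setting.
    static Map<String, Object> vectorField(String path, int numDimensions,
                                           String similarity, String quantization) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "vector");
        field.put("path", path);
        field.put("numDimensions", numDimensions);
        field.put("similarity", similarity);
        field.put("quantization", quantization); // "scalar" or "binary"
        return field;
    }

    // Wraps the field entries in the top-level index definition.
    static Map<String, Object> indexDefinition(List<Map<String, Object>> fields) {
        Map<String, Object> definition = new LinkedHashMap<>();
        definition.put("fields", fields);
        return definition;
    }

    public static void main(String[] args) {
        Map<String, Object> definition = indexDefinition(
            List.of(vectorField("embedding", 1024, "dotProduct", "scalar")));
        System.out.println(definition);
    }
}
```

Changing the last argument from `scalar` to `binary` switches the quantization type; as noted above, either change triggers an index rebuild.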

How to Ingest Pre-Quantized Vectors

Atlas Vector Search also supports ingesting and indexing scalar and binary quantized vectors from certain embedding models. If you don't already have quantized vectors, you can convert your embeddings to BSON BinData vectors with the float32, int1, or int8 subtype.

Use Cases

We recommend ingesting quantized BSON binData vectors for the following use cases:

You need to index quantized vector output from embedding models.
You have a large number of float vectors and want to reduce the storage and WiredTiger footprint (such as disk and memory usage) in mongod.

Benefits

BinData is a BSON data type that stores binary data. It compresses your vector embeddings and requires about three times less disk space in your cluster compared to embeddings that use a standard float32 array. To learn more, see Vector Compression.

This subtype also allows you to index your vectors with alternate types such as int1 or int8 vectors, reducing the memory needed to build the Atlas Vector Search index for your collection. It reduces the RAM for mongot by 3.75x for scalar and by 24x for binary; the vector values shrink by 4x and 32x respectively, but the Hierarchical Navigable Small Worlds graph itself doesn't shrink.

If you don't already have binData vectors, you can convert your embeddings to this format by using any supported driver before writing your data to a collection. The following procedure walks you through the steps for converting your embeddings to BinData vectors with float32, int8, and int1 subtypes.

Supported Drivers

BSON BinData vectors with float32, int1, and int8 subtypes are supported by the following drivers:

PyMongo Driver v4.10 or later
Node.js Driver v6.11 or later
Java Driver v5.3.1 or later

Prerequisites

To quantize your BSON binData vectors, you must have the following:

An Atlas cluster running MongoDB version 6.0.11, 7.0.2, or later.
Ensure that your IP address is included in your Atlas project's access list.
Access to an embedding model that supports byte vector output.
The outputs from the following embedding models can be used to generate BSON binData vectors with a supported MongoDB driver.

Embedding Model Provider	Embedding Model
Cohere	embed-english-v3.0
Nomic	nomic-embed-text-v1.5
Jina	jina-embeddings-v2-base-en
Mixedbread	mxbai-embed-large-v1

Scalar quantization preserves recall for these models because they are all trained to be quantization aware. Therefore, recall degradation for scalar quantized embeddings produced by these models is minimal, even at lower dimensions such as 384.

Java Development Kit (JDK) version 8 or later.
An environment to set up and run a Java application. We recommend that you use an integrated development environment (IDE) such as IntelliJ IDEA or Eclipse IDE to configure Maven or Gradle to build and run your project.

A terminal and code editor to run your Node.js project.
npm and Node.js installed.

An environment to run interactive Python notebooks, such as VS Code or Colab.

Procedure

The examples in this procedure use either new data or existing data, and embeddings generated by using Cohere's embed-english-v3.0 model. The example for new data uses sample text strings, which you can replace with your own data. The example for existing data uses a subset of documents without any embeddings from the listingsAndReviews collection in the sample_airbnb database, which you can replace with your own database and collection (with or without any embeddings).

Select the tab based on whether you want to quantize binData vectors for new data or for data you already have in your Atlas cluster.

Create a Java project in your IDE with the dependencies configured for the MongoDB Java Driver, and then perform the following steps in the project. To try the example, replace the placeholders with valid values.

Create your Java project and install dependencies.

In your IDE, create a Java project using Maven or Gradle.

Add the following dependencies, depending on your package manager:

If you are using Maven, add the following dependencies to the dependencies array in your project's pom.xml file:

pom.xml

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.cohere</groupId>
        <artifactId>cohere-java</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>2.0.16</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.16</version>
        <scope>test</scope>
    </dependency>
</dependencies>

If you are using Gradle, add the following to the dependencies array in your project's build.gradle file:

build.gradle

dependencies {
    // MongoDB Java Sync Driver v5.3.1 or later
    implementation 'org.mongodb:mongodb-driver-sync:[5.3.1,)'
    // Java library for working with Cohere models
    implementation 'com.cohere:cohere-java:1.6.0'
    // SLF4J (The Simple Logging Facade for Java)
    testImplementation("org.slf4j:slf4j-simple:2.0.16")
    implementation("org.slf4j:slf4j-api:2.0.16")
}

Run your package manager to install the dependencies to your project.

Set your environment variables.

Note

This example sets the variables for the project in the IDE. Production applications might manage environment variables through a deployment configuration, CI/CD pipeline, or secrets manager, but you can adapt the provided code to fit your use case.

In your IDE, create a new configuration template and add the following variables to your project:

If you are using IntelliJ IDEA, create a new Application run configuration template, then add your variables as semicolon-separated values in the Environment variables field (for example, FOO=123;BAR=456). Apply the changes and click OK.
To learn more, see the Create a run/debug configuration from a template section of the IntelliJ IDEA documentation.
If you are using Eclipse, create a new Java Application launch configuration, then add each variable as a new key-value pair in the Environment tab. Apply the changes and click OK.
To learn more, see the Creating a Java application launch configuration section of the Eclipse IDE documentation.

Environment variables

COHERE_API_KEY=<api-key>
MONGODB_URI=<connection-string>

Update the placeholders with the following values:

Replace the <api-key> placeholder value with your Cohere API key.
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.
Your connection string should use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
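Before running the examples, it can help to confirm that both variables are actually visible to the JVM. The following is a minimal sketch, assuming you want a fail-fast check; the class and method names are hypothetical, and the prefix test only checks for the `mongodb+srv://` scheme shown above, not for a fully valid connection string.

```java
import java.util.Map;

// Illustrative sketch only: fail-fast checks for the two environment variables.
class EnvironmentCheckSketch {

    // Returns the value of the named variable or throws if it is missing or empty.
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException(name + " is not set");
        }
        return value;
    }

    // Checks only the scheme prefix of the connection string.
    static boolean isSrvConnectionString(String uri) {
        return uri.startsWith("mongodb+srv://");
    }

    public static void main(String[] args) {
        try {
            requireEnv(System.getenv(), "COHERE_API_KEY");
            String uri = requireEnv(System.getenv(), "MONGODB_URI");
            System.out.println("Uses SRV format: " + isSrvConnectionString(uri));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```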

Generate embeddings from your data.

You can use an embedding model provider to generate float, int8, and int1 embeddings for your data and then use the MongoDB Java driver to convert your native vector embedding to BSON vectors. The following sample code uses Cohere's embed API to generate full-precision vectors.

Create a new file named GenerateAndConvertEmbeddings.java in your Java project.
```
touch GenerateAndConvertEmbeddings.java
```

Copy and paste the following code in the GenerateAndConvertEmbeddings.java file.

This code does the following:

Generates the float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Java driver.
Creates a file named embeddings.json and saves the data with embeddings in the file for upload to Atlas.

GenerateAndConvertEmbeddings.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbedInputType;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbeddingType;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import org.bson.BinaryVector;
import org.bson.Document;

public class GenerateAndConvertEmbeddings {

    // List of text data to embed
    private static final List<String> DATA = List.of(
        "The Great Wall of China is visible from space.",
        "The Eiffel Tower was completed in Paris in 1889.",
        "Mount Everest is the highest peak on Earth at 8,848m.",
        "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
        "The Mona Lisa was painted by Leonardo da Vinci."
    );

    public static void main(String[] args) {
        // Cohere API key for authentication
        String apiKey = System.getenv("COHERE_API_KEY");

        // Fetch embeddings from the Cohere API
        EmbedByTypeResponseEmbeddings embeddings = fetchEmbeddingsFromCohere(apiKey);
        Document bsonEmbeddings = convertEmbeddingsToBson(embeddings);

        writeEmbeddingsToFile(bsonEmbeddings, "embeddings.json");
    }

    // Fetches embeddings based on input data from the Cohere API
    private static EmbedByTypeResponseEmbeddings fetchEmbeddingsFromCohere(String apiKey) {
        if (Objects.isNull(apiKey) || apiKey.isEmpty()) {
            throw new RuntimeException("API key not found. Please set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(apiKey).clientName("embed-example").build();

        try {
            EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_DOCUMENT)
                .texts(DATA)
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

            EmbedResponse response = cohere.embed(request);
            Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();

            return optionalEmbeddingsWrapper.orElseThrow().getEmbeddings();
        } catch (Exception e) {
            System.err.println("Error fetching embeddings: " + e.getMessage());
            throw e;
        }
    }

    // Converts embeddings to BSON binary vectors using MongoDB Java Driver
    private static Document convertEmbeddingsToBson(EmbedByTypeResponseEmbeddings embeddings) {
        List<List<Double>> floatEmbeddings = embeddings.getFloat().orElseThrow();
        List<List<Integer>> int8Embeddings = embeddings.getInt8().orElseThrow();
        List<List<Integer>> ubinaryEmbeddings = embeddings.getUbinary().orElseThrow();

        List<Document> bsonEmbeddings = new ArrayList<>();
        for (int i = 0; i < floatEmbeddings.size(); i++) {
            Document bsonEmbedding = new Document()
                .append("text", DATA.get(i))
                .append("embeddings_float32", BinaryVector.floatVector(listToFloatArray(floatEmbeddings.get(i))))
                .append("embeddings_int8", BinaryVector.int8Vector(listToByteArray(int8Embeddings.get(i))))
                .append("embeddings_int1", BinaryVector.packedBitVector(listToByteArray(ubinaryEmbeddings.get(i)), (byte) 0));

            bsonEmbeddings.add(bsonEmbedding);
        }

        return new Document("data", bsonEmbeddings);
    }

    // Writes embeddings to JSON file
    private static void writeEmbeddingsToFile(Document bsonEmbeddings, String fileName) {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bsonEmbeddings.toJson().getBytes());
            System.out.println("Embeddings saved to " + fileName);
        } catch (IOException e) {
            System.out.println("Error writing embeddings to file: " + e.getMessage());
        }
    }

    // Convert List of Doubles to an array of floats
    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    // Convert List of Integers to an array of bytes
    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}
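
One detail of the conversion above is worth a closer look: the same byte-conversion helper is applied to both the int8 and the ubinary lists. Java bytes are signed, so, assuming the ubinary values arrive as integers from 0 to 255, any value above 127 wraps to a negative byte. The bit pattern is unchanged, which is what matters for a packed bit vector. The following standalone sketch, with hypothetical names, demonstrates that round trip.

```java
// Illustrative sketch only: narrowing an unsigned value (0-255) to a Java
// byte keeps the same 8 bits even though the signed value looks different.
class UnsignedByteSketch {

    // Narrows each value to a byte, keeping the low 8 bits.
    static byte[] toBytes(int[] unsignedValues) {
        byte[] out = new byte[unsignedValues.length];
        for (int i = 0; i < unsignedValues.length; i++) {
            out[i] = (byte) unsignedValues[i];
        }
        return out;
    }

    // Recovers the unsigned value (0-255) from a signed byte.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new int[] {200, 5});
        System.out.println(bytes[0]);             // -56
        System.out.println(toUnsigned(bytes[0])); // 200
        System.out.println(bytes[1]);             // 5
    }
}
```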

Replace the COHERE_API_KEY placeholder value in the code if you didn't set the environment variable, and then save the file.
Compile and run the file using your application run configuration.
If you are using a terminal, run the following commands to compile and execute your program.

javac GenerateAndConvertEmbeddings.java
java GenerateAndConvertEmbeddings

Embeddings saved to embeddings.json

Verify the embeddings in the embeddings.json file.

To learn more about generating embeddings and converting the embeddings to binData vectors, see How to Create Vector Embeddings.

Ingest the data and create an Atlas Vector Search index.

You must upload your data and embeddings to a collection in your Atlas cluster and create an Atlas Vector Search index on the data to run $vectorSearch queries against the data.

Create a new file named UploadDataAndCreateIndex.java in your Java project.
```
touch UploadDataAndCreateIndex.java
```

Copy and paste the following code in the UploadDataAndCreateIndex.java file.

This code does the following:

Uploads the data in the embeddings.json file to your Atlas cluster.
Creates an Atlas Vector Search index on the embeddings_float32, embeddings_int8, and embeddings_int1 fields.

UploadDataAndCreateIndex.java

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.StreamSupport;

public class UploadDataAndCreateIndex {

    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String INDEX_NAME = "<INDEX-NAME>";

    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            storeEmbeddings(mongoClient);
            setupVectorSearchIndex(mongoClient);
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    public static void storeEmbeddings(MongoClient client) throws IOException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

        String fileContent = Files.readString(Path.of("embeddings.json"));
        List<Document> documents = parseDocuments(fileContent);

        collection.insertMany(documents);
        System.out.println("Inserted documents into MongoDB");
    }

    private static List<Document> parseDocuments(String jsonContent) throws IOException {
        Document rootDoc = Document.parse(jsonContent);
        return rootDoc.getList("data", Document.class);
    }

    public static void setupVectorSearchIndex(MongoClient client) throws InterruptedException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

        Bson definition = new Document(
            "fields",
            List.of(
                new Document("type", "vector")
                    .append("path", "embeddings_float32")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int8")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int1")
                    .append("numDimensions", 1024)
                    .append("similarity", "euclidean")
            )
        );

        SearchIndexModel indexModel = new SearchIndexModel(
            INDEX_NAME,
            definition,
            SearchIndexType.vectorSearch()
        );

        List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
        System.out.println("Successfully created vector index named: " + result.get(0));
        System.out.println("It may take up to a minute for the index to leave the BUILDING status and become queryable.");

        System.out.println("Polling to confirm the index has changed from the BUILDING status.");
        waitForIndex(collection, INDEX_NAME);
    }

    public static <T> boolean waitForIndex(final MongoCollection<T> collection, final String indexName) {
        long startTime = System.nanoTime();
        long timeoutNanos = TimeUnit.SECONDS.toNanos(60);
        while (System.nanoTime() - startTime < timeoutNanos) {
            Document indexRecord = StreamSupport.stream(collection.listSearchIndexes().spliterator(), false)
                    .filter(index -> indexName.equals(index.getString("name")))
                    .findAny().orElse(null);
            if (indexRecord != null) {
                if ("FAILED".equals(indexRecord.getString("status"))) {
                    throw new RuntimeException("Search index has FAILED status.");
                }
                if (indexRecord.getBoolean("queryable")) {
                    System.out.println(indexName + " index is ready to query");
                    return true;
                }
            }
            try {
                Thread.sleep(100); // busy-wait, avoid in production
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return false;
    }
}
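
The polling loop in the example checks the index every 100 milliseconds and notes that this is best avoided in production. One possible alternative, sketched below with hypothetical names and only the standard library, is a small reusable helper that takes the readiness check, the timeout, and a longer polling interval as parameters. It is a sketch under those assumptions, not part of the MongoDB driver.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch only: poll a condition at a configurable interval
// until it becomes true or the timeout elapses.
class PollingSketch {

    // Returns true as soon as the check passes, or false on timeout or interrupt.
    static boolean pollUntil(BooleanSupplier ready, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (ready.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The stand-in check succeeds on the third call.
        boolean ready = pollUntil(() -> ++calls[0] >= 3, 5_000, 10);
        System.out.println("ready=" + ready + " after " + calls[0] + " checks"); // ready=true after 3 checks
    }
}
```

In the index example, the readiness check would be a lambda that looks up the index record and tests its queryable flag.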

Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string if you didn't set the environment variable.
`<DATABASE-NAME>`	Name of the database in your Atlas cluster.
`<COLLECTION-NAME>`	Name of the collection where you want to upload the data.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.

Compile and run the file using your application run configuration.

If you are using a terminal, run the following commands to compile and execute your program.

javac UploadDataAndCreateIndex.java
java UploadDataAndCreateIndex

Inserted documents into MongoDB
Successfully created vector index named: <INDEX-NAME>
It may take up to a minute for the index to leave the BUILDING status and become queryable.
Polling to confirm the index has changed from the BUILDING status.
<INDEX-NAME> index is ready to query

Log in to your Atlas cluster and verify the following:
- Data in the namespace.
- Atlas Vector Search index for the collection.

Create and run a query against the collection.

To test your embeddings, you can run a query against your collection. Use an embedding model provider to generate float, int8, and int1 embeddings for your query text. The following sample code uses Cohere's embed API to generate full-precision vectors. After generating the embeddings, use the MongoDB Java driver to convert your native vector embedding to BSON vectors and run the $vectorSearch query against the collection.

Create a new file named CreateEmbeddingsAndRunQuery.java in your Java project.
```
touch CreateEmbeddingsAndRunQuery.java
```

Copy and paste the following code in the CreateEmbeddingsAndRunQuery.java file.

This code does the following:

Generates the float32, int8, and ubinary vector embeddings by using Cohere's embed API.
Converts the embeddings to BSON binData vectors by using the MongoDB Java driver.
Runs the query against your collection.

CreateEmbeddingsAndRunQuery.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbeddingType;
import com.cohere.api.types.EmbedInputType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.conversions.Bson;
import org.bson.BinaryVector;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Aggregates.vectorSearch;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.exclude;
import static com.mongodb.client.model.Projections.metaVectorSearchScore;
import static com.mongodb.client.model.search.SearchPath.fieldPath;
import static com.mongodb.client.model.search.VectorSearchOptions.approximateVectorSearchOptions;
import static java.util.Arrays.asList;

public class CreateEmbeddingsAndRunQuery {
    private static final String COHERE_API_KEY = System.getenv("COHERE_API_KEY");
    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String VECTOR_INDEX_NAME = "<INDEX-NAME>";
    private static final String DATA_FIELD_NAME = "<DATA-FIELD>";

    public static void main(String[] args) {
        String queryText = "<QUERY-TEXT>";

        try {
            // Instantiate this class so the instance methods below can be called
            CreateEmbeddingsAndRunQuery processor = new CreateEmbeddingsAndRunQuery();
            Map<String, BinaryVector> embeddingsData = processor.generateAndConvertEmbeddings(queryText);
            processor.runVectorSearchQuery(embeddingsData);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Generate embeddings using Cohere's embed API from the query text
    public Map<String, BinaryVector> generateAndConvertEmbeddings(String text) throws Exception {
        if (COHERE_API_KEY == null || COHERE_API_KEY.isEmpty()) {
            throw new RuntimeException("API key not found. Set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(COHERE_API_KEY).build();

        EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_QUERY)
                .texts(List.of(text))
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

        EmbedResponse response = cohere.embed(request);
        Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();
        if (optionalEmbeddingsWrapper.isEmpty()) {
            throw new RuntimeException("No embeddings found in the API response.");
        }

        EmbedByTypeResponseEmbeddings embeddings = optionalEmbeddingsWrapper.get().getEmbeddings();
        return createBinaryVectorEmbeddings(embeddings);
    }

    // Convert embeddings to BSON binary vectors using MongoDB Java Driver
    private static Map<String, BinaryVector> createBinaryVectorEmbeddings(EmbedByTypeResponseEmbeddings embeddings) {
        Map<String, BinaryVector> binaryVectorEmbeddings = new HashMap<>();

        // Convert float embeddings
        List<Double> floatList = embeddings.getFloat().orElseThrow().get(0);
        if (floatList != null) {
            float[] floatData = listToFloatArray(floatList);
            BinaryVector floatVector = BinaryVector.floatVector(floatData);
            binaryVectorEmbeddings.put("float32", floatVector);
        }
88 
89         // Convert int8 embeddings
90         List<Integer> int8List = embeddings.getInt8().orElseThrow().get(0);
91         if (int8List != null) {
92             byte[] int8Data = listToByteArray(int8List);
93             BinaryVector int8Vector = BinaryVector.int8Vector(int8Data);
94             binaryVectorEmbeddings.put("int8", int8Vector);
95         }
96 
97         // Convert ubinary embeddings
98         List<Integer> ubinaryList = embeddings.getUbinary().orElseThrow().get(0);
99         if (ubinaryList != null) {
100             byte[] int1Data = listToByteArray(ubinaryList);
101             BinaryVector packedBitsVector = BinaryVector.packedBitVector(int1Data, (byte) 0);
102             binaryVectorEmbeddings.put("int1", packedBitsVector);
103         }
104 
105         return binaryVectorEmbeddings;
106     }
107 
108     // Define and run $vectorSearch query using the embeddings
109     public void runVectorSearchQuery(Map<String, BinaryVector> embeddingsData) {
110         if (MONGODB_URI == null || MONGODB_URI.isEmpty()) {
111             throw new RuntimeException("MongoDB URI not found. Set MONGODB_URI in your environment.");
112         }
113 
114         try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
115             MongoDatabase database = mongoClient.getDatabase(DB_NAME);
116             MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
117 
118             for (String path : embeddingsData.keySet()) {
119                 BinaryVector queryVector = embeddingsData.get(path);
120 
121                 List<Bson> pipeline = asList(
122                         vectorSearch(
123                                 fieldPath("embeddings_" + path),
124                                 queryVector,
125                                 VECTOR_INDEX_NAME,
126                                 2,
127                                 approximateVectorSearchOptions(5)
128                         ),
129                         project(
130                                 fields(
131                                         exclude("_id"),
132                                         include(DATA_FIELD_NAME),
133                                         metaVectorSearchScore("vectorSearchScore")
134                                 )
135                         )
136                 );
137 
138                 List<Document> results = collection.aggregate(pipeline).into(new ArrayList<>());
139 
140                 System.out.println("Results from " + path + " embeddings:");
141                 for (Document result : results) {
142                     System.out.println(result.toJson());
143                 }
144             }
145         }
146     }
147 
148     private static float[] listToFloatArray(List<Double> list) {
149         float[] array = new float[list.size()];
150         for (int i = 0; i < list.size(); i++) {
151             array[i] = list.get(i).floatValue();
152         }
153         return array;
154     }
155 
156     private static byte[] listToByteArray(List<Integer> list) {
157         byte[] array = new byte[list.size()];
158         for (int i = 0; i < list.size(); i++) {
159             array[i] = list.get(i).byteValue();
160         }
161         return array;
162     }
163 }

Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you haven't set the environment variable.
`COHERE_API_KEY`	Your Cohere API key, if you haven't set the environment variable.
`<DATABASE-NAME>`	Name of the database in your Atlas cluster.
`<COLLECTION-NAME>`	Name of the collection where you ingested the data.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.
`<DATA-FIELD-NAME>`	Name of the field that contains the text from which you generated the embeddings. For this example, use `text`.
`<QUERY-TEXT>`	Text for the query. For this example, use `science fact`.

Compile and run the file using your application run configuration.

If you are using a terminal, run the following commands to compile and run your program.

javac CreateEmbeddingsAndRunQuery.java
java CreateEmbeddingsAndRunQuery

Results from int1 embeddings:
{"text": "Mount Everest is the highest peak on Earth at 8,848m.", "score": 0.642578125}
{"text": "The Great Wall of China is visible from space.", "score": 0.61328125}
Results from int8 embeddings:
{"text": "Mount Everest is the highest peak on Earth at 8,848m.", "score": 0.5149773359298706}
{"text": "The Great Wall of China is visible from space.", "score": 0.5146723985671997}
Results from float32 embeddings:
{"text": "Mount Everest is the highest peak on Earth at 8,848m.", "score": 0.6583383083343506}
{"text": "The Great Wall of China is visible from space.", "score": 0.6536108255386353}

To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Create your Java project and install dependencies.

In your IDE, create a Java project using Maven or Gradle.

Add the following dependencies, depending on your package manager:

If you are using Maven, add the following dependencies to the dependencies element in your project's pom.xml file:

pom.xml

<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.13.2</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongodb-driver-sync</artifactId>
        <version>5.3.1</version>
    </dependency>
    <dependency>
        <groupId>com.cohere</groupId>
        <artifactId>cohere-java</artifactId>
        <version>1.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-api</artifactId>
        <version>2.0.16</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>2.0.16</version>
        <scope>test</scope>
    </dependency>
</dependencies>

If you are using Gradle, add the following to the dependencies block in your project's build.gradle file:

build.gradle

dependencies {
    // MongoDB Java Sync Driver v5.3.1 or later
    implementation 'org.mongodb:mongodb-driver-sync:[5.3.1,)'
    // Java library for working with Cohere models
    implementation 'com.cohere:cohere-java:1.6.0'
    // SLF4J (The Simple Logging Facade for Java)
    testImplementation("org.slf4j:slf4j-simple:2.0.16")
    implementation("org.slf4j:slf4j-api:2.0.16")
}

Run your package manager to install the dependencies in your project.

Set your environment variables.

Note

In your IDE, create a new configuration template and add the following variables to your project:

If you are using IntelliJ IDEA, create a new Application run configuration template, then add your variables as semicolon-separated values in the Environment variables field (for example, FOO=123;BAR=456). Apply the changes and click OK.
To learn more, see the Create a run/debug configuration from a template section of the IntelliJ IDEA documentation.
If you are using Eclipse, create a new Java Application launch configuration, then add each variable as a new key-value pair in the Environment tab. Apply the changes and click OK.
To learn more, see the Creating a Java application launch configuration section of the Eclipse IDE documentation.

Environment variables

COHERE_API_KEY=<api-key>
MONGODB_URI=<connection-string>

Update the placeholders with the following values:

Replace the <api-key> placeholder value with your Cohere API key.
Replace the <connection-string> placeholder value with the SRV connection string for your Atlas cluster.
Your connection string must use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
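The programs in this procedure read COHERE_API_KEY and MONGODB_URI from the environment and stop if either one is missing. As a minimal sketch of that check gathered in one place, the following helper fails fast on a missing variable and performs a shallow prefix check for the SRV format shown above. The EnvCheckSketch class and its requireEnv and looksLikeSrvUri methods are hypothetical names used for illustration; they are not part of the MongoDB Java driver or the Cohere SDK, and the check does not validate credentials or host names.

```java
import java.util.Map;

// Hypothetical helper for illustration only.
final class EnvCheckSketch {

    // Returns the value of a required variable, or throws if it is missing or blank.
    static String requireEnv(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException(name + " is not set in the environment.");
        }
        return value;
    }

    // Shallow check: only confirms that the URI uses the SRV scheme.
    static boolean looksLikeSrvUri(String uri) {
        return uri.startsWith("mongodb+srv://");
    }

    public static void main(String[] args) {
        // A stand-in map keeps the sketch runnable; real code would pass System.getenv().
        Map<String, String> env = Map.of(
                "COHERE_API_KEY", "<api-key>",
                "MONGODB_URI", "mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net");
        String uri = requireEnv(env, "MONGODB_URI");
        System.out.println("SRV format: " + looksLikeSrvUri(uri));
    }
}
```

Passing the environment as a map is a design choice made here so the check can be exercised without changing the real process environment.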

(Conditional) Generate embeddings from your data.

You can use an embedding model provider to generate float, int8, and int1 embeddings for your data, and then use the MongoDB Java driver to convert your native vector embeddings to BSON vectors. The following sample code uses Cohere's embed API to generate float32, int8, and ubinary embeddings from the data in the sample_airbnb.listingsAndReviews namespace.

Create a new file named GenerateAndConvertEmbeddings.java in your Java project.
```
touch GenerateAndConvertEmbeddings.java
```

Copy and paste the following code into the GenerateAndConvertEmbeddings.java file.

This code does the following:

Gets the summary field from 50 documents in the sample_airbnb.listingsAndReviews namespace.
Generates float32, int8, and ubinary vector embeddings using Cohere's embed API.
Converts the embeddings to BSON binData vectors using the MongoDB Java driver.
Creates a file named embeddings.json and saves the data with the embeddings in the file.

GenerateAndConvertEmbeddings.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbeddingType;
import com.cohere.api.types.EmbedInputType;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.FindIterable;
import org.bson.BsonArray;
import org.bson.Document;
import org.bson.BinaryVector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

public class GenerateAndConvertEmbeddings {
    private static final Logger logger = LoggerFactory.getLogger(GenerateAndConvertEmbeddings.class);
    private static final String COHERE_API_KEY = System.getenv("COHERE_API_KEY");
    private static final String MONGODB_URI = System.getenv("MONGODB_URI");

    public static void main(String[] args) {
        try {
            List<String> summaries = fetchSummariesFromMongoDB();
            if (summaries.isEmpty()) {
                throw new RuntimeException("No summaries retrieved from MongoDB.");
            }
            EmbedByTypeResponseEmbeddings embeddingsData = fetchEmbeddingsFromCohere(COHERE_API_KEY, summaries);
            if (embeddingsData == null) {
                throw new RuntimeException("Failed to fetch embeddings.");
            }
            convertAndSaveEmbeddings(summaries, embeddingsData);
        } catch (Exception e) {
            logger.error("Unexpected error: {}", e.getMessage(), e);
        }
    }

    private static List<String> fetchSummariesFromMongoDB() {
        List<String> summaries = new ArrayList<>();
        if (MONGODB_URI == null || MONGODB_URI.isEmpty()) {
            throw new RuntimeException("MongoDB URI is not set.");
        }
        logger.info("Connecting to MongoDB at URI: {}", MONGODB_URI);
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            String dbName = "sample_airbnb";
            String collName = "listingsAndReviews";
            MongoDatabase database = mongoClient.getDatabase(dbName);
            MongoCollection<Document> collection = database.getCollection(collName);
            Document filter = new Document("summary", new Document("$nin", Arrays.asList(null, "")));
            FindIterable<Document> documentsCursor = collection.find(filter).limit(50);
            for (Document doc : documentsCursor) {
                String summary = doc.getString("summary");
                if (summary != null && !summary.isEmpty()) {
                    summaries.add(summary);
                }
            }
            logger.info("Retrieved {} summaries from MongoDB.", summaries.size());
        } catch (Exception e) {
            logger.error("Error fetching from MongoDB: {}", e.getMessage(), e);
            throw new RuntimeException("Failed to fetch data from MongoDB", e);
        }
        return summaries;
    }

    private static EmbedByTypeResponseEmbeddings fetchEmbeddingsFromCohere(String apiKey, List<String> data) {
        if (Objects.isNull(apiKey) || apiKey.isEmpty()) {
            throw new RuntimeException("API key is not set.");
        }
        Cohere cohere = Cohere.builder().token(apiKey).clientName("embed-example").build();
        try {
            EmbedRequest request = EmbedRequest.builder()
                    .model("embed-english-v3.0")
                    .inputType(EmbedInputType.SEARCH_DOCUMENT)
                    .texts(data)
                    .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                    .build();
            EmbedResponse response = cohere.embed(request);
            Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();
            if (optionalEmbeddingsWrapper.isPresent()) {
                return optionalEmbeddingsWrapper.get().getEmbeddings();
            } else {
                logger.warn("No embeddings were returned.");
            }
        } catch (Exception e) {
            logger.error("Error fetching embeddings: {}", e.getMessage(), e);
        }
        return null;
    }

    private static void convertAndSaveEmbeddings(List<String> summaries, EmbedByTypeResponseEmbeddings embeddings) {
        try {
            Document doc = new Document();
            BsonArray array = new BsonArray();
            for (int i = 0; i < summaries.size(); i++) {
                String summary = summaries.get(i);

                // Retrieve the embeddings for the current index
                List<Double> floatList = embeddings.getFloat().orElseThrow().get(i);
                List<Integer> int8List = embeddings.getInt8().orElseThrow().get(i);
                List<Integer> ubinaryList = embeddings.getUbinary().orElseThrow().get(i);

                // Convert lists to arrays
                float[] floatData = listToFloatArray(floatList);
                byte[] int8Data = listToByteArray(int8List);
                byte[] int1Data = listToByteArray(ubinaryList);

                // Create BinaryVector objects
                BinaryVector floatVector = BinaryVector.floatVector(floatData);
                BinaryVector int8Vector = BinaryVector.int8Vector(int8Data);
                BinaryVector packedBitsVector = BinaryVector.packedBitVector(int1Data, (byte) 0);

                Document document = new Document()
                        .append("text", summary)
                        .append("embeddings_float32", floatVector)
                        .append("embeddings_int8", int8Vector)
                        .append("embeddings_int1", packedBitsVector);
                array.add(document.toBsonDocument());
            }
            doc.append("data", array);
            try (FileOutputStream fos = new FileOutputStream("embeddings.json")) {
                fos.write(doc.toJson().getBytes());
            }
            logger.info("Embeddings with BSON vectors have been saved to embeddings.json");
        } catch (IOException e) {
            logger.error("Error writing embeddings to file: {}", e.getMessage(), e);
        }
    }

    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}

If you didn't set the environment variables, replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you haven't set the environment variable.
`COHERE_API_KEY`	Your Cohere API key, if you haven't set the environment variable.

Compile and run the file using your application run configuration.

If you are using a terminal, run the following commands to compile and run your program.

javac GenerateAndConvertEmbeddings.java
java GenerateAndConvertEmbeddings

[main] INFO GenerateAndConvertEmbeddings - Connecting to MongoDB at URI: <CONNECTION-STRING>
...
[main] INFO GenerateAndConvertEmbeddings - Retrieved 50 summaries from MongoDB.
[main] INFO GenerateAndConvertEmbeddings - Embeddings with BSON vectors have been saved to embeddings.json

Verify the embeddings in the embeddings.json file.

To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.
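The int1 conversion in the code above passes one byte per 8 dimensions to BinaryVector.packedBitVector, so a 1024-dimension embedding occupies 128 bytes and needs a padding of 0. The following sketch, under stated assumptions, shows what that packed representation contains and why listToByteArray can narrow each value with byteValue(). It assumes the ubinary values arrive as unsigned integers from 0 to 255 and that bits are packed most significant bit first. PackedBitsSketch, packBits, and toSignedBytes are hypothetical names for illustration, not part of the MongoDB Java driver or the Cohere SDK.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of the MongoDB driver or the Cohere SDK.
final class PackedBitsSketch {

    // Binary-quantize a float vector around a midpoint of 0 and pack
    // 8 dimensions per byte, most significant bit first (assumed ordering).
    static byte[] packBits(float[] vector) {
        byte[] packed = new byte[(vector.length + 7) / 8];
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] > 0f) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }

    // Mirrors listToByteArray: narrowing keeps the low 8 bits,
    // so unsigned values 128..255 become negative Java bytes with the same bit pattern.
    static byte[] toSignedBytes(List<Integer> unsignedValues) {
        byte[] bytes = new byte[unsignedValues.size()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = unsignedValues.get(i).byteValue();
        }
        return bytes;
    }

    public static void main(String[] args) {
        float[] vector = {0.5f, -0.1f, 0.2f, 0f, -0.7f, -0.3f, 0f, 0.9f};
        byte[] packed = packBits(vector);
        // Bits 1,0,1,0,0,0,0,1 form the unsigned value 161, which Java stores as -95.
        System.out.println(packed[0] + " (unsigned " + (packed[0] & 0xFF) + ")");
        System.out.println(toSignedBytes(List.of(161))[0]);
    }
}
```

Because Java bytes are signed, a value such as 161 prints as -95, but the stored bits are unchanged, which is what matters for a packed bit vector.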

Ingest the data and create an Atlas Vector Search index.

Create a new file named UploadDataAndCreateIndex.java in your Java project.
```
touch UploadDataAndCreateIndex.java
```

Copy and paste the following code into the UploadDataAndCreateIndex.java file.

This code does the following:

Uploads the float32, int8, and int1 embeddings in the embeddings.json file to your Atlas cluster.
Creates an Atlas Vector Search index on the embeddings_float32, embeddings_int8, and embeddings_int1 fields.

UploadDataAndCreateIndex.java

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import com.mongodb.client.model.UpdateOptions;

import org.bson.Document;
import org.bson.conversions.Bson;
import org.bson.BinaryVector;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.StreamSupport;

public class UploadDataAndCreateIndex {

    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "sample_airbnb";
    private static final String COLLECTION_NAME = "listingsAndReviews";
    private static final String INDEX_NAME = "<INDEX-NAME>";

    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            uploadEmbeddingsData(mongoClient);
            setupVectorSearchIndex(mongoClient);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void uploadEmbeddingsData(MongoClient mongoClient) throws IOException {
        MongoDatabase database = mongoClient.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
        String filePath = "embeddings.json";
        String fileContent = Files.readString(Path.of(filePath));

        Document rootDoc = Document.parse(fileContent);
        List<Document> embeddingsDocs = rootDoc.getList("data", Document.class);

        for (Document doc : embeddingsDocs) {
            // Retrieve the string value from the document
            String summary = doc.getString("text");

            // Get the BinaryVector objects from the document
            BinaryVector embeddingsFloat32 = doc.get("embeddings_float32", BinaryVector.class);
            BinaryVector embeddingsInt8 = doc.get("embeddings_int8", BinaryVector.class);
            BinaryVector embeddingsInt1 = doc.get("embeddings_int1", BinaryVector.class);

            // Create filter and update documents
            Document filter = new Document("summary", summary);
            Document update = new Document("$set", new Document("summary", summary)
                    .append("embeddings_float32", embeddingsFloat32)
                    .append("embeddings_int8", embeddingsInt8)
                    .append("embeddings_int1", embeddingsInt1));

            // Perform update operation with upsert option
            collection.updateOne(filter, update, new UpdateOptions().upsert(true));
            System.out.println("Processed document with summary: " + summary);
        }
    }

    public static void setupVectorSearchIndex(MongoClient client) throws InterruptedException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);
        // Define the index details
        Bson definition = new Document(
            "fields",
            List.of(
                new Document("type", "vector")
                    .append("path", "embeddings_float32")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int8")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int1")
                    .append("numDimensions", 1024)
                    .append("similarity", "euclidean")
            )
        );
        // Define the index model
        SearchIndexModel indexModel = new SearchIndexModel(
            INDEX_NAME,
            definition,
            SearchIndexType.vectorSearch()
        );
        // Create the index using the defined model
        List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
        System.out.println("Successfully created vector index named: " + result.get(0));
        System.out.println("It may take up to a minute for the index to leave the BUILDING status and become queryable.");
        // Wait for Atlas to build the index
        System.out.println("Polling to confirm the index has changed from the BUILDING status.");
        waitForIndex(collection, INDEX_NAME);
    }

    public static <T> boolean waitForIndex(final MongoCollection<T> collection, final String indexName) {
        long startTime = System.nanoTime();
        long timeoutNanos = TimeUnit.SECONDS.toNanos(60);
        while (System.nanoTime() - startTime < timeoutNanos) {
            Document indexRecord = StreamSupport.stream(collection.listSearchIndexes().spliterator(), false)
                    .filter(index -> indexName.equals(index.getString("name")))
                    .findAny().orElse(null);
            if (indexRecord != null) {
                if ("FAILED".equals(indexRecord.getString("status"))) {
                    throw new RuntimeException("Search index has FAILED status.");
                }
                if (indexRecord.getBoolean("queryable")) {
                    System.out.println(indexName + " index is ready to query");
                    return true;
                }
            }
            try {
                Thread.sleep(100); // busy-wait, avoid in production
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return false;
    }
}

Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you haven't set the environment variable.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.

Compile and run the file using your application run configuration.

If you are using a terminal, run the following commands to compile and run your program.

javac UploadDataAndCreateIndex.java
java UploadDataAndCreateIndex

Successfully created vector index named: <INDEX_NAME>
It may take up to a minute for the index to leave the BUILDING status and become queryable.
Polling to confirm the index has changed from the BUILDING status.
<INDEX_NAME> index is ready to query

Log in to your Atlas cluster and verify the following:
- Data in the namespace.
- Atlas Vector Search index for the collection.
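The waitForIndex method in the code above polls every 100 milliseconds, and its own comment advises against that in production. As a hedged sketch of one alternative, the following generic helper polls a caller-supplied condition at a configurable interval and never sleeps past the deadline. PollingSketch and pollUntil are hypothetical names for illustration; the condition here is a stand-in counter, not a call to Atlas.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration; not part of the MongoDB Java driver.
final class PollingSketch {

    // Calls the condition until it returns true or the timeout elapses,
    // sleeping for the given interval between checks.
    static boolean pollUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000;
            if (remainingMillis <= 0) {
                return false;
            }
            // Never sleep past the deadline.
            Thread.sleep(Math.min(interval.toMillis(), remainingMillis));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Stand-in condition that becomes true on the third check.
        boolean ready = pollUntil(() -> ++calls[0] >= 3, Duration.ofSeconds(5), Duration.ofMillis(20));
        System.out.println("ready=" + ready + " after " + calls[0] + " checks");
    }
}
```

In the index example, the condition could wrap the same listSearchIndexes lookup that waitForIndex performs, with an interval of a few seconds instead of 100 milliseconds. The caller should also handle a false return value, which waitForIndex currently ignores.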

Create and run a query against the collection.

Create a new file named CreateEmbeddingsAndRunQuery.java in your Java project.
```
touch CreateEmbeddingsAndRunQuery.java
```

Copy and paste the following code into the CreateEmbeddingsAndRunQuery.java file.

This code does the following:

Generates float32, int8, and ubinary vector embeddings using Cohere's embed API.
Converts the embeddings to BSON binData vectors using the MongoDB Java driver.
Runs the query against your collection and returns the results.

CreateEmbeddingsAndRunQuery.java

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbeddingType;
import com.cohere.api.types.EmbedInputType;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.bson.conversions.Bson;
import org.bson.BinaryVector;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

import static com.mongodb.client.model.Aggregates.project;
import static com.mongodb.client.model.Aggregates.vectorSearch;
import static com.mongodb.client.model.Projections.fields;
import static com.mongodb.client.model.Projections.include;
import static com.mongodb.client.model.Projections.exclude;
import static com.mongodb.client.model.Projections.metaVectorSearchScore;
import static com.mongodb.client.model.search.SearchPath.fieldPath;
import static com.mongodb.client.model.search.VectorSearchOptions.approximateVectorSearchOptions;
import static java.util.Arrays.asList;

public class CreateEmbeddingsAndRunQuery {
    private static final String COHERE_API_KEY = System.getenv("COHERE_API_KEY");
    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String VECTOR_INDEX_NAME = "<INDEX-NAME>";
    private static final String DATA_FIELD_NAME = "<DATA-FIELD-NAME>";

    public static void main(String[] args) {
        String queryText = "<QUERY-TEXT>";

        try {
            CreateEmbeddingsAndRunQuery processor = new CreateEmbeddingsAndRunQuery();
            Map<String, BinaryVector> embeddingsData = processor.generateAndConvertEmbeddings(queryText);
            processor.runVectorSearchQuery(embeddingsData);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Generate embeddings using Cohere's embed API from the query text
    public Map<String, BinaryVector> generateAndConvertEmbeddings(String text) throws Exception {
        if (COHERE_API_KEY == null || COHERE_API_KEY.isEmpty()) {
            throw new RuntimeException("API key not found. Set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(COHERE_API_KEY).build();

        EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_QUERY)
                .texts(List.of(text))
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

        EmbedResponse response = cohere.embed(request);
        Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();
        if (optionalEmbeddingsWrapper.isEmpty()) {
            throw new RuntimeException("No embeddings found in the API response.");
        }

        EmbedByTypeResponseEmbeddings embeddings = optionalEmbeddingsWrapper.get().getEmbeddings();
        return createBinaryVectorEmbeddings(embeddings);
    }

    // Convert embeddings to BSON binary vectors using MongoDB Java Driver
    private static Map<String, BinaryVector> createBinaryVectorEmbeddings(EmbedByTypeResponseEmbeddings embeddings) {
        Map<String, BinaryVector> binaryVectorEmbeddings = new HashMap<>();

        // Convert float embeddings
        List<Double> floatList = embeddings.getFloat().orElseThrow().get(0);
        if (floatList != null) {
            float[] floatData = listToFloatArray(floatList);
            BinaryVector floatVector = BinaryVector.floatVector(floatData);
            binaryVectorEmbeddings.put("float32", floatVector);
        }

        // Convert int8 embeddings
        List<Integer> int8List = embeddings.getInt8().orElseThrow().get(0);
        if (int8List != null) {
            byte[] int8Data = listToByteArray(int8List);
            BinaryVector int8Vector = BinaryVector.int8Vector(int8Data);
            binaryVectorEmbeddings.put("int8", int8Vector);
        }

        // Convert ubinary embeddings
        List<Integer> ubinaryList = embeddings.getUbinary().orElseThrow().get(0);
        if (ubinaryList != null) {
            byte[] int1Data = listToByteArray(ubinaryList);
            BinaryVector packedBitsVector = BinaryVector.packedBitVector(int1Data, (byte) 0);
            binaryVectorEmbeddings.put("int1", packedBitsVector);
        }

        return binaryVectorEmbeddings;
    }

    // Define and run $vectorSearch query using the embeddings
    public void runVectorSearchQuery(Map<String, BinaryVector> embeddingsData) {
        if (MONGODB_URI == null || MONGODB_URI.isEmpty()) {
            throw new RuntimeException("MongoDB URI not found. Set MONGODB_URI in your environment.");
        }

        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            MongoDatabase database = mongoClient.getDatabase(DB_NAME);
            MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

            for (String path : embeddingsData.keySet()) {
                BinaryVector queryVector = embeddingsData.get(path);

                List<Bson> pipeline = asList(
                        vectorSearch(
                                fieldPath("embeddings_" + path),
                                queryVector,
                                VECTOR_INDEX_NAME,
126                                 2,
127                                 approximateVectorSearchOptions(5)
128                         ),
129                         project(
130                                 fields(
131                                         exclude("_id"),
132                                         include(DATA_FIELD_NAME),
133                                         metaVectorSearchScore("vectorSearchScore")
134                                 )
135                         )
136                 );
137 
138                 List<Document> results = collection.aggregate(pipeline).into(new ArrayList<>());
139 
140                 System.out.println("Results from " + path + " embeddings:");
141                 for (Document result : results) {
142                     System.out.println(result.toJson());
143                 }
144             }
145         }
146     }
147 
148     private static float[] listToFloatArray(List<Double> list) {
149         float[] array = new float[list.size()];
150         for (int i = 0; i < list.size(); i++) {
151             array[i] = list.get(i).floatValue();
152         }
153         return array;
154     }
155 
156     private static byte[] listToByteArray(List<Integer> list) {
157         byte[] array = new byte[list.size()];
158         for (int i = 0; i < list.size(); i++) {
159             array[i] = list.get(i).byteValue();
160         }
161         return array;
162     }
163 }

Replace the following placeholder values in the code and save the file.

`MONGODB_URI`	Your Atlas cluster connection string, if you haven't set the environment variable.
`COHERE_API_KEY`	Your Cohere API key, if you haven't set the environment variable.
`<DATABASE-NAME>`	Name of the database in your Atlas cluster. For this example, use `sample_airbnb`.
`<COLLECTION-NAME>`	Name of the collection where you ingested the data. For this example, use `listingsAndReviews`.
`<INDEX-NAME>`	Name of the Atlas Vector Search index for the collection.
`<DATA-FIELD-NAME>`	Name of the field that contains the text from which you generated the embeddings. For this example, use `summary`.
`<QUERY-TEXT>`	Text for the query. For this example, use `ocean view`.

Compile and run the file using your application run configuration.

If you are using a terminal, run the following commands to compile and execute your program.

javac CreateEmbeddingsAndRunQuery.java
java CreateEmbeddingsAndRunQuery

Results from int1 embeddings:
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "vectorSearchScore": 0.6591796875}
{"summary": "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.", "vectorSearchScore": 0.6337890625}
Results from int8 embeddings:
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "vectorSearchScore": 0.5215557217597961}
{"summary": "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.", "vectorSearchScore": 0.5179016590118408}
Results from float32 embeddings:
{"summary": "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.", "vectorSearchScore": 0.7278661131858826}
{"summary": "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.", "vectorSearchScore": 0.688639760017395}

To learn more about generating embeddings and converting them to binData vectors, see How to Create Vector Embeddings.

Install the required libraries.

Run the following command to install the MongoDB Node.js Driver. This operation might take a few minutes to complete.

npm install mongodb

You must install Node.js driver v6.11 or later. If necessary, you can also install libraries from your embedding model provider. For example, to generate float32, int8, and int1 embeddings by using Cohere as demonstrated on this page, install Cohere:

npm install cohere-ai dotenv
npm show cohere-ai version

Set the environment variables in your terminal.

To access the embedding model provider for generating and converting embeddings, set the environment variable for the provider's API key, if necessary.
To use embeddings from Cohere, set the COHERE_API_KEY environment variable.
```
export COHERE_API_KEY="<COHERE-API-KEY>"
```
If you don't set the environment variable, replace <COHERE-API-KEY> in the sample code with your API key before running the code.
To access the Atlas cluster, set the MONGODB_URI environment variable.
```
export MONGODB_URI="<CONNECTION-STRING>"
```
Your connection string should use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
If you don't set the environment variable, replace <CONNECTION-STRING> in the sample code with your connection string before running the code.
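
The sample scripts on this page read these variables with a fallback of the form `process.env.NAME || '<PLACEHOLDER>'`, so a placeholder that was never replaced is still a non-empty string and a simple "is it set" check will not catch it. A minimal sketch of a stricter lookup follows. The `requireSetting` helper is hypothetical (it is not part of the MongoDB or Cohere libraries), and the connection string in the usage line is a dummy value.

```javascript
// Hypothetical helper (not part of the MongoDB or Cohere libraries).
// Returns the environment variable if set, otherwise the fallback, and
// rejects values that are empty or still look like a <PLACEHOLDER>.
function requireSetting(name, fallback, env = process.env) {
  const value = env[name] || fallback;
  if (!value || /^<.*>$/.test(value)) {
    throw new Error(`${name} is not set. Export it or replace the placeholder in the code.`);
  }
  return value;
}

// Usage with an injected environment so the sketch runs without real secrets.
const demoEnv = { MONGODB_URI: 'mongodb+srv://user:pass@cluster0.example.mongodb.net' };
console.log(requireSetting('MONGODB_URI', '<CONNECTION-STRING>', demoEnv));
```

In a real script you would omit the third argument so that the helper reads `process.env`.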

Generate the vector embeddings for your data.

Create a file named get-embeddings.js to generate float32, int8, and int1 vector embeddings by using Cohere's embed API.
```
touch get-embeddings.js
```

Copy and paste the following code into the get-embeddings.js file.

This code does the following:

Generates float32, int8, and int1 embeddings for the given data by using Cohere's embed-english-v3.0 embedding model.
Stores the float32, int8, and int1 embeddings in fields named float, int8, and ubinary, respectively.
Creates a file named embeddings.json and saves the embeddings in the file.

get-embeddings.js

// Use 'require' for modules in a Node.js environment
const { CohereClient } = require('cohere-ai');
const { writeFile } = require('fs/promises');

// Retrieve API key from environment variables or default placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey) {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Instantiate the CohereClient with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Data to embed
    const data = [
      "The Great Wall of China is visible from space.",
      "The Eiffel Tower was completed in Paris in 1889.",
      "Mount Everest is the highest peak on Earth at 8,848m.",
      "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
      "The Mona Lisa was painted by Leonardo da Vinci.",
    ];

    // Fetch embeddings for the data using the cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Map the embeddings to the text data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to a JSON file
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

Replace the <COHERE-API-KEY> placeholder if you didn't set your Cohere API key as an environment variable, and then save the file.
Run the code to generate the embeddings.
node get-embeddings.js
Embeddings saved to embeddings.json
Verify the generated embeddings in the embeddings.json file.
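
One way to verify the file is to check the shape of each record. The sketch below assumes the 1024-dimension vectors that this page indexes, so each `ubinary` vector should hold 1024 / 8 = 128 bytes. The `checkEntry` helper is hypothetical, and the 8-dimension sample stands in for a record parsed from embeddings.json.

```javascript
// Hypothetical validator for one record of embeddings.json.
// `dims` is the embedding dimension (1024 for the vectors indexed on this page).
function checkEntry(entry, dims) {
  const { float, int8, ubinary } = entry.embeddings;
  const problems = [];
  if (float.length !== dims) problems.push(`float has ${float.length} values, expected ${dims}`);
  if (int8.length !== dims) problems.push(`int8 has ${int8.length} values, expected ${dims}`);
  if (ubinary.length !== dims / 8) problems.push(`ubinary has ${ubinary.length} bytes, expected ${dims / 8}`);
  if (!int8.every((v) => Number.isInteger(v) && v >= -128 && v <= 127)) problems.push('int8 value out of range');
  if (!ubinary.every((v) => Number.isInteger(v) && v >= 0 && v <= 255)) problems.push('ubinary value out of range');
  return problems;
}

// Tiny 8-dimension stand-in so the sketch runs without calling the API.
const sample = {
  text: 'example',
  embeddings: {
    float: [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8],
    int8: [12, -25, 38, -51, 64, -76, 89, -102],
    ubinary: [170],
  },
};
console.log(checkEntry(sample, 8)); // prints []
```

To check the real file at this step, parse it with `JSON.parse` and call `checkEntry(record, 1024)` for each record.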

Convert the vector embeddings to `binData` vectors.

Create a file named convert-embeddings.js to convert the float32, int8, and int1 vector embeddings from Cohere to BSON binData vectors by using the MongoDB Node.js driver.
```
touch convert-embeddings.js
```

Copy and paste the following code into the convert-embeddings.js file.

This code does the following:

Generates BSON binData vectors for the float32, int8, and int1 embeddings.
Appends the float32, int8, and int1 BSON binData vectors to the embeddings.json file.

convert-embeddings.js

const fs = require('fs/promises');
const { BSON } = require('mongodb');
const { Binary } = BSON;

async function main() {
  try {
    // Read and parse the contents of 'embeddings.json' file
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = JSON.parse(fileContent);

    // Map the embeddings data to add BSON binary representations with subtype 9
    const convertEmbeddingsData = embeddingsData.map(({ text, embeddings }) => {
      // Create Binary for Float32Array with manual subtype 9
      const bsonFloat32 = Binary.fromFloat32Array(new Float32Array(embeddings.float));

      // Create Binary for Int8Array with subtype 9
      const bsonInt8 = Binary.fromInt8Array(new Int8Array(embeddings.int8));

      // Create Binary for PackedBits (Uint8Array) with subtype 9
      const bsonPackedBits = Binary.fromPackedBits(new Uint8Array(embeddings.ubinary));

      return {
        text,
        embeddings: {
          float: embeddings.float, // Original float data
          int8: embeddings.int8, // Original int8 data
          ubinary: embeddings.ubinary, // Original packed bits data
        },
        bsonEmbeddings: {
          float32: bsonFloat32,
          int8: bsonInt8,
          packedBits: bsonPackedBits,
        },
      };
    });

    // Serialize the updated data to EJSON for BSON compatibility
    const ejsonSerializedData = BSON.EJSON.stringify(convertEmbeddingsData, null, null, { relaxed: false });

    // Write the serialized data to 'embeddings.json'
    await fs.writeFile('embeddings.json', ejsonSerializedData);
    console.log('Embeddings with BSON vectors have been saved to embeddings.json');
  } catch (error) {
    console.error('Error processing embeddings:', error);
  }
}

main();

Run the program to generate the BSON binData vectors.
node convert-embeddings.js
Embeddings with BSON vectors have been saved to embeddings.json
Verify the generated BSON embeddings in the embeddings.json file.
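
In the Extended JSON file, each BSON vector should appear as a `$binary` object with `subType` "09". The following sketch inspects one such value without the driver. It assumes the BSON binary vector layout (one element-type byte, one padding byte, then the data) and the dtype codes listed in the comments; treat both as assumptions to verify against the BSON specification. The `describeVector` helper is hypothetical, and the sample value is constructed in place rather than read from the file.

```javascript
// Hypothetical inspector for one Extended JSON binary value, such as
// record.bsonEmbeddings.int8 after JSON.parse of embeddings.json.
// Assumed dtype codes from the BSON binary vector format: verify before relying on them.
const DTYPES = { 0x03: 'int8', 0x27: 'float32', 0x10: 'packed_bit' };

function describeVector(ejsonBinary) {
  const { base64, subType } = ejsonBinary.$binary;
  const bytes = Buffer.from(base64, 'base64');
  return {
    subType,                              // expected to be "09" for vectors
    dtype: DTYPES[bytes[0]] || 'unknown', // first byte: element type
    padding: bytes[1],                    // second byte: ignored bits in the last byte
    dataBytes: bytes.length - 2,          // remaining bytes: vector data
  };
}

// Stand-in value: header bytes 0x10 0x00 followed by one data byte.
const sampleBinary = {
  $binary: { base64: Buffer.from([0x10, 0x00, 0xaa]).toString('base64'), subType: '09' },
};
console.log(describeVector(sampleBinary));
```

Under these assumptions, a 1024-dimension vector would report 4096 data bytes for float32, 1024 for int8, and 128 for packed bits.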

Connect to the Atlas cluster and upload the data to a collection.

Create a file named upload-data.js to connect to the Atlas cluster and create a collection in a database for the data in the embeddings.json file.
```
touch upload-data.js
```

Copy and paste the following code into the upload-data.js file.

This code does the following:

Connects to your Atlas cluster and creates a namespace with the database and collection name that you specify.
Uploads the data, including the embeddings in the embeddings.json file, to the specified namespace.

upload-data.js

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package

const { Binary } = BSON; // Ensure the Binary class is imported correctly

async function main() {
    const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
    const DB_NAME = "<DB-NAME>";
    const COLLECTION_NAME = "<COLLECTION-NAME>";

    let client;
    try {
        client = new MongoClient(MONGODB_URI);
        await client.connect();
        console.log("Connected to MongoDB");

        const db = client.db(DB_NAME);
        const collection = db.collection(COLLECTION_NAME);

        // Read and parse the contents of 'embeddings.json' file using EJSON
        const fileContent = await fs.readFile('embeddings.json', 'utf8');
        const embeddingsData = BSON.EJSON.parse(fileContent);

        // Map embeddings data to recreate BSON binary representations with the correct subtype
        const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
            return {
                text,
                bsonEmbeddings: {
                    float32: bsonEmbeddings.float32,
                    int8: bsonEmbeddings.int8,
                    int1: bsonEmbeddings.packedBits
                }
            };
        });

        const result = await collection.insertMany(documents);
        console.log(`Inserted ${result.insertedCount} documents into MongoDB`);

    } catch (error) {
        console.error('Error storing embeddings in MongoDB:', error);
    } finally {
        if (client) {
            await client.close();
        }
    }
}

// Run the store function
main();

Replace the following settings and save the file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the database and collection. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database where you want to create the collection.
`<COLLECTION-NAME>`	Name of the collection where you want to store the generated embeddings.

Run the following command to upload the data.
```
node upload-data.js
```
Verify that the documents exist in the collection on your Atlas cluster.

Create the Atlas Vector Search index on the collection.

Create a file named create-index.js to define an Atlas Vector Search index on the collection.
```
touch create-index.js
```

Copy and paste the following code to create the index into the create-index.js file.

The code does the following:

Connects to the Atlas cluster and creates an index with the specified name for the specified namespace.
Indexes the bsonEmbeddings.float32 and bsonEmbeddings.int8 fields as the vector type with the dotProduct similarity function, and the bsonEmbeddings.int1 field as the vector type with the euclidean similarity function.
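
The index definition in create-index.js sets `numDimensions: 1024` for all three fields, even though each int1 vector occupies only 128 bytes, because the dimension count of a packed-bit vector is its number of bits. A minimal sketch of that relationship follows. It assumes a threshold of 0 and most-significant-bit-first packing, and the `packBits` and `bitDimensions` helpers are hypothetical illustrations rather than driver APIs.

```javascript
// Hypothetical helpers; not part of the MongoDB driver.
// Packs one bit per dimension (1 if the value is > 0, else 0), most significant bit first.
function packBits(values) {
  const bytes = new Uint8Array(Math.ceil(values.length / 8));
  values.forEach((v, i) => {
    if (v > 0) bytes[i >> 3] |= 0x80 >> (i & 7);
  });
  return bytes;
}

// Dimension count of a packed-bit vector: total bits minus ignored padding bits.
function bitDimensions(bytes, padding = 0) {
  return bytes.length * 8 - padding;
}

const packed = packBits([0.3, -0.1, 0.0, 0.9, -0.5, 0.2, 0.7, -0.8]);
console.log(Array.from(packed));                 // prints [ 150 ]
console.log(bitDimensions(packed));              // prints 8
console.log(bitDimensions(new Uint8Array(128))); // prints 1024
```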

create-index.js

const { MongoClient } = require("mongodb");
const { setTimeout } = require("timers/promises"); // Import from timers/promises

// Connect to your Atlas deployment
const uri = process.env.MONGODB_URI || "<CONNECTION-STRING>";

const client = new MongoClient(uri);

async function main() {
  try {
    const database = client.db("<DB-NAME>");
    const collection = database.collection("<COLLECTION-NAME>");

    // Define your Atlas Vector Search index
    const index = {
      name: "<INDEX-NAME>",
      type: "vectorSearch",
      definition: {
        fields: [
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.float32",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int8",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int1",
            similarity: "euclidean",
          },
        ],
      },
    };

    // Run the helper method
    const result = await collection.createSearchIndex(index);
    console.log(`New search index named ${result} is building.`);

    // Wait for the index to be ready to query
    console.log("Polling to check if the index is ready. This may take up to a minute.");
    let isQueryable = false;

    // Use filtered search for index readiness
    while (!isQueryable) {
      const [indexData] = await collection.listSearchIndexes(index.name).toArray();

      if (indexData) {
        isQueryable = indexData.queryable;
        if (!isQueryable) {
          await setTimeout(5000); // Wait for 5 seconds before checking again
        }
      } else {
        // Handle the case where the index might not be found
        console.log(`Index ${index.name} not found.`);
        await setTimeout(5000); // Wait for 5 seconds before checking again
      }
    }

    console.log(`${result} is ready for querying.`);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Unhandled error:", err);
});

Replace the following settings and save the file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the index. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database that contains the collection.
`<COLLECTION-NAME>`	Name of the collection where you stored the generated embeddings.
`<INDEX-NAME>`	Name of the index for the collection.

Run the following command to create the index.
```
node create-index.js
```
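
The polling loop in create-index.js keeps waiting until the index reports that it is queryable. If you prefer to give up after a deadline, a bounded variant might look like the following sketch. The `waitUntil` helper and its option names are hypothetical, and the fake check function stands in for a lookup such as `collection.listSearchIndexes(name).toArray()`.

```javascript
// Hypothetical helper: polls `check` until it resolves to true or the deadline passes.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitUntil(check, { intervalMs = 5000, timeoutMs = 120000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await sleep(intervalMs);
  }
  return false;
}

// Stand-in for the index status lookup: reports ready on the third poll.
let polls = 0;
const fakeIndexIsQueryable = async () => ++polls >= 3;

waitUntil(fakeIndexIsQueryable, { intervalMs: 10, timeoutMs: 1000 }).then((ready) => {
  console.log(ready ? 'Index is ready for querying.' : 'Timed out waiting for the index.');
});
```

Returning `false` on timeout instead of throwing leaves the caller free to decide whether a slow index build is an error.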

Generate the embeddings for the query text.

Create a file named get-query-embeddings.js.
```
touch get-query-embeddings.js
```

Copy and paste the code into the get-query-embeddings.js file.

The sample code does the following:

Generates float32, int8, and int1 embeddings for the query text by using Cohere.
Converts the generated embeddings to BSON binData vectors by using the MongoDB Node.js driver.
Saves the generated embeddings to a file named query-embeddings.json.

get-query-embeddings.js

const { CohereClient } = require('cohere-ai');
const { BSON } = require('mongodb');
const { writeFile } = require('fs/promises');
const dotenv = require('dotenv');
const process = require('process');

// Load environment variables
dotenv.config();

const { Binary } = BSON;

// Get the API key from environment variables or set the key here
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey) {
  throw new Error('API key not found. Provide the COHERE_API_KEY.');
}

// Initialize CohereClient
const cohere = new CohereClient({ token: apiKey });

async function main(queryText) {
  try {
    if (typeof queryText !== 'string' || queryText.trim() === '') {
      throw new Error('Invalid query text. It must be a non-empty string.');
    }

    const data = [queryText];

    // Request embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_query',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'], // Request all required embedding types
    });

    if (!response.embeddings) {
      throw new Error('Embeddings not found in the API response.');
    }

    const { float, int8, ubinary } = response.embeddings;

    const updatedEmbeddingsData = data.map((text, index) => {
      // Create the BSON Binary objects using VECTOR_TYPE for all embedding types
      const float32Binary = Binary.fromFloat32Array(new Float32Array(float[index])); // VECTOR_TYPE.FLOAT32
      const int8Binary = Binary.fromInt8Array(new Int8Array(int8[index])); // VECTOR_TYPE.INT8
      const packedBitsBinary = Binary.fromPackedBits(new Uint8Array(ubinary[index])); // VECTOR_TYPE.PACKED_BIT

      return {
        text,
        embeddings: {
          float: float[index],
          int8: int8[index],
          ubinary: ubinary[index],
        },
        bsonEmbeddings: {
          float32: float32Binary,
          int8: int8Binary,
          int1: packedBitsBinary,
        },
      };
    });

    // Serialize the embeddings using BSON EJSON for BSON compatibility
    const outputFileName = 'query-embeddings.json';
    const ejsonSerializedData = BSON.EJSON.stringify(updatedEmbeddingsData, null, null, { relaxed: false });
    await writeFile(outputFileName, ejsonSerializedData);
    console.log(`Embeddings with BSON data have been saved to ${outputFileName}`);
  } catch (error) {
    console.error('Error processing query text:', error);
  }
}

// Main function that takes a query string
(async () => {
  const queryText = "<QUERY-TEXT>"; // Replace with your actual query text
  await main(queryText);
})();

Replace the following settings and save the file.

`<COHERE-API-KEY>`	Your API key for Cohere. Replace this value only if you didn't set the environment variable.
`<QUERY-TEXT>`	Your query text. For this tutorial, use `science fact`.

Run the code to generate the embeddings for the query text.
node get-query-embeddings.js
Embeddings with BSON data have been saved to query-embeddings.json

Run an Atlas Vector Search query.

Create a file named run-query.js.
```
touch run-query.js
```

Copy and paste the following sample $vectorSearch query into the run-query.js file.

The sample query does the following:

Connects to your Atlas cluster and runs the $vectorSearch query against the bsonEmbeddings.float32, bsonEmbeddings.int8, and bsonEmbeddings.int1 fields in the specified collection by using the embeddings in the query-embeddings.json file.
Prints the results from the float32, int8, and packed binary (int1) embeddings to the console.
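
The query in run-query.js keeps `limit` (2) below `numCandidates` (5), because `$vectorSearch` can't return more documents than it considers as candidates. A small sketch that builds the stage and enforces that relationship follows. The `buildVectorSearchStage` helper and the index name are hypothetical, and a plain array stands in for the BSON query vector.

```javascript
// Hypothetical helper; builds the same stage shape that run-query.js uses.
function buildVectorSearchStage({ index, path, queryVector, numCandidates, limit }) {
  if (limit > numCandidates) {
    throw new RangeError(`limit (${limit}) must not exceed numCandidates (${numCandidates})`);
  }
  return { $vectorSearch: { index, path, queryVector, numCandidates, limit } };
}

const stage = buildVectorSearchStage({
  index: 'vector_index',       // assumed index name
  path: 'bsonEmbeddings.int8',
  queryVector: [1, -2, 3],     // stand-in; the real script passes a BSON Binary
  numCandidates: 5,
  limit: 2,
});
console.log(JSON.stringify(stage));
```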

run-query.js

const { MongoClient, BSON } = require('mongodb'); // BSON provides EJSON parsing
const fs = require('fs/promises');
const dotenv = require('dotenv');

dotenv.config();

// MongoDB connection details
const mongoUri = process.env.MONGODB_URI || '<CONNECTION-STRING>';
const dbName = '<DB-NAME>'; // Update with your actual database name
const collectionName = '<COLLECTION-NAME>'; // Update with your actual collection name

// Indices and paths should match your MongoDB vector search configuration
const VECTOR_INDEX_NAME = '<INDEX-NAME>'; // Replace with your actual index name
const NUM_CANDIDATES = 5; // Number of candidate documents for the search
const LIMIT = 2; // Limit for the number of documents to return

// Fields in the collection that contain the BSON query vectors
const FIELDS = [
  { path: 'float32', subtype: 9 }, // Ensure that the path and custom subtype match
  { path: 'int8', subtype: 9 },    // Use the custom subtype if needed
  { path: 'int1', subtype: 9 } // Use the same custom subtype
];


// Function to read BSON vectors from JSON and run vector search
async function main() {
  // Initialize MongoDB client
  const client = new MongoClient(mongoUri);

  try {
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Load query embeddings from JSON file using EJSON parsing
    const fileContent = await fs.readFile('query-embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Define and run the query for each embedding type
    const results = {};

    for (const fieldInfo of FIELDS) {
      const { path, subtype } = fieldInfo;
      const bsonBinary = embeddingsData[0]?.bsonEmbeddings?.[path];

      if (!bsonBinary) {
        console.warn(`BSON embedding for ${path} not found in the JSON.`);
        continue;
      }

      const bsonQueryVector = bsonBinary; // Directly use BSON Binary object

      const pipeline = [
        {
          $vectorSearch: {
            index: VECTOR_INDEX_NAME,
            path: `bsonEmbeddings.${path}`,
            queryVector: bsonQueryVector,
            numCandidates: NUM_CANDIDATES,
            limit: LIMIT,
          }
        },
        {
          $project: {
            _id: 0,
            text: 1, // Adjust projection fields as necessary to match your document structure
            score: { $meta: 'vectorSearchScore' }
          }
        }
      ];

      results[path] = await collection.aggregate(pipeline).toArray();
    }

    return results;
  } catch (error) {
    console.error('Error during vector search:', error);
  } finally {
    await client.close();
  }
}

// Main execution block
(async () => {
  try {
    const results = await main();

    if (results) {
      console.log("Results from Float32 embeddings:");
      console.table(results.float32 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Int8 embeddings:");
      console.table(results.int8 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Packed Binary (PackedBits) embeddings:");
      console.table(results.int1 || []);
    }
  } catch (error) {
    console.error('Error executing main function:', error);
  }
})();

Replace the following settings and save the run-query.js file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to run the query. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database that contains the collection.
`<COLLECTION-NAME>`	Name of the collection that you want to query.
`<INDEX-NAME>`	Name of the index for the collection.

Run the following command to execute the query.

node run-query.js

Connected to MongoDB
Results from Float32 embeddings:
┌─────────┬─────────────────────────────────────────────────────────┬────────────────────┐
│ (index) │                          text                           │       score        │
├─────────┼─────────────────────────────────────────────────────────┼────────────────────┤
│    0    │ 'Mount Everest is the highest peak on Earth at 8,848m.' │ 0.6583383083343506 │
│    1    │    'The Great Wall of China is visible from space.'     │ 0.6536108255386353 │
└─────────┴─────────────────────────────────────────────────────────┴────────────────────┘
--------------------------------------------------------------------------
Results from Int8 embeddings:
┌─────────┬─────────────────────────────────────────────────────────┬────────────────────┐
│ (index) │                          text                           │       score        │
├─────────┼─────────────────────────────────────────────────────────┼────────────────────┤
│    0    │ 'Mount Everest is the highest peak on Earth at 8,848m.' │ 0.5149773359298706 │
│    1    │    'The Great Wall of China is visible from space.'     │ 0.5146723985671997 │
└─────────┴─────────────────────────────────────────────────────────┴────────────────────┘
--------------------------------------------------------------------------
Results from Packed Binary (PackedBits) embeddings:
┌─────────┬─────────────────────────────────────────────────────────┬─────────────┐
│ (index) │                          text                           │    score    │
├─────────┼─────────────────────────────────────────────────────────┼─────────────┤
│    0    │ 'Mount Everest is the highest peak on Earth at 8,848m.' │ 0.642578125 │
│    1    │    'The Great Wall of China is visible from space.'     │ 0.61328125  │
└─────────┴─────────────────────────────────────────────────────────┴─────────────┘
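
In this run, all three vector types return the same two documents in the same order, and only the scores differ. For larger result sets, one rough way to quantify agreement is the fraction of float32 results that also appear in a quantized result list. The sketch below uses a hypothetical `overlap` helper with the texts and scores from the output above.

```javascript
// Hypothetical helper: fraction of baseline results that also appear in the candidate list.
function overlap(baseline, candidate, key = 'text') {
  if (baseline.length === 0) return 0;
  const seen = new Set(candidate.map((doc) => doc[key]));
  return baseline.filter((doc) => seen.has(doc[key])).length / baseline.length;
}

const float32Results = [
  { text: 'Mount Everest is the highest peak on Earth at 8,848m.', score: 0.6583383083343506 },
  { text: 'The Great Wall of China is visible from space.', score: 0.6536108255386353 },
];
const int1Results = [
  { text: 'Mount Everest is the highest peak on Earth at 8,848m.', score: 0.642578125 },
  { text: 'The Great Wall of China is visible from space.', score: 0.61328125 },
];
console.log(overlap(float32Results, int1Results)); // prints 1
```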

Instale as bibliotecas necessárias.

Execute o seguinte comando para instalar o driver Node.js do MongoDB Node.js Driver. Esta operação pode levar alguns minutos para ser concluída.

npm install mongodb

npm install cohere-ai dotenv
npm show cohere-ai version

Set the environment variables in your terminal.

To access the embedding model provider for generating and converting embeddings, set the environment variable for the provider's API key, if necessary.
To use embeddings from Cohere, set the COHERE_API_KEY environment variable.
```
export COHERE_API_KEY="<COHERE-API-KEY>"
```
If you don't set the environment variable, replace <COHERE-API-KEY> in the sample code with your API key before running the code.
To access the Atlas cluster, set the MONGODB_URI environment variable.
```
export MONGODB_URI="<CONNECTION-STRING>"
```
Your connection string should use the following format:
```
mongodb+srv://<db_username>:<db_password>@<clusterName>.<hostname>.mongodb.net
```
If you don't set the environment variable, replace <CONNECTION-STRING> in the sample code with your connection string before running the code.
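
The fallback described above (environment variable first, edited placeholder second) can be sketched as a small check. This is a minimal sketch, not part of the tutorial's scripts: `resolveSetting` is a hypothetical helper name, and the rule that an unreplaced value looks like `<SOMETHING>` is an assumption based on the placeholders used on this page.

```javascript
// Hypothetical helper; it is not part of the MongoDB driver or the Cohere SDK.
// Returns the environment variable when it is set. Otherwise returns the
// fallback, unless the fallback is still an angle-bracket placeholder.
function resolveSetting(name, fallback, env = process.env) {
  const value = env[name];
  if (typeof value === 'string' && value.trim() !== '') {
    return value;
  }
  // A fallback such as <CONNECTION-STRING> means nobody replaced the placeholder.
  if (!fallback || /^<[^>]+>$/.test(fallback)) {
    throw new Error(`${name} is not set and the placeholder was not replaced.`);
  }
  return fallback;
}

// Example with an explicit environment object instead of process.env.
// The host name below is made up for illustration.
const exampleUri = resolveSetting('MONGODB_URI', '<CONNECTION-STRING>', {
  MONGODB_URI: 'mongodb+srv://user:pass@cluster0.example.mongodb.net',
});
console.log(exampleUri.startsWith('mongodb+srv://'));
```

Failing early with a clear message avoids a less obvious connection or authentication error later in the script.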

Fetch the data from your Atlas cluster.

Create a file named get-data.js.
```
touch get-data.js
```

Copy and paste the following sample code to fetch the data from the sample_airbnb.listingsAndReviews namespace in your Atlas cluster.

The sample code does the following:

Connects to your Atlas cluster and finds documents that have a non-empty summary field.
Creates a file named subset.json and writes the data from the collection to it.

get-data.js

const { MongoClient } = require('mongodb');
const fs = require('fs'); // Import the fs module for file system operations

async function main() {
    // Replace with your Atlas connection string
    const uri = process.env.MONGODB_URI || '<CONNECTION-STRING>';

    // Create a new MongoClient instance
    const client = new MongoClient(uri);

    try {
        // Connect to your Atlas cluster
        await client.connect();

        // Specify the database and collection
        const db = client.db('sample_airbnb');
        const collection = db.collection('listingsAndReviews');

        // Filter to exclude null or empty summary fields
        const filter = { summary: { $nin: [null, ''] } };

        // Get a subset of documents in the collection
        const documentsCursor = collection.find(filter).limit(50);

        // Convert the cursor to an array to get the documents
        const documents = await documentsCursor.toArray();

        // Log the documents to verify their content
        console.log('Documents retrieved:', documents);

        // Write the documents to a local file called "subset.json"
        const outputFilePath = './subset.json';
        fs.writeFileSync(outputFilePath, JSON.stringify(documents, null, 2), 'utf-8');

        console.log(`Subset of documents written to: ${outputFilePath}`);
    } catch (error) {
        console.error('An error occurred:', error);
    } finally {
        // Ensure the client is closed when finished
        await client.close();
    }
}

main().catch(console.error);

Replace the <CONNECTION-STRING> placeholder if you didn't set the environment variable for your Atlas connection string, and then save the file.
Run the following command to fetch the data:
node get-data.js
Subset of documents written to: ./subset.json

Generate the vector embeddings for your data.

If your collection already contains float32, int8, or int1 vector embeddings, skip this step.

Create a file named get-embeddings.js to generate float32, int8, and int1 vector embeddings by using the Cohere embed API.
```
touch get-embeddings.js
```

Copy and paste the following code into the get-embeddings.js file.

This code does the following:

Generates float32, int8, and int1 embeddings for the given data by using Cohere's embed-english-v3.0 embedding model.
Stores the float32, int8, and int1 embeddings in fields named float, int8, and ubinary, respectively.
Creates a file named embeddings.json and saves the embeddings in the file.

get-embeddings.js

// Import necessary modules using the CommonJS syntax
const { CohereClient } = require('cohere-ai');
const { readFile, writeFile } = require('fs/promises');

// Retrieve the API key from environment variables or provide a placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey || apiKey === '<COHERE-API-KEY>') {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Initialize the Cohere client with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Read and parse the contents of 'subset.json'
    const subsetData = await readFile('subset.json', 'utf-8');
    const documents = JSON.parse(subsetData);

    // Extract the 'summary' fields that are non-empty strings
    const data = documents
      .map(doc => doc.summary)
      .filter(summary => typeof summary === 'string' && summary.length > 0);

    if (data.length === 0) {
      throw new Error('No valid summary texts available in the data.');
    }

    // Request embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Structure the embeddings data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to 'embeddings.json'
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

If you didn't set the environment variable for your Cohere API key, replace the <COHERE-API-KEY> placeholder and save the file.
Run the code to generate the embeddings.
node get-embeddings.js
Embeddings saved to embeddings.json
Verify the generated embeddings by opening the generated embeddings.json file.
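
One way to make that verification concrete is to check the length and value range of each embedding type. The following is a minimal sketch under stated assumptions: `checkEmbeddingEntry` is a hypothetical helper, the 1024 dimensions come from the `numDimensions` value that this tutorial uses for the index, and the `ubinary` check assumes that the packed form stores eight dimensions per byte. It applies to embeddings.json as written by get-embeddings.js, before the conversion step rewrites the file.

```javascript
// Hypothetical checker for one entry of embeddings.json; not part of any SDK.
const DIMENSIONS = 1024; // numDimensions used for the index in this tutorial

function checkEmbeddingEntry(entry, dimensions = DIMENSIONS) {
  const { float, int8, ubinary } = entry.embeddings;
  const problems = [];

  if (float.length !== dimensions) {
    problems.push(`float has ${float.length} values, expected ${dimensions}`);
  }
  if (int8.length !== dimensions) {
    problems.push(`int8 has ${int8.length} values, expected ${dimensions}`);
  }
  if (!int8.every((v) => Number.isInteger(v) && v >= -128 && v <= 127)) {
    problems.push('int8 contains values outside -128..127');
  }
  // Assumption: packed bits store 8 dimensions in each unsigned byte.
  if (ubinary.length !== dimensions / 8) {
    problems.push(`ubinary has ${ubinary.length} bytes, expected ${dimensions / 8}`);
  }
  if (!ubinary.every((v) => Number.isInteger(v) && v >= 0 && v <= 255)) {
    problems.push('ubinary contains values outside 0..255');
  }
  return problems;
}

// Example with synthetic values standing in for a real API response.
const sampleEntry = {
  text: 'example',
  embeddings: {
    float: new Array(1024).fill(0.01),
    int8: new Array(1024).fill(-7),
    ubinary: new Array(128).fill(170),
  },
};
console.log(checkEmbeddingEntry(sampleEntry)); // an empty array means no problems
```

To check the real file, you could parse embeddings.json with `JSON.parse` and run the function over every entry.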

Convert the vector embeddings to `binData` vectors.

Create a file named convert-embeddings.js to convert the float32, int8, and int1 vector embeddings from Cohere to BSON binData vectors.
```
touch convert-embeddings.js
```

Copy and paste the following code into the convert-embeddings.js file.

This code does the following:

Generates BSON binData vectors for the float32, int8, and int1 embeddings.
Adds the float32, int8, and ubinary BSON binData vectors to the embeddings.json file.

convert-embeddings.js

const fs = require('fs/promises');
const { BSON } = require('mongodb');
const { Binary } = BSON;

async function main() {
  try {
    // Read and parse the contents of 'embeddings.json' file
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = JSON.parse(fileContent);

    // Map the embeddings data to add BSON binary representations with subtype 9
    const convertEmbeddingsData = embeddingsData.map(({ text, embeddings }) => {
      // Create Binary (subtype 9) from a Float32Array
      const bsonFloat32 = Binary.fromFloat32Array(new Float32Array(embeddings.float));

      // Create Binary (subtype 9) from an Int8Array
      const bsonInt8 = Binary.fromInt8Array(new Int8Array(embeddings.int8));

      // Create Binary (subtype 9) from packed bits stored in a Uint8Array
      const bsonPackedBits = Binary.fromPackedBits(new Uint8Array(embeddings.ubinary));

      return {
        text,
        embeddings: {
          float: embeddings.float, // Original float data
          int8: embeddings.int8, // Original int8 data
          ubinary: embeddings.ubinary, // Original packed bits data
        },
        bsonEmbeddings: {
          float32: bsonFloat32,
          int8: bsonInt8,
          packedBits: bsonPackedBits,
        },
      };
    });

    // Serialize the updated data to EJSON for BSON compatibility
    const ejsonSerializedData = BSON.EJSON.stringify(convertEmbeddingsData, null, null, { relaxed: false });

    // Write the serialized data to 'embeddings.json'
    await fs.writeFile('embeddings.json', ejsonSerializedData);
    console.log('Embeddings with BSON vectors have been saved to embeddings.json');
  } catch (error) {
    console.error('Error processing embeddings:', error);
  }
}

main();

Run the program to generate the BSON binData vectors.
node convert-embeddings.js
Embeddings with BSON vectors have been saved to embeddings.json
Verify the generated BSON embeddings in the embeddings.json file.
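
When you inspect the file, each BSON vector appears as a canonical Extended JSON `$binary` object. The sketch below shows one way to read its header. It rests on assumptions that this page doesn't state: that the payload follows the BSON binary vector layout (one data-type byte, one padding byte, then the values) and that the data-type codes are 0x27 for float32, 0x03 for int8, and 0x10 for packed bits. `describeVector` is a hypothetical helper, not a driver API.

```javascript
// Assumed data-type codes from the BSON binary vector (subtype 9) layout.
const VECTOR_DTYPES = { 0x27: 'float32', 0x03: 'int8', 0x10: 'packed_bit' };

// Hypothetical helper: summarizes a canonical EJSON $binary vector value.
function describeVector(ejsonBinary) {
  const { base64, subType } = ejsonBinary.$binary;
  const bytes = Buffer.from(base64, 'base64');
  const dtype = VECTOR_DTYPES[bytes[0]] || 'unknown';
  const padding = bytes[1]; // unused bits in the last byte (packed bits only)
  const payloadBytes = bytes.length - 2;

  let dimensions = null;
  if (dtype === 'float32') dimensions = payloadBytes / 4; // 4 bytes per value
  if (dtype === 'int8') dimensions = payloadBytes; // 1 byte per value
  if (dtype === 'packed_bit') dimensions = payloadBytes * 8 - padding; // 8 values per byte

  return { subType, dtype, padding, dimensions };
}

// Hand-built example: packed-bit header followed by two data bytes.
const sampleVector = {
  $binary: {
    base64: Buffer.from([0x10, 0x00, 0xff, 0x0f]).toString('base64'),
    subType: '09',
  },
};
console.log(describeVector(sampleVector));
```

For the vectors in this tutorial, you would expect every type to report 1024 dimensions if the assumptions above hold.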

Connect to the Atlas cluster and upload the data to the namespace.

Create a file named upload-data.js to connect to the Atlas cluster and upload the data to the sample_airbnb.listingsAndReviews namespace.
```
touch upload-data.js
```

Copy and paste the following code into the upload-data.js file.

This code does the following:

Connects to your Atlas cluster and creates a namespace with the database and collection name that you specify.
Uploads the data, including the embeddings, into the sample_airbnb.listingsAndReviews namespace.

upload-data.js

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package
const { EJSON } = BSON; // Use the EJSON parser bundled with the driver

async function main() {
  const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
  const DB_NAME = "sample_airbnb";
  const COLLECTION_NAME = "listingsAndReviews";

  let client;
  try {
    // Connect to MongoDB
    client = new MongoClient(MONGODB_URI);
    await client.connect();
    console.log("Connected to MongoDB");

    // Access database and collection
    const db = client.db(DB_NAME);
    const collection = db.collection(COLLECTION_NAME);

    // Load embeddings from JSON using EJSON.parse
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = EJSON.parse(fileContent); // Restores the BSON Binary values

    // Map embeddings data to the document shape stored in the collection
    const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
      return {
        summary: text,
        bsonEmbeddings: {
          float32: bsonEmbeddings.float32,
          int8: bsonEmbeddings.int8,
          int1: bsonEmbeddings.packedBits
        }
      };
    });

    // Iterate over documents and upsert each into the MongoDB collection
    for (const doc of documents) {
      const filter = { summary: doc.summary };
      const update = { $set: doc };

      // Update the document with the BSON binary data
      const result = await collection.updateOne(filter, update, { upsert: true });
      if (result.matchedCount > 0) {
        console.log(`Updated document with summary: ${doc.summary}`);
      } else {
        console.log(`Inserted new document with summary: ${doc.summary}`);
      }
    }

    console.log("Embeddings stored in MongoDB successfully.");
  } catch (error) {
    console.error('Error storing embeddings in MongoDB:', error);
  } finally {
    if (client) {
      await client.close();
    }
  }
}

// Run the main function to load the data
main();

Replace the <CONNECTION-STRING> placeholder if you didn't set the environment variable for your Atlas connection string, and then save the file.
Run the following command to upload the data.
node upload-data.js
Connected to MongoDB
Updated document with summary: ...
...
Embeddings stored in MongoDB successfully.
Verify by logging in to your Atlas cluster and checking the namespace in the Data Explorer.
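
The upload script sends one `updateOne` call per document. For larger data sets, the same upserts could be grouped into a single batch. The following is a minimal sketch of that alternative, not the tutorial's method: `toBulkUpserts` is a hypothetical helper that only builds the operations, and the commented `bulkWrite` call assumes a connected `collection` object like the one in upload-data.js.

```javascript
// Hypothetical helper: turns documents into upsert operations keyed on summary,
// mirroring the filter and $set update used in upload-data.js.
function toBulkUpserts(documents) {
  return documents.map((doc) => ({
    updateOne: {
      filter: { summary: doc.summary },
      update: { $set: doc },
      upsert: true,
    },
  }));
}

// Example with stand-in documents; real ones carry BSON Binary vectors.
const operations = toBulkUpserts([
  { summary: 'Flat with a sea view', bsonEmbeddings: {} },
  { summary: 'Cabin near the lake', bsonEmbeddings: {} },
]);
console.log(operations.length);
console.log(JSON.stringify(operations[0].updateOne.filter));

// With a connected collection, the batch could then be sent in one round trip:
//   const result = await collection.bulkWrite(operations, { ordered: false });
```

The trade-off is that the per-document "Updated" and "Inserted" log lines are replaced by the aggregate counts on the bulk result.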

Create the Atlas Vector Search index on the collection.

Create a file named create-index.js.
```
touch create-index.js
```

Copy and paste the following code for creating the index into the create-index.js file.

The code does the following:

Connects to the Atlas cluster and creates an index with the specified name for the specified namespace.
Indexes the bsonEmbeddings.float32 and bsonEmbeddings.int8 fields as the vector type by using the dotProduct similarity function, and the bsonEmbeddings.int1 field also as the vector type by using the euclidean function.

create-index.js

const { MongoClient } = require("mongodb");
const { setTimeout } = require("timers/promises"); // Import from timers/promises

// Connect to your Atlas deployment
const uri = process.env.MONGODB_URI || "<CONNECTION-STRING>";

const client = new MongoClient(uri);

async function main() {
  try {
    const database = client.db("<DB-NAME>");
    const collection = database.collection("<COLLECTION-NAME>");

    // Define your Atlas Vector Search index
    const index = {
      name: "<INDEX-NAME>",
      type: "vectorSearch",
      definition: {
        fields: [
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.float32",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int8",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int1",
            similarity: "euclidean",
          },
        ],
      },
    };

    // Run the helper method
    const result = await collection.createSearchIndex(index);
    console.log(`New search index named ${result} is building.`);

    // Wait for the index to be ready to query
    console.log("Polling to check if the index is ready. This may take up to a minute.");
    let isQueryable = false;

    // Look up the index by name until it reports that it is queryable
    while (!isQueryable) {
      const [indexData] = await collection.listSearchIndexes(index.name).toArray();

      if (indexData) {
        isQueryable = indexData.queryable;
        if (!isQueryable) {
          await setTimeout(5000); // Wait for 5 seconds before checking again
        }
      } else {
        // Handle the case where the index might not be found
        console.log(`Index ${index.name} not found.`);
        await setTimeout(5000); // Wait for 5 seconds before checking again
      }
    }

    console.log(`${result} is ready for querying.`);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Unhandled error:", err);
});

Replace the following settings and save the file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster where you want to create the index. Replace this value only if you didn't set the `MONGODB_URI` environment variable.
`<DB-NAME>`	Name of the database, which is `sample_airbnb`.
`<COLLECTION-NAME>`	Name of the collection, which is `listingsAndReviews`.
`<INDEX-NAME>`	Name of the index for the collection.

Run the following command to create the index.
node create-index.js
New search index named <INDEX-NAME> is building.
Polling to check if the index is ready. This may take up to a minute.
<INDEX-NAME> is ready for querying.
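
The polling loop in create-index.js has no upper bound, so it keeps waiting if the index never becomes queryable. A bounded variant could look like the following sketch. `pollUntil` is a hypothetical helper, and the 5-second interval and 5-minute timeout defaults are illustrative values rather than recommendations from this page.

```javascript
const { setTimeout: sleep } = require('timers/promises');

// Hypothetical helper: calls check() until it returns a truthy value or the
// timeout would be exceeded. Resolves to true when ready, false on timeout.
async function pollUntil(check, { intervalMs = 5000, timeoutMs = 300000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    // Stop if another wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}

// Example with a stand-in check that succeeds on the third call.
let checkCalls = 0;
pollUntil(async () => ++checkCalls >= 3, { intervalMs: 10, timeoutMs: 1000 })
  .then((ready) => console.log(ready ? 'ready' : 'timed out'));

// With the collection and index from create-index.js, the check could be:
//   async () => {
//     const [indexData] = await collection.listSearchIndexes(index.name).toArray();
//     return Boolean(indexData && indexData.queryable);
//   }
```

Returning a boolean instead of throwing leaves the decision about how to handle a timeout to the caller.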

Generate the embeddings for the query text.

Create a file named get-query-embeddings.js.
```
touch get-query-embeddings.js
```

Copy and paste the code into the get-query-embeddings.js file.

The sample code does the following:

Generates float32, int8, and int1 embeddings for the query text by using Cohere.
Converts the generated embeddings to BSON binData vectors by using the MongoDB Node.js driver.
Saves the generated embeddings to a file named query-embeddings.json.

get-query-embeddings.js

const { CohereClient } = require('cohere-ai');
const { BSON } = require('mongodb');
const { writeFile } = require('fs/promises');
const dotenv = require('dotenv');

// Load environment variables
dotenv.config();

const { Binary } = BSON;

// Get the API key from environment variables or set the key here
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

// The fallback placeholder is always truthy, so check for it explicitly
if (!apiKey || apiKey === '<COHERE-API-KEY>') {
  throw new Error('API key not found. Provide the COHERE_API_KEY.');
}

// Initialize CohereClient
const cohere = new CohereClient({ token: apiKey });

async function main(queryText) {
  try {
    if (typeof queryText !== 'string' || queryText.trim() === '') {
      throw new Error('Invalid query text. It must be a non-empty string.');
    }

    const data = [queryText];

    // Request embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_query',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'], // Request all required embedding types
    });

    if (!response.embeddings) {
      throw new Error('Embeddings not found in the API response.');
    }

    const { float, int8, ubinary } = response.embeddings;

    const updatedEmbeddingsData = data.map((text, index) => {
      // Create the BSON Binary vectors for each embedding type
      const float32Binary = Binary.fromFloat32Array(new Float32Array(float[index])); // float32 vector
      const int8Binary = Binary.fromInt8Array(new Int8Array(int8[index])); // int8 vector
      const packedBitsBinary = Binary.fromPackedBits(new Uint8Array(ubinary[index])); // packed-bit (int1) vector

      return {
        text,
        embeddings: {
          float: float[index],
          int8: int8[index],
          ubinary: ubinary[index],
        },
        bsonEmbeddings: {
          float32: float32Binary,
          int8: int8Binary,
          int1: packedBitsBinary,
        },
      };
    });

    // Serialize the embeddings using BSON EJSON for BSON compatibility
    const outputFileName = 'query-embeddings.json';
    const ejsonSerializedData = BSON.EJSON.stringify(updatedEmbeddingsData, null, null, { relaxed: false });
    await writeFile(outputFileName, ejsonSerializedData);
    console.log(`Embeddings with BSON data have been saved to ${outputFileName}`);
  } catch (error) {
    console.error('Error processing query text:', error);
  }
}

// Main function that takes a query string
(async () => {
  const queryText = "<QUERY-TEXT>"; // Replace with your actual query text
  await main(queryText);
})();

Replace the following settings and save the file.

`<COHERE-API-KEY>`	Your API key for Cohere. Replace this value only if you didn't set the key as an environment variable.
`<QUERY-TEXT>`	Your query text. For this example, use `ocean view`.

Run the code to generate the embeddings for the query text.
node get-query-embeddings.js
Embeddings with BSON data have been saved to query-embeddings.json

Run an Atlas Vector Search query.

Create a file named run-query.js.
```
touch run-query.js
```

Copy and paste the following sample $vectorSearch query into the run-query.js file.

The sample query does the following:

Connects to your Atlas cluster and runs the $vectorSearch query against the bsonEmbeddings.float32, bsonEmbeddings.int8, and bsonEmbeddings.int1 fields in the sample_airbnb.listingsAndReviews namespace by using the embeddings in the query-embeddings.json file.
Prints the results from the Float32, Int8, and Packed Binary (Int1) embeddings to the console.

run-query.js

const { MongoClient, BSON } = require('mongodb'); // BSON provides EJSON parsing
const fs = require('fs/promises');
const dotenv = require('dotenv');

dotenv.config();

// MongoDB connection details
const mongoUri = process.env.MONGODB_URI || '<CONNECTION-STRING>';
const dbName = 'sample_airbnb'; // Update with your actual database name
const collectionName = 'listingsAndReviews'; // Update with your actual collection name

// Index name and search settings should match your vector search configuration
const VECTOR_INDEX_NAME = '<INDEX-NAME>'; // Replace with your actual index name
const NUM_CANDIDATES = 20; // Number of candidate documents for the search
const LIMIT = 5; // Limit for the number of documents to return

// Keys under bsonEmbeddings that hold the BSON vectors (binData subtype 9)
const PATHS = ['float32', 'int8', 'int1'];

// Function to read BSON vectors from JSON and run vector search
async function main() {
  // Initialize MongoDB client
  const client = new MongoClient(mongoUri);

  try {
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Load query embeddings from JSON file using EJSON parsing
    const fileContent = await fs.readFile('query-embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Define and run the query for each embedding type
    const results = {};

    for (const path of PATHS) {
      // Use the BSON Binary object directly as the query vector
      const bsonQueryVector = embeddingsData[0]?.bsonEmbeddings?.[path];

      if (!bsonQueryVector) {
        console.warn(`BSON embedding for ${path} not found in the JSON.`);
        continue;
      }

      const pipeline = [
        {
          $vectorSearch: {
            index: VECTOR_INDEX_NAME,
            path: `bsonEmbeddings.${path}`,
            queryVector: bsonQueryVector,
            numCandidates: NUM_CANDIDATES,
            limit: LIMIT,
          }
        },
        {
          $project: {
            _id: 0,
            name: 1,
            summary: 1, // Adjust projection fields as necessary to match your document structure
            score: { $meta: 'vectorSearchScore' }
          }
        }
      ];

      results[path] = await collection.aggregate(pipeline).toArray();
    }

    return results;
  } catch (error) {
    console.error('Error during vector search:', error);
  } finally {
    await client.close();
  }
}

// Main execution block
(async () => {
  try {
    const results = await main();

    if (results) {
      console.log("Results from Float32 embeddings:");
      (results.float32 || []).forEach((result, index) => {
        console.log(`Result ${index + 1}:`, result);
      });

      console.log("Results from Int8 embeddings:");
      (results.int8 || []).forEach((result, index) => {
        console.log(`Result ${index + 1}:`, result);
      });

      console.log("Results from Packed Binary (PackedBits) embeddings:");
      (results.int1 || []).forEach((result, index) => {
        console.log(`Result ${index + 1}:`, result);
      });
    }
  } catch (error) {
    console.error('Error executing main function:', error);
  }
})();

Replace the following settings and save the run-query.js file.

`<CONNECTION-STRING>`	Connection string for the Atlas cluster that you want to query. Replace this value if you didn't set the `MONGODB_URI` environment variable.
`<INDEX-NAME>`	Name of the index for the collection.
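
The query script builds the same two-stage pipeline for each vector field. As a minimal sketch, that construction can be pulled into one function that also checks the search settings before anything is sent. `buildVectorSearchPipeline` is a hypothetical helper, and the rule that `numCandidates` must be at least `limit` is stated here as an assumption about `$vectorSearch`, so confirm it against the stage's reference documentation.

```javascript
// Hypothetical helper that assembles the pipeline used in run-query.js.
function buildVectorSearchPipeline({ index, path, queryVector, numCandidates, limit }) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer.');
  }
  // Assumption: $vectorSearch expects numCandidates to be at least limit.
  if (!Number.isInteger(numCandidates) || numCandidates < limit) {
    throw new RangeError('numCandidates must be an integer greater than or equal to limit.');
  }
  return [
    { $vectorSearch: { index, path, queryVector, numCandidates, limit } },
    { $project: { _id: 0, name: 1, summary: 1, score: { $meta: 'vectorSearchScore' } } },
  ];
}

// Example: a plain array stands in for the BSON Binary query vector,
// and the index name is made up for illustration.
const examplePipeline = buildVectorSearchPipeline({
  index: 'my_vector_index',
  path: 'bsonEmbeddings.int8',
  queryVector: [1, -2, 3],
  numCandidates: 20,
  limit: 5,
});
console.log(Object.keys(examplePipeline[0])[0]);
```

The resulting array can be passed to `collection.aggregate()` in the same way as the inline pipeline in run-query.js.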

Run the query.

To run the query, run the following command:

node run-query.js

Connected to MongoDB
Results from Float32 embeddings:
Result 1: {
name: 'Makaha Valley Paradise with OceanView',
summary: "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.",
score: 0.7278661131858826
}
Result 2: {
name: 'Ocean View Waikiki Marina w/prkg',
summary: "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
score: 0.688639760017395
}
Result 3: {
name: 'A Casa Alegre é um apartamento T1.',
summary: 'Para 2 pessoas. Vista de mar a 150 mts. Prédio com 2 elevadores. Tem: - quarto com roupeiro e cama de casal (colchão magnetizado); - cozinha: placa de discos, exaustor, frigorifico, micro-ondas e torradeira; casa de banho completa; - sala e varanda.',
score: 0.6831139326095581
}
Result 4: {
name: 'Your spot in Copacabana',
summary: 'Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.',
score: 0.6802051663398743
}
Result 5: {
name: 'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
summary: 'THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!',
score: 0.6779564619064331
}
Results from Int8 embeddings:
Result 1: {
name: 'Makaha Valley Paradise with OceanView',
summary: "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.",
score: 0.5215557217597961
}
Result 2: {
name: 'Ocean View Waikiki Marina w/prkg',
summary: "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
score: 0.5179016590118408
}
Result 3: {
name: 'A Casa Alegre é um apartamento T1.',
summary: 'Para 2 pessoas. Vista de mar a 150 mts. Prédio com 2 elevadores. Tem: - quarto com roupeiro e cama de casal (colchão magnetizado); - cozinha: placa de discos, exaustor, frigorifico, micro-ondas e torradeira; casa de banho completa; - sala e varanda.',
score: 0.5173280239105225
}
Result 4: {
name: 'Your spot in Copacabana',
summary: 'Having a large airy living room. The apartment is well divided. Fully furnished and cozy. The building has a 24h doorman and camera services in the corridors. It is very well located, close to the beach, restaurants, pubs and several shops and supermarkets. And it offers a good mobility being close to the subway.',
score: 0.5170232057571411
}
Result 5: {
name: 'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
summary: 'THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!',
score: 0.5168724060058594
}
Results from Packed Binary (PackedBits) embeddings:
Result 1: {
name: 'Makaha Valley Paradise with OceanView',
summary: "A beautiful and comfortable 1 Bedroom Air Conditioned Condo in Makaha Valley - stunning Ocean & Mountain views All the amenities of home, suited for longer stays. Full kitchen & large bathroom.  Several gas BBQ's for all guests to use & a large heated pool surrounded by reclining chairs to sunbathe.  The Ocean you see in the pictures is not even a mile away, known as the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  paddle boarding, surfing are all just minutes from the front door.",
score: 0.6591796875
}
Result 2: {
name: 'Ocean View Waikiki Marina w/prkg',
summary: "A short distance from Honolulu's billion dollar mall, and the same distance to Waikiki. Parking included. A great location that work perfectly for business, education, or simple visit. Experience Yacht Harbor views and 5 Star Hilton Hawaiian Village.",
score: 0.6337890625
}
Result 3: {
name: 'A Casa Alegre é um apartamento T1.',
summary: 'Para 2 pessoas. Vista de mar a 150 mts. Prédio com 2 elevadores. Tem: - quarto com roupeiro e cama de casal (colchão magnetizado); - cozinha: placa de discos, exaustor, frigorifico, micro-ondas e torradeira; casa de banho completa; - sala e varanda.',
score: 0.62890625
}
Result 4: {
name: 'LAHAINA, MAUI! RESORT/CONDO BEACHFRONT!! SLEEPS 4!',
summary: 'THIS IS A VERY SPACIOUS 1 BEDROOM FULL CONDO (SLEEPS 4) AT THE BEAUTIFUL VALLEY ISLE RESORT ON THE BEACH IN LAHAINA, MAUI!! YOU WILL LOVE THE PERFECT LOCATION OF THIS VERY NICE HIGH RISE! ALSO THIS SPACIOUS FULL CONDO, FULL KITCHEN, BIG BALCONY!!',
score: 0.6279296875
}
Result 5: {
name: 'Be Happy in Porto',
summary: 'Be Happy Apartment is an amazing space. Renovated and comfortable apartment, located in a building dating from the nineteenth century in one of the most emblematic streets of the Porto city "Rua do Almada".  Be Happy Apartment is located in the city center, able you to visit the historic center only by foot, being very close of majority points of interesting of the Porto City. Be Happy Apartment is located close of central Station MetroTrindade.',
score: 0.619140625
}

Your results might be different because the generated embeddings can vary depending on your environment.
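
The packed-bit scores in the sample output are all whole multiples of 1/1024 (for example, 0.6591796875 is 675/1024). That pattern is what you would see if the score were the fraction of matching bits across the 1024 dimensions. This reading is an inference from the output above, not something this page states. The sketch below computes the Hamming distance and the matching-bit fraction for two packed-bit vectors; both function names are hypothetical.

```javascript
// Counts the set bits in one byte.
function popcount8(byte) {
  let count = 0;
  for (let b = byte; b !== 0; b >>= 1) count += b & 1;
  return count;
}

// Number of differing bits between two packed-bit vectors of equal length.
function hammingDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length.');
  }
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    distance += popcount8(a[i] ^ b[i]); // XOR leaves a 1 wherever the bits differ
  }
  return distance;
}

// Fraction of bit positions where the two vectors agree.
function matchingBitFraction(a, b) {
  const totalBits = a.length * 8;
  return (totalBits - hammingDistance(a, b)) / totalBits;
}

// Two 16-bit vectors that differ in exactly one position.
const queryBits = Uint8Array.from([0b11110000, 0b10101010]);
const docBits = Uint8Array.from([0b11110001, 0b10101010]);
console.log(hammingDistance(queryBits, docBits)); // 1
console.log(matchingBitFraction(queryBits, docBits)); // 0.9375
```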

Create an interactive Python notebook by saving a file with the .ipynb extension, and then perform the following steps in the notebook. To try the example, replace the placeholders with valid values.

Work with a runnable version of this tutorial as a Python notebook.

Install the required libraries.

Run the following command to install the PyMongo driver. If necessary, you can also install libraries from your embedding model provider. This operation might take a few minutes to complete.

pip install pymongo

You must install PyMongo driver v4.10 or later.

Example

Install PyMongo and Cohere

pip install --quiet --upgrade pymongo cohere

Load the data for which you want to generate BSON vectors in your notebook.

Example

Sample data to import

data = [
   "The Great Wall of China is visible from space.",
   "The Eiffel Tower was completed in Paris in 1889.",
   "Mount Everest is the highest peak on Earth at 8,848m.",
   "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
   "The Mona Lisa was painted by Leonardo da Vinci.",
]

(Conditional) Generate embeddings from your data.

This step is required if you haven't yet generated embeddings from your data. If you have already generated embeddings, skip this step. To learn more about generating embeddings from your data, see How to Create Vector Embeddings.

Example

Generate embeddings from the sample data using Cohere

Placeholder	Valid Value
`<COHERE-API-KEY>`	API key for Cohere.

import os
import cohere
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Generate embeddings using the embed-english-v3.0 model
generated_embeddings = cohere_client.embed(
   texts=data,
   model="embed-english-v3.0",
   input_type="search_document",
   embedding_types=["float", "int8", "ubinary"]
).embeddings
float32_embeddings = generated_embeddings.float
int8_embeddings = generated_embeddings.int8
int1_embeddings = generated_embeddings.ubinary

Generate the BSON vectors from your embeddings.

You can use the PyMongo driver to convert your native vector embeddings to BSON vectors.

Example

Define and run a function to generate BSON vectors

from bson.binary import Binary, BinaryVectorDtype
def generate_bson_vector(vector, vector_dtype):
   return Binary.from_vector(vector, vector_dtype)
# For all vectors in your collection, generate BSON vectors of float32, int8, and int1 embeddings
bson_float32_embeddings = []
bson_int8_embeddings = []
bson_int1_embeddings = []
for i, (f32_emb, int8_emb, int1_emb) in enumerate(zip(float32_embeddings, int8_embeddings, int1_embeddings)):
   bson_float32_embeddings.append(generate_bson_vector(f32_emb, BinaryVectorDtype.FLOAT32))
   bson_int8_embeddings.append(generate_bson_vector(int8_emb, BinaryVectorDtype.INT8))
   bson_int1_embeddings.append(generate_bson_vector(int1_emb, BinaryVectorDtype.PACKED_BIT))
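For intuition about the `PACKED_BIT` type used above: an int1 vector stores eight binary dimensions per byte, so a 1024-dimension binary embedding occupies 128 byte values. The following pure-Python sketch illustrates that packing. It assumes a midpoint of 0 and most-significant-bit-first ordering, and `binarize` and `pack_bits` are hypothetical helper names, not part of PyMongo or Cohere.

```python
def binarize(vector, midpoint=0.0):
    """Map each float to 1 if it is greater than the midpoint, else 0."""
    return [1 if value > midpoint else 0 for value in vector]


def pack_bits(bits):
    """Pack 0/1 values into byte values, most significant bit first (assumed order).

    Returns (packed, padding), where padding is the number of unused
    low-order bits in the final byte.
    """
    padding = (-len(bits)) % 8
    padded = list(bits) + [0] * padding
    packed = []
    for start in range(0, len(padded), 8):
        byte = 0
        for bit in padded[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return packed, padding


# Eight float dimensions collapse into a single byte value.
bits = binarize([0.12, -0.4, 0.0, 0.9, -0.1, 0.3, 0.7, -0.2])
packed, padding = pack_bits(bits)
print(bits, packed, padding)
```

This is why the index definition later declares 1024 dimensions for the int1 field even though each stored vector holds far fewer numbers.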

Create documents with the BSON vector embeddings.

If you already have the BSON vector embeddings inside documents in your collection, skip this step.

Example

Create documents from the sample data

Placeholder	Valid Value
`<FIELD-NAME-FOR-FLOAT32-TYPE>`	Name of the field with `float32` values.
`<FIELD-NAME-FOR-INT8-TYPE>`	Name of the field with `int8` values.
`<FIELD-NAME-FOR-INT1-TYPE>`	Name of the field with `int1` values.

# Specify the field names for the float32, int8, and int1 embeddings
float32_field = "<FIELD-NAME-FOR-FLOAT32-TYPE>"
int8_field = "<FIELD-NAME-FOR-INT8-TYPE>"
int1_field = "<FIELD-NAME-FOR-INT1-TYPE>"
# Define function to create documents with BSON vector embeddings
def create_docs_with_bson_vector_embeddings(bson_float32_embeddings, bson_int8_embeddings, bson_int1_embeddings, data):
  docs = []
  for i, (bson_f32_emb, bson_int8_emb, bson_int1_emb, text) in enumerate(zip(bson_float32_embeddings, bson_int8_embeddings, bson_int1_embeddings, data)):
     doc = {
          "_id": i,
          "data": text,
          float32_field: bson_f32_emb,
          int8_field: bson_int8_emb,
          int1_field: bson_int1_emb
     }
     docs.append(doc)
  return docs
# Create the documents
documents = create_docs_with_bson_vector_embeddings(bson_float32_embeddings, bson_int8_embeddings, bson_int1_embeddings, data)

Load your data into your Atlas cluster.

You can load your data from the Atlas UI or programmatically. To learn how to load your data from the Atlas UI, see Insert Your Data. The following steps and associated examples demonstrate how to load your data programmatically by using the PyMongo driver.

Connect to your Atlas cluster.

Placeholder	Valid Value
`<ATLAS-CONNECTION-STRING>`	Atlas connection string. To learn more, see Connect via Drivers.

Example

import pymongo
# Specify your Atlas connection string
MONGO_URI = "<ATLAS-CONNECTION-STRING>"
if not MONGO_URI:
   raise ValueError("Atlas connection string is not set")
mongo_client = pymongo.MongoClient(MONGO_URI)

Load the data into your Atlas cluster.

Placeholder	Valid Value
`<DB-NAME>`	Name of the database.
`<COLLECTION-NAME>`	Name of the collection in the specified database.

Example

# Insert documents into a new database and collection
db = mongo_client["<DB-NAME>"]
collection_name = "<COLLECTION-NAME>"
db.create_collection(collection_name)
collection = db[collection_name]
collection.insert_many(documents)

Create the Atlas Vector Search index on the collection.

You can create Atlas Vector Search indexes by using the Atlas UI, the Atlas CLI, the Atlas Administration API, and MongoDB drivers. To learn more, see How to Index Fields for Vector Search.

Example

Create an index for the sample collection

Placeholder	Valid Value
`<INDEX-NAME>`	Name of the `vector` type index.

from pymongo.operations import SearchIndexModel
import time
# Define and create the vector search index
index_name = "<INDEX-NAME>"
search_index_model = SearchIndexModel(
  definition={
    "fields": [
      {
        "type": "vector",
        "path": float32_field,
        "similarity": "dotProduct",
        "numDimensions": 1024
      },
      {
        "type": "vector",
        "path": int8_field,
        "similarity": "dotProduct",
        "numDimensions": 1024
      },
      {
        "type": "vector",
        "path": int1_field,
        "similarity": "euclidean",
        "numDimensions": 1024
      }
    ]
  },
  name=index_name,
  type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")
# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate=None
if predicate is None:
  predicate = lambda index: index.get("queryable") is True
while True:
  indices = list(collection.list_search_indexes(index_name))
  if len(indices) and predicate(indices[0]):
    break
  time.sleep(5)
print(result + " is ready for querying.")
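The polling loop above runs until the index reports `queryable`, with no upper bound on the wait. A bounded variant can be sketched as follows. `wait_until` is a hypothetical helper (not a PyMongo function), and the example drives it with a fake check so that it runs without a cluster.

```python
import time


def wait_until(check, timeout_seconds=60.0, interval_seconds=5.0):
    """Call check() until it returns True or the timeout elapses.

    Returns True if check() succeeded, False if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_seconds)


# Fake check that becomes true on the third call, standing in for an index
# that turns queryable after a short build.
attempts = {"count": 0}


def fake_index_is_queryable():
    attempts["count"] += 1
    return attempts["count"] >= 3


ready = wait_until(fake_index_is_queryable, timeout_seconds=1.0, interval_seconds=0.01)
print(ready, attempts["count"])

# With the tutorial's objects, the check could be written as (assumption):
# lambda: any(ix.get("queryable") is True
#             for ix in collection.list_search_indexes(index_name))
```

Returning False instead of looping forever lets the caller decide whether to retry, report the failure, or inspect the index status.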

Define a function to run Atlas Vector Search queries.

The function to run Atlas Vector Search queries must perform the following actions:

Convert the query text into a BSON vector.
Define the pipeline for the Atlas Vector Search query.

Example

Placeholder	Valid Value
`<NUMBER-OF-CANDIDATES-TO-CONSIDER>`	Number of nearest neighbors to use during the search.
`<NUMBER-OF-DOCUMENTS-TO-RETURN>`	Number of documents to return in the results.

# Define a function to run a vector search query
def run_vector_search(query_text, collection, path):
  query_text_embeddings = cohere_client.embed(
    texts=[query_text],
    model="embed-english-v3.0",
    input_type="search_query",
    embedding_types=["float", "int8", "ubinary"]
  ).embeddings
  if path == float32_field:
    query_vector = query_text_embeddings.float[0]
    vector_dtype = BinaryVectorDtype.FLOAT32
  elif path == int8_field:
    query_vector = query_text_embeddings.int8[0]
    vector_dtype = BinaryVectorDtype.INT8
  elif path == int1_field:
    query_vector = query_text_embeddings.ubinary[0]
    vector_dtype = BinaryVectorDtype.PACKED_BIT
  bson_query_vector = generate_bson_vector(query_vector, vector_dtype)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 5
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 2
       }
     },
     {
       '$project': {
         '_id': 0,
         'data': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)
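The pipeline in the function above can also be assembled by a small standalone helper, which makes it easy to inspect before sending it to the server. The following is a minimal sketch: `build_vector_search_pipeline` is a hypothetical name, and the check that `numCandidates` is not smaller than `limit` reflects the `$vectorSearch` constraint as understood here, so treat it as an assumption to confirm against the current reference.

```python
def build_vector_search_pipeline(index_name, path, query_vector,
                                 num_candidates, limit, project_fields=("data",)):
    """Return a $vectorSearch + $project pipeline as plain Python dicts."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    if num_candidates < limit:
        # Assumed constraint: numCandidates cannot be less than limit.
        raise ValueError("numCandidates must be greater than or equal to limit")
    projection = {"_id": 0, "score": {"$meta": "vectorSearchScore"}}
    for field in project_fields:
        projection[field] = 1
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {"$project": projection},
    ]


# A plain list stands in for the BSON query vector in this illustration.
pipeline = build_vector_search_pipeline("my_index", "embedding", [0.1, 0.2], 5, 2)
print(pipeline[0]["$vectorSearch"]["numCandidates"], pipeline[1]["$project"])
```

The returned list can be passed to `collection.aggregate` in the same way as the inline pipeline.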

Run the Atlas Vector Search query.

You can run Atlas Vector Search queries programmatically. To learn more, see Run Vector Search Queries.

Example

from pprint import pprint
# Run the vector search query on the float32, int8, and int1 embeddings
query_text = "tell me a science fact"
float32_results = run_vector_search(query_text, collection, float32_field)
int8_results = run_vector_search(query_text, collection, int8_field)
int1_results = run_vector_search(query_text, collection, int1_field)
print("results from float32 embeddings")
pprint(list(float32_results))
print("--------------------------------------------------------------------------")
print("results from int8 embeddings")
pprint(list(int8_results))
print("--------------------------------------------------------------------------")
print("results from int1 embeddings")
pprint(list(int1_results))

results from float32 embeddings
[{'data': 'Mount Everest is the highest peak on Earth at 8,848m.',
  'score': 0.6578356027603149},
 {'data': 'The Great Wall of China is visible from space.',
  'score': 0.6420407891273499}]
--------------------------------------------------------------------------
results from int8 embeddings
[{'data': 'Mount Everest is the highest peak on Earth at 8,848m.',
  'score': 0.5149182081222534},
 {'data': 'The Great Wall of China is visible from space.',
  'score': 0.5136760473251343}]
--------------------------------------------------------------------------
results from int1 embeddings
[{'data': 'Mount Everest is the highest peak on Earth at 8,848m.',
  'score': 0.62109375},
 {'data': 'The Great Wall of China is visible from space.',
  'score': 0.61328125}]

Work with a runnable version of this tutorial as a Python notebook.

Install the required libraries.

pip install pymongo

You must install PyMongo driver v4.10 or later.

Example

Install PyMongo and Cohere

pip install --quiet --upgrade pymongo cohere

Define the functions to generate vector embeddings and convert embeddings to a BSON-compatible format.

You must define functions that perform the following by using an embedding model:

Generate embeddings from your existing data if your existing data doesn't have any embeddings.
Convert the embeddings to BSON vectors.

Example

Function to generate and convert embeddings

Placeholder	Valid Value
`<COHERE-API-KEY>`	API key for Cohere.

For float32 embeddings:

import os
import pymongo
import cohere
from bson.binary import Binary, BinaryVectorDtype
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Define function to generate embeddings using the embed-english-v3.0 model
def get_embedding(text):
    response = cohere_client.embed(
      texts=[text],
      model='embed-english-v3.0',
      input_type='search_document',
      embedding_types=["float"]
    )
    embedding = response.embeddings.float[0]
    return embedding
# Define function to convert embeddings to BSON-compatible format
def generate_bson_vector(vector, vector_dtype):
    return Binary.from_vector(vector, vector_dtype)

For int8 embeddings:

import os
import pymongo
import cohere
from bson.binary import Binary, BinaryVectorDtype
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Define function to generate embeddings using the embed-english-v3.0 model
def get_embedding(text):
    response = cohere_client.embed(
      texts=[text],
      model='embed-english-v3.0',
      input_type='search_document',
      embedding_types=["int8"]
    )
    embedding = response.embeddings.int8[0]
    return embedding
# Define function to convert embeddings to BSON-compatible format
def generate_bson_vector(vector, vector_dtype):
    return Binary.from_vector(vector, vector_dtype)

For int1 embeddings:

import os
import pymongo
import cohere
from bson.binary import Binary, BinaryVectorDtype
# Specify your Cohere API key
os.environ["COHERE_API_KEY"] = "<COHERE-API-KEY>"
cohere_client = cohere.Client(os.environ["COHERE_API_KEY"])
# Define function to generate embeddings using the embed-english-v3.0 model
def get_embedding(text):
    response = cohere_client.embed(
      texts=[text],
      model='embed-english-v3.0',
      input_type='search_document',
      embedding_types=["ubinary"]
    )
    embedding = response.embeddings.ubinary[0]
    return embedding
# Define function to convert embeddings to BSON-compatible format
def generate_bson_vector(vector, vector_dtype):
    return Binary.from_vector(vector, vector_dtype)

Connect to the Atlas cluster and retrieve existing data.

You must provide the following:

Connection string to connect to your Atlas cluster that contains the database and collection for which you want to generate embeddings.
Name of the database that contains the collection for which you want to generate embeddings.
Name of the collection for which you want to generate embeddings.

Example

Connect to the Atlas cluster to access data

Placeholder	Valid Value
`<ATLAS-CONNECTION-STRING>`	Atlas connection string. To learn more, see Connect via Drivers.

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<ATLAS-CONNECTION-STRING>")
db = mongo_client["sample_airbnb"]
collection = db["listingsAndReviews"]

# Filter to exclude null or empty summary fields
filter = { "summary": {"$nin": [None, ""]} }

# Get a subset of documents in the collection
documents = collection.find(filter).limit(50)

# Initialize the count of updated documents
updated_doc_count = 0

Generate, convert, and load embeddings into your collection.

Generate embeddings from your data by using any embedding model if your data doesn't already have embeddings. To learn more about generating embeddings from your data, see How to Create Vector Embeddings.
Convert the embeddings to BSON vectors (as shown by the `generate_bson_vector` call in the following example).
Load the embeddings into your collection on the Atlas cluster.

This operation might take a few minutes to complete.

Example

Generate, convert, and load embeddings into the collection

For float32 embeddings:

for doc in documents:
    # Generate embeddings based on the summary
    summary = doc["summary"]
    embedding = get_embedding(summary)  # Get float32 embedding
    # Convert the float32 embedding to BSON format
    bson_float32 = generate_bson_vector(embedding, BinaryVectorDtype.FLOAT32)
    # Update the document with the BSON embedding
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"embedding": bson_float32}}
    )
    updated_doc_count += 1
print(f"Updated {updated_doc_count} documents with BSON embeddings.")

For int8 embeddings:

for doc in documents:
    # Generate embeddings based on the summary
    summary = doc["summary"]
    embedding = get_embedding(summary)  # Get int8 embedding
    # Convert the int8 embedding to BSON format
    bson_int8 = generate_bson_vector(embedding, BinaryVectorDtype.INT8)
    # Update the document with the BSON embedding
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"embedding": bson_int8}}
    )
    updated_doc_count += 1
print(f"Updated {updated_doc_count} documents with BSON embeddings.")

For int1 embeddings:

for doc in documents:
    # Generate embeddings based on the summary
    summary = doc["summary"]
    embedding = get_embedding(summary)  # Get int1 embedding
    # Convert the int1 embedding to BSON format
    bson_int1 = generate_bson_vector(embedding, BinaryVectorDtype.PACKED_BIT)
    # Update the document with the BSON embedding
    collection.update_one(
        {"_id": doc["_id"]},
        {"$set": {"embedding": bson_int1}}
    )
    updated_doc_count += 1
print(f"Updated {updated_doc_count} documents with BSON embeddings.")
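The loops above call the embedding model and `update_one` once per document, which is one reason the operation can take a few minutes. One possible variation, sketched here under assumptions, is to group documents into batches so that each batch shares a single embedding request (the Cohere `embed` call accepts a list of texts, as in the earlier examples). `batched` is a hypothetical helper, the batch size of 2 is an arbitrary illustrative value, and `fake_embed` stands in for a real model call.

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def fake_embed(texts):
    """Stand-in for one embedding request; returns one vector per text."""
    return [[float(len(text))] for text in texts]


docs = [{"_id": i, "summary": "x" * (i + 1)} for i in range(5)]
requests_made = 0
embeddings_by_id = {}
for batch in batched(docs, batch_size=2):
    vectors = fake_embed([doc["summary"] for doc in batch])
    requests_made += 1
    for doc, vector in zip(batch, vectors):
        embeddings_by_id[doc["_id"]] = vector

print(requests_made, len(embeddings_by_id))
```

In a real run, each vector would still be converted with `generate_bson_vector` and written back to its document as in the loops above.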

Create the Atlas Vector Search index on the collection.

You can create Atlas Vector Search indexes by using the Atlas UI, the Atlas CLI, the Atlas Administration API, and MongoDB drivers in your preferred language. To learn more, see How to Index Fields for Vector Search.

Example

Create an index for the collection

Placeholder	Valid Value
`<INDEX-NAME>`	Name of the `vector` type index.

from pymongo.operations import SearchIndexModel
import time

# Define and create the vector search index
index_name = "<INDEX-NAME>"
search_index_model = SearchIndexModel(
  definition={
    "fields": [
      {
        "type": "vector",
        "path": "embedding",
        "similarity": "euclidean",
        "numDimensions": 1024
      }
    ]
  },
  name=index_name,
  type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate=None
if predicate is None:
  predicate = lambda index: index.get("queryable") is True
while True:
  indices = list(collection.list_search_indexes(index_name))
  if len(indices) and predicate(indices[0]):
    break
  time.sleep(5)
print(result + " is ready for querying.")

Define a function to run Atlas Vector Search queries.

The function to run Atlas Vector Search queries must perform the following actions:

Generate embeddings for the query text.
Convert the query embeddings into a BSON vector.
Define the pipeline for the Atlas Vector Search query.

Example

Function to run an Atlas Vector Search query

Placeholder	Valid Value
`<NUMBER-OF-CANDIDATES-TO-CONSIDER>`	Number of nearest neighbors to use during the search.
`<NUMBER-OF-DOCUMENTS-TO-RETURN>`	Number of documents to return in the results.

For float32 embeddings:

def run_vector_search(query_text, collection, path):
  query_embedding = get_embedding(query_text)
  bson_query_vector = generate_bson_vector(query_embedding, BinaryVectorDtype.FLOAT32)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 20
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 5
       }
     },
     {
       '$project': {
         '_id': 0,
         'name': 1,
         'summary': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

For int8 embeddings:

def run_vector_search(query_text, collection, path):
  query_embedding = get_embedding(query_text)
  bson_query_vector = generate_bson_vector(query_embedding, BinaryVectorDtype.INT8)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 20
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 5
       }
     },
     {
       '$project': {
         '_id': 0,
         'name': 1,
         'summary': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

For int1 embeddings:

def run_vector_search(query_text, collection, path):
  query_embedding = get_embedding(query_text)
  bson_query_vector = generate_bson_vector(query_embedding, BinaryVectorDtype.PACKED_BIT)
  pipeline = [
    {
      '$vectorSearch': {
        'index': index_name,
        'path': path,
        'queryVector': bson_query_vector,
        'numCandidates': <NUMBER-OF-CANDIDATES-TO-CONSIDER>, # for example, 20
        'limit': <NUMBER-OF-DOCUMENTS-TO-RETURN> # for example, 5
       }
     },
     {
       '$project': {
         '_id': 0,
         'name': 1,
         'summary': 1,
         'score': { '$meta': 'vectorSearchScore' }
        }
     }
  ]
  return collection.aggregate(pipeline)

Run the Atlas Vector Search query.

You can run Atlas Vector Search queries programmatically. To learn more, see Run Vector Search Queries.

Example

Run a sample Atlas Vector Search query

from pprint import pprint
query_text = "ocean view"
query_results = run_vector_search(query_text, collection, "embedding")
print("query results:")
pprint(list(query_results))

query results:
[{'name': 'Your spot in Copacabana',
  'score': 0.5468248128890991,
  'summary': 'Having a large airy living room. The apartment is well divided. '
             'Fully furnished and cozy. The building has a 24h doorman and '
             'camera services in the corridors. It is very well located, close '
             'to the beach, restaurants, pubs and several shops and '
             'supermarkets. And it offers a good mobility being close to the '
             'subway.'},
 {'name': 'Twin Bed room+MTR Mongkok shopping&My',
  'score': 0.527062714099884,
  'summary': 'Dining shopping conveniently located Mongkok subway E1, airport '
             'shuttle bus stops A21. Three live two beds, separate WC, 24-hour '
             'hot water. Free WIFI.'},
{'name': 'Quarto inteiro na Tijuca',
  'score': 0.5222363471984863,
  'summary': 'O quarto disponível tem uma cama de solteiro, sofá e computador '
             'tipo desktop para acomodação.'},
 {'name': 'Makaha Valley Paradise with OceanView',
  'score': 0.5175154805183411,
  'summary': 'A beautiful and comfortable 1 Bedroom Air Conditioned Condo in '
             'Makaha Valley - stunning Ocean & Mountain views All the '
             'amenities of home, suited for longer stays. Full kitchen & large '
             "bathroom.  Several gas BBQ's for all guests to use & a large "
             'heated pool surrounded by reclining chairs to sunbathe.  The '
             'Ocean you see in the pictures is not even a mile away, known as '
             'the famous Makaha Surfing Beach. Golfing, hiking,snorkeling  '
             'paddle boarding, surfing are all just minutes from the front '
             'door.'},
 {'name': 'Cozy double bed room 東涌鄉村雅緻雙人房',
  'score': 0.5149975419044495,
  'summary': 'A comfortable double bed room at G/F. Independent entrance. High '
             'privacy. The room size is around 100 sq.ft. with a 48"x72" '
             'double bed. The village house is close to the Hong Kong Airport, '
             'AsiaWorld-Expo, HongKong-Zhuhai-Macau Bridge, Disneyland, '
             'Citygate outlets, 360 Cable car, shopping centre, main tourist '
             'attractions......'}]

Your results might vary depending on the vector data type that you specified in the previous steps.

For an advanced demonstration of this procedure on sample data using Cohere's embed-english-v3.0 embedding model, see this notebook.

Evaluate Your Query Results

You can measure the accuracy of your Atlas Vector Search query by evaluating how closely the results of an ANN search match the results of an ENN search against your quantized vectors. That is, you can compare the ANN search results with the ENN search results for the same query criteria and measure how frequently the ANN results include the nearest neighbors that appear in the ENN results.

For a demonstration of evaluating your query results, see How to Measure the Accuracy of Your Query Results.
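The comparison described above is often summarized as recall: the fraction of the exact (ENN) neighbors that the approximate (ANN) search also returned. The following is a minimal sketch, assuming you have already collected the `_id` values from both result sets for the same query and the same `limit`. `recall_at_k` is a hypothetical helper name and the IDs are made-up illustration data.

```python
def recall_at_k(ann_ids, enn_ids):
    """Fraction of the exact (ENN) neighbors that also appear in the ANN results."""
    exact = set(enn_ids)
    if not exact:
        raise ValueError("enn_ids must not be empty")
    return len(exact.intersection(ann_ids)) / len(exact)


# Made-up document IDs for one query, top 4 results from each search type.
ann = ["a", "b", "d", "e"]
enn = ["a", "b", "c", "d"]
print(recall_at_k(ann, enn))  # 0.75
```

Averaging this value over a representative set of queries gives a more stable picture than any single query.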

Embedding Model Provider	Embedding Model
Cohere	`embed-english-v3.0`
Nomic	`nomic-embed-text-v1.5`
Jina	`jina-embeddings-v2-base-en`
Mixedbread	`mxbai-embed-large-v1`

import com.cohere.api.Cohere;
import com.cohere.api.requests.EmbedRequest;
import com.cohere.api.types.EmbedByTypeResponse;
import com.cohere.api.types.EmbedByTypeResponseEmbeddings;
import com.cohere.api.types.EmbedInputType;
import com.cohere.api.types.EmbedResponse;
import com.cohere.api.types.EmbeddingType;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import org.bson.BinaryVector;
import org.bson.Document;

public class GenerateAndConvertEmbeddings {

    // List of text data to embed
    private static final List<String> DATA = List.of(
        "The Great Wall of China is visible from space.",
        "The Eiffel Tower was completed in Paris in 1889.",
        "Mount Everest is the highest peak on Earth at 8,848m.",
        "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
        "The Mona Lisa was painted by Leonardo da Vinci."
    );

    public static void main(String[] args) {
        // Cohere API key for authentication
        String apiKey = System.getenv("COHERE_API_KEY");

        // Fetch embeddings from the Cohere API
        EmbedByTypeResponseEmbeddings embeddings = fetchEmbeddingsFromCohere(apiKey);
        Document bsonEmbeddings = convertEmbeddingsToBson(embeddings);

        writeEmbeddingsToFile(bsonEmbeddings, "embeddings.json");
    }

    // Fetches embeddings based on input data from the Cohere API
    private static EmbedByTypeResponseEmbeddings fetchEmbeddingsFromCohere(String apiKey) {
        if (Objects.isNull(apiKey) || apiKey.isEmpty()) {
            throw new RuntimeException("API key not found. Please set COHERE_API_KEY in your environment.");
        }

        Cohere cohere = Cohere.builder().token(apiKey).clientName("embed-example").build();

        try {
            EmbedRequest request = EmbedRequest.builder()
                .model("embed-english-v3.0")
                .inputType(EmbedInputType.SEARCH_DOCUMENT)
                .texts(DATA)
                .embeddingTypes(List.of(EmbeddingType.FLOAT, EmbeddingType.INT_8, EmbeddingType.UBINARY))
                .build();

            EmbedResponse response = cohere.embed(request);
            Optional<EmbedByTypeResponse> optionalEmbeddingsWrapper = response.getEmbeddingsByType();

            return optionalEmbeddingsWrapper.orElseThrow().getEmbeddings();
        } catch (Exception e) {
            System.err.println("Error fetching embeddings: " + e.getMessage());
            throw e;
        }
    }

    // Converts embeddings to BSON binary vectors using MongoDB Java Driver
    private static Document convertEmbeddingsToBson(EmbedByTypeResponseEmbeddings embeddings) {
        List<List<Double>> floatEmbeddings = embeddings.getFloat().orElseThrow();
        List<List<Integer>> int8Embeddings = embeddings.getInt8().orElseThrow();
        List<List<Integer>> ubinaryEmbeddings = embeddings.getUbinary().orElseThrow();

        List<Document> bsonEmbeddings = new ArrayList<>();
        for (int i = 0; i < floatEmbeddings.size(); i++) {
            Document bsonEmbedding = new Document()
                .append("text", DATA.get(i))
                .append("embeddings_float32", BinaryVector.floatVector(listToFloatArray(floatEmbeddings.get(i))))
                .append("embeddings_int8", BinaryVector.int8Vector(listToByteArray(int8Embeddings.get(i))))
                .append("embeddings_int1", BinaryVector.packedBitVector(listToByteArray(ubinaryEmbeddings.get(i)), (byte) 0));

            bsonEmbeddings.add(bsonEmbedding);
        }

        return new Document("data", bsonEmbeddings);
    }

    // Writes embeddings to JSON file
    private static void writeEmbeddingsToFile(Document bsonEmbeddings, String fileName) {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bsonEmbeddings.toJson().getBytes());
            System.out.println("Embeddings saved to " + fileName);
        } catch (IOException e) {
            System.out.println("Error writing embeddings to file: " + e.getMessage());
        }
    }

    // Convert List of Doubles to an array of floats
    private static float[] listToFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }

    // Convert List of Integers to an array of bytes
    private static byte[] listToByteArray(List<Integer> list) {
        byte[] array = new byte[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).byteValue();
        }
        return array;
    }
}

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.SearchIndexModel;
import com.mongodb.client.model.SearchIndexType;
import org.bson.Document;
import org.bson.conversions.Bson;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.StreamSupport;

public class UploadDataAndCreateIndex {

    private static final String MONGODB_URI = System.getenv("MONGODB_URI");
    private static final String DB_NAME = "<DATABASE-NAME>";
    private static final String COLLECTION_NAME = "<COLLECTION-NAME>";
    private static final String INDEX_NAME = "<INDEX-NAME>";

    public static void main(String[] args) {
        try (MongoClient mongoClient = MongoClients.create(MONGODB_URI)) {
            storeEmbeddings(mongoClient);
            setupVectorSearchIndex(mongoClient);
        } catch (IOException | InterruptedException e) {
            e.printStackTrace();
        }
    }

    public static void storeEmbeddings(MongoClient client) throws IOException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

        String fileContent = Files.readString(Path.of("embeddings.json"));
        List<Document> documents = parseDocuments(fileContent);

        collection.insertMany(documents);
        System.out.println("Inserted documents into MongoDB");
    }

    private static List<Document> parseDocuments(String jsonContent) throws IOException {
        Document rootDoc = Document.parse(jsonContent);
        return rootDoc.getList("data", Document.class);
    }

    public static void setupVectorSearchIndex(MongoClient client) throws InterruptedException {
        MongoDatabase database = client.getDatabase(DB_NAME);
        MongoCollection<Document> collection = database.getCollection(COLLECTION_NAME);

        Bson definition = new Document(
            "fields",
            List.of(
                new Document("type", "vector")
                    .append("path", "embeddings_float32")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int8")
                    .append("numDimensions", 1024)
                    .append("similarity", "dotProduct"),
                new Document("type", "vector")
                    .append("path", "embeddings_int1")
                    .append("numDimensions", 1024)
                    .append("similarity", "euclidean")
            )
        );

        SearchIndexModel indexModel = new SearchIndexModel(
            INDEX_NAME,
            definition,
            SearchIndexType.vectorSearch()
        );

        List<String> result = collection.createSearchIndexes(Collections.singletonList(indexModel));
        System.out.println("Successfully created vector index named: " + result.get(0));
        System.out.println("It may take up to a minute for the index to leave the BUILDING status and become queryable.");

        System.out.println("Polling to confirm the index has changed from the BUILDING status.");
        waitForIndex(collection, INDEX_NAME);
    }

    public static <T> boolean waitForIndex(final MongoCollection<T> collection, final String indexName) {
        long startTime = System.nanoTime();
        long timeoutNanos = TimeUnit.SECONDS.toNanos(60);
        while (System.nanoTime() - startTime < timeoutNanos) {
            Document indexRecord = StreamSupport.stream(collection.listSearchIndexes().spliterator(), false)
                .filter(index -> indexName.equals(index.getString("name")))
                .findAny().orElse(null);
            if (indexRecord != null) {
                if ("FAILED".equals(indexRecord.getString("status"))) {
                    throw new RuntimeException("Search index has FAILED status.");
                }
                if (indexRecord.getBoolean("queryable")) {
                    System.out.println(indexName + " index is ready to query");
                    return true;
                }
            }
            try {
                Thread.sleep(100); // busy-wait, avoid in production
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return false;
    }
}

// Use 'require' for modules in a Node.js environment
const { CohereClient } = require('cohere-ai');
const { writeFile } = require('fs/promises');

// Retrieve API key from environment variables or default placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey) {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Instantiate the CohereClient with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Data to embed
    const data = [
      "The Great Wall of China is visible from space.",
      "The Eiffel Tower was completed in Paris in 1889.",
      "Mount Everest is the highest peak on Earth at 8,848m.",
      "Shakespeare wrote 37 plays and 154 sonnets during his lifetime.",
      "The Mona Lisa was painted by Leonardo da Vinci.",
    ];

    // Fetch embeddings for the data using the cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Map the embeddings to the text data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to a JSON file
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

const fs = require('fs/promises');
const { BSON } = require('mongodb');
const { Binary } = BSON;

async function main() {
  try {
    // Read and parse the contents of 'embeddings.json' file
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = JSON.parse(fileContent);

    // Map the embeddings data to add BSON binary representations with subtype 9
    const convertEmbeddingsData = embeddingsData.map(({ text, embeddings }) => {
      // Create Binary for Float32Array with manual subtype 9
      const bsonFloat32 = Binary.fromFloat32Array(new Float32Array(embeddings.float));

      // Create Binary for Int8Array with subtype 9
      const bsonInt8 = Binary.fromInt8Array(new Int8Array(embeddings.int8));

      // Create Binary for PackedBits (Uint8Array) with subtype 9
      const bsonPackedBits = Binary.fromPackedBits(new Uint8Array(embeddings.ubinary));

      return {
        text,
        embeddings: {
          float: embeddings.float, // Original float data
          int8: embeddings.int8, // Original int8 data
          ubinary: embeddings.ubinary, // Original packed bits data
        },
        bsonEmbeddings: {
          float32: bsonFloat32,
          int8: bsonInt8,
          packedBits: bsonPackedBits,
        },
      };
    });

    // Serialize the updated data to EJSON for BSON compatibility
    const ejsonSerializedData = BSON.EJSON.stringify(convertEmbeddingsData, null, null, { relaxed: false });

    // Write the serialized data to 'embeddings.json'
    await fs.writeFile('embeddings.json', ejsonSerializedData);
    console.log('Embeddings with BSON vectors have been saved to embeddings.json');
  } catch (error) {
    console.error('Error processing embeddings:', error);
  }
}

main();
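
To make the PackedBits step more concrete, the following is a minimal sketch of what a packed-bit vector looks like. It assumes the thresholding rule described under binary quantization (1 if a value is greater than 0, otherwise 0) and most-significant-bit-first packing. `packBits` is a hypothetical helper written only for illustration; it is not part of the `bson` package, and the `ubinary` embeddings returned by Cohere already arrive packed.

```javascript
// Hypothetical helper (not a bson or Cohere API): threshold each float at 0
// and pack 8 bits per byte, most significant bit first (assumed bit order).
function packBits(floats) {
  const bytes = new Uint8Array(Math.ceil(floats.length / 8));
  floats.forEach((value, i) => {
    if (value > 0) {
      // i >> 3 selects the byte, i & 7 selects the bit within that byte
      bytes[i >> 3] |= 0x80 >> (i & 7);
    }
  });
  return bytes;
}

// Eight illustrative values become the bits 1,0,0,1,0,1,1,0 -> 0b10010110
const sample = [0.12, -0.5, 0.0, 0.9, -0.1, 0.3, 0.7, -0.2];
console.log(Array.from(packBits(sample))); // [ 150 ]

// A 1024-dimension vector packs into 128 bytes
console.log(packBits(new Array(1024).fill(1)).length); // 128
```

Under these assumptions, a 1024-dimension embedding occupies 128 bytes once packed, which is the length you would expect to see for each `ubinary` array in `embeddings.json`.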

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package

const { Binary } = BSON; // Ensure the Binary class is imported correctly

async function main() {
  const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
  const DB_NAME = "<DB-NAME>";
  const COLLECTION_NAME = "<COLLECTION-NAME>";

  let client;
  try {
    client = new MongoClient(MONGODB_URI);
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(DB_NAME);
    const collection = db.collection(COLLECTION_NAME);

    // Read and parse the contents of 'embeddings.json' file using EJSON
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Map embeddings data to recreate BSON binary representations with the correct subtype
    const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
      return {
        text,
        bsonEmbeddings: {
          float32: bsonEmbeddings.float32,
          int8: bsonEmbeddings.int8,
          int1: bsonEmbeddings.packedBits
        }
      };
    });

    const result = await collection.insertMany(documents);
    console.log(`Inserted ${result.insertedCount} documents into MongoDB`);

  } catch (error) {
    console.error('Error storing embeddings in MongoDB:', error);
  } finally {
    if (client) {
      await client.close();
    }
  }
}

// Run the store function
main();

const { MongoClient } = require("mongodb");
const { setTimeout } = require("timers/promises"); // Import from timers/promises

// Connect to your Atlas deployment
const uri = process.env.MONGODB_URI || "<CONNECTION-STRING>";

const client = new MongoClient(uri);

async function main() {
  try {
    const database = client.db("<DB-NAME>");
    const collection = database.collection("<COLLECTION-NAME>");

    // Define your Atlas Vector Search index
    const index = {
      name: "<INDEX-NAME>",
      type: "vectorSearch",
      definition: {
        fields: [
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.float32",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int8",
            similarity: "dotProduct",
          },
          {
            type: "vector",
            numDimensions: 1024,
            path: "bsonEmbeddings.int1",
            similarity: "euclidean",
          },
        ],
      },
    };

    // Run the helper method
    const result = await collection.createSearchIndex(index);
    console.log(`New search index named ${result} is building.`);

    // Wait for the index to be ready to query
    console.log("Polling to check if the index is ready. This may take up to a minute.");
    let isQueryable = false;

    // Use filtered search for index readiness
    while (!isQueryable) {
      const [indexData] = await collection.listSearchIndexes(index.name).toArray();

      if (indexData) {
        isQueryable = indexData.queryable;
        if (!isQueryable) {
          await setTimeout(5000); // Wait for 5 seconds before checking again
        }
      } else {
        // Handle the case where the index might not be found
        console.log(`Index ${index.name} not found.`);
        await setTimeout(5000); // Wait for 5 seconds before checking again
      }
    }

    console.log(`${result} is ready for querying.`);
  } catch (error) {
    console.error("Error:", error);
  } finally {
    await client.close();
  }
}

main().catch((err) => {
  console.error("Unhandled error:", err);
});
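
The Node.js loop above keeps polling until the index becomes queryable, whereas the Java `waitForIndex` helper gives up after 60 seconds. If you want the same bounded behavior in Node.js, one possible approach is sketched below. `pollUntil` and its `intervalMs` and `timeoutMs` options are hypothetical names introduced for this example, not part of the MongoDB driver.

```javascript
const { setTimeout: sleep } = require('timers/promises');

// Hypothetical helper: call `check` repeatedly until it resolves to a truthy
// value or the timeout elapses. Resolves to true on success, false on timeout.
async function pollUntil(check, { intervalMs = 5000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) {
      return true;
    }
    await sleep(intervalMs);
  }
  return false;
}

// Possible usage inside main() of the index script (assumes `collection`
// and `index` are defined as they are there):
//
// const ready = await pollUntil(async () => {
//   const [indexData] = await collection.listSearchIndexes(index.name).toArray();
//   return Boolean(indexData && indexData.queryable);
// });
// if (!ready) throw new Error(`Index ${index.name} was not queryable in time.`);
```

Returning `false` rather than throwing mirrors the Java helper and leaves the caller to decide how to handle a timeout.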

const { MongoClient } = require('mongodb');
const fs = require('fs/promises');
const { BSON } = require('bson'); // Use BSON's functionality for EJSON parsing
const dotenv = require('dotenv');

dotenv.config();

// MongoDB connection details
const mongoUri = process.env.MONGODB_URI || '<CONNECTION-STRING>';
const dbName = '<DB-NAME>'; // Update with your actual database name
const collectionName = '<COLLECTION-NAME>'; // Update with your actual collection name

// Indices and paths should match your MongoDB vector search configuration
const VECTOR_INDEX_NAME = '<INDEX-NAME>'; // Replace with your actual index name
const NUM_CANDIDATES = 5; // Number of candidate documents for the search
const LIMIT = 2; // Limit for the number of documents to return

// Fields in the collection that contain the BSON query vectors
const FIELDS = [
  { path: 'float32', subtype: 9 }, // Ensure that the path and custom subtype match
  { path: 'int8', subtype: 9 }, // Use the custom subtype if needed
  { path: 'int1', subtype: 9 } // Use the same custom subtype
];


// Function to read BSON vectors from JSON and run vector search
async function main() {
  // Initialize MongoDB client
  const client = new MongoClient(mongoUri);

  try {
    await client.connect();
    console.log("Connected to MongoDB");

    const db = client.db(dbName);
    const collection = db.collection(collectionName);

    // Load query embeddings from JSON file using EJSON parsing
    const fileContent = await fs.readFile('query-embeddings.json', 'utf8');
    const embeddingsData = BSON.EJSON.parse(fileContent);

    // Define and run the query for each embedding type
    const results = {};

    for (const fieldInfo of FIELDS) {
      const { path, subtype } = fieldInfo;
      const bsonBinary = embeddingsData[0]?.bsonEmbeddings?.[path];

      if (!bsonBinary) {
        console.warn(`BSON embedding for ${path} not found in the JSON.`);
        continue;
      }

      const bsonQueryVector = bsonBinary; // Directly use BSON Binary object

      const pipeline = [
        {
          $vectorSearch: {
            index: VECTOR_INDEX_NAME,
            path: `bsonEmbeddings.${path}`,
            queryVector: bsonQueryVector,
            numCandidates: NUM_CANDIDATES,
            limit: LIMIT,
          }
        },
        {
          $project: {
            _id: 0,
            text: 1, // Adjust projection fields as necessary to match your document structure
            score: { $meta: 'vectorSearchScore' }
          }
        }
      ];

      results[path] = await collection.aggregate(pipeline).toArray();
    }

    return results;
  } catch (error) {
    console.error('Error during vector search:', error);
  } finally {
    await client.close();
  }
}

// Main execution block
(async () => {
  try {
    const results = await main();

    if (results) {
      console.log("Results from Float32 embeddings:");
      console.table(results.float32 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Int8 embeddings:");
      console.table(results.int8 || []);
      console.log("--------------------------------------------------------------------------");

      console.log("Results from Packed Binary (PackedBits) embeddings:");
      console.table(results.int1 || []);
    }
  } catch (error) {
    console.error('Error executing main function:', error);
  }
})();
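
One way to read the three result tables side by side is to measure how many of the float32 results also show up in the quantized results. The sketch below assumes the `results` object returned by `main()` above, that is, arrays of `{ text, score }` documents keyed by `float32`, `int8`, and `int1`. `overlapRatio` is a hypothetical helper, the sample values are made up purely for illustration, and matching on `text` is an assumption that holds only when the texts are unique.

```javascript
// Hypothetical helper: fraction of baseline results whose `text` also
// appears in the candidate results.
function overlapRatio(baseline, candidate) {
  if (baseline.length === 0) {
    return 0;
  }
  const candidateTexts = new Set(candidate.map((doc) => doc.text));
  const shared = baseline.filter((doc) => candidateTexts.has(doc.text));
  return shared.length / baseline.length;
}

// Made-up results in the same shape the query script produces.
const exampleResults = {
  float32: [{ text: 'a', score: 0.9 }, { text: 'b', score: 0.8 }],
  int8: [{ text: 'a', score: 0.91 }, { text: 'b', score: 0.79 }],
  int1: [{ text: 'a', score: 0.7 }, { text: 'c', score: 0.6 }],
};

console.log(overlapRatio(exampleResults.float32, exampleResults.int8)); // 1
console.log(overlapRatio(exampleResults.float32, exampleResults.int1)); // 0.5
```

With real data you would pass `results.float32` as the baseline. A ratio well below 1 for a quantized field may suggest raising `NUM_CANDIDATES` before drawing conclusions about the quantization itself.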

const { MongoClient } = require('mongodb');
const fs = require('fs'); // Import the fs module for file system operations

async function main() {
  // Replace with your Atlas connection string
  const uri = process.env.MONGODB_URI || '<CONNECTION-STRING>';

  // Create a new MongoClient instance
  const client = new MongoClient(uri);

  try {
    // Connect to your Atlas cluster
    await client.connect();

    // Specify the database and collection
    const db = client.db('sample_airbnb');
    const collection = db.collection('listingsAndReviews');

    // Filter to exclude null or empty summary fields
    const filter = { summary: { $nin: [null, ''] } };

    // Get a subset of documents in the collection
    const documentsCursor = collection.find(filter).limit(50);

    // Convert the cursor to an array to get the documents
    const documents = await documentsCursor.toArray();

    // Log the documents to verify their content
    console.log('Documents retrieved:', documents);

    // Write the documents to a local file called "subset.json"
    const outputFilePath = './subset.json';
    fs.writeFileSync(outputFilePath, JSON.stringify(documents, null, 2), 'utf-8');

    console.log(`Subset of documents written to: ${outputFilePath}`);
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Ensure the client is closed when finished
    await client.close();
  }
}

main().catch(console.error);

// Import necessary modules using the CommonJS syntax
const { CohereClient } = require('cohere-ai');
const { readFile, writeFile } = require('fs/promises');

// Retrieve the API key from environment variables or provide a placeholder
const apiKey = process.env.COHERE_API_KEY || '<COHERE-API-KEY>';

if (!apiKey || apiKey === '<COHERE-API-KEY>') {
  throw new Error('API key not found. Please set COHERE_API_KEY in your environment.');
}

// Initialize the Cohere client with the API key
const cohere = new CohereClient({ token: apiKey });

async function main() {
  try {
    // Read and parse the contents of 'subset.json'
    const subsetData = await readFile('subset.json', 'utf-8');
    const documents = JSON.parse(subsetData);

    // Extract the 'summary' fields that are non-empty strings
    const data = documents
      .map(doc => doc.summary)
      .filter(summary => typeof summary === 'string' && summary.length > 0);

    if (data.length === 0) {
      throw new Error('No valid summary texts available in the data.');
    }

    // Request embeddings from the Cohere API
    const response = await cohere.v2.embed({
      model: 'embed-english-v3.0',
      inputType: 'search_document',
      texts: data,
      embeddingTypes: ['float', 'int8', 'ubinary'],
    });

    // Extract embeddings from the API response
    const { float, int8, ubinary } = response.embeddings;

    // Structure the embeddings data
    const embeddingsData = data.map((text, index) => ({
      text,
      embeddings: {
        float: float[index],
        int8: int8[index],
        ubinary: ubinary[index],
      },
    }));

    // Write the embeddings data to 'embeddings.json'
    await writeFile('embeddings.json', JSON.stringify(embeddingsData, null, 2));
    console.log('Embeddings saved to embeddings.json');
  } catch (error) {
    console.error('Error fetching embeddings:', error);
  }
}

// Execute the main function
main();

const fs = require('fs/promises'); // Use fs/promises for asynchronous operations
const { MongoClient, BSON } = require('mongodb'); // Import from the 'mongodb' package
const { EJSON, Binary } = require('bson'); // Import EJSON and Binary from bson

async function main() {
  const MONGODB_URI = process.env.MONGODB_URI || "<CONNECTION-STRING>";
  const DB_NAME = "sample_airbnb";
  const COLLECTION_NAME = "listingsAndReviews";

  let client;
  try {
    // Connect to MongoDB
    client = new MongoClient(MONGODB_URI);
    await client.connect();
    console.log("Connected to MongoDB");

    // Access database and collection
    const db = client.db(DB_NAME);
    const collection = db.collection(COLLECTION_NAME);

    // Load embeddings from JSON using EJSON.parse
    const fileContent = await fs.readFile('embeddings.json', 'utf8');
    const embeddingsData = EJSON.parse(fileContent); // Use EJSON.parse

    // Map embeddings data to recreate BSON binary representations
    const documents = embeddingsData.map(({ text, bsonEmbeddings }) => {
      return {
        summary: text,
        bsonEmbeddings: {
          float32: bsonEmbeddings.float32,
          int8: bsonEmbeddings.int8,
          int1: bsonEmbeddings.packedBits
        }
      };
    });

    // Iterate over documents and upsert each into the MongoDB collection
    for (const doc of documents) {
      const filter = { summary: doc.summary };
      const update = { $set: doc };

      // Update the document with the BSON binary data
      const result = await collection.updateOne(filter, update, { upsert: true });
      if (result.matchedCount > 0) {
        console.log(`Updated document with summary: ${doc.summary}`);
      } else {
        console.log(`Inserted new document with summary: ${doc.summary}`);
      }
    }

    console.log("Embeddings stored in MongoDB successfully.");
  } catch (error) {
    console.error('Error storing embeddings in MongoDB:', error);
  } finally {
    if (client) {
      await client.close();
    }
  }
}

// Run the main function to load the data
main();
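
The loop above issues one `updateOne` call per document. As an alternative sketch, not the approach this tutorial uses, the same upserts could be grouped into a single `bulkWrite` call. `buildUpsertOps` is a hypothetical helper, and the documents shown are placeholders rather than real listings.

```javascript
// Hypothetical helper: build bulkWrite operations that mirror the
// per-document updateOne(filter, { $set: doc }, { upsert: true }) call.
function buildUpsertOps(documents) {
  return documents.map((doc) => ({
    updateOne: {
      filter: { summary: doc.summary },
      update: { $set: doc },
      upsert: true,
    },
  }));
}

// Placeholder documents for illustration only; real ones carry BSON Binary vectors.
const ops = buildUpsertOps([
  { summary: 'Example summary one', bsonEmbeddings: {} },
  { summary: 'Example summary two', bsonEmbeddings: {} },
]);
console.log(ops.length); // 2

// Inside main(), the loop could then be replaced with one round trip
// (assumes `collection` and `documents` exist as in the script above):
//
// const result = await collection.bulkWrite(buildUpsertOps(documents), { ordered: false });
// console.log(`Matched ${result.matchedCount}, upserted ${result.upsertedCount}`);
```

The trade-off is that you lose the per-document log lines in exchange for fewer network round trips.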

import pymongo

# Connect to your Atlas cluster
mongo_client = pymongo.MongoClient("<ATLAS-CONNECTION-STRING>")
db = mongo_client["sample_airbnb"]
collection = db["listingsAndReviews"]

# Filter to exclude null or empty summary fields
filter = { "summary": {"$nin": [None, ""]} }

# Get a subset of documents in the collection
documents = collection.find(filter).limit(50)

# Initialize the count of updated documents
updated_doc_count = 0

from pymongo.operations import SearchIndexModel
import time

# Define and create the vector search index
index_name = "<INDEX-NAME>"
search_index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "similarity": "euclidean",
                "numDimensions": 1024
            }
        ]
    },
    name=index_name,
    type="vectorSearch"
)
result = collection.create_search_index(model=search_index_model)
print("New search index named " + result + " is building.")

# Wait for initial sync to complete
print("Polling to check if the index is ready. This may take up to a minute.")
predicate = None
if predicate is None:
    predicate = lambda index: index.get("queryable") is True
while True:
    indices = list(collection.list_search_indexes(index_name))
    if len(indices) and predicate(indices[0]):
        break
    time.sleep(5)
print(result + " is ready for querying.")
