Mai Nguyen

Binary Quantization & Rescoring: 96% Less Memory, Faster Search

We are excited to share that several new vector quantization capabilities are now available in public preview in MongoDB Atlas Vector Search: support for binary quantized vector ingestion, automatic scalar quantization, and automatic binary quantization with rescoring. Together with our recently released support for scalar quantized vector ingestion, these capabilities will empower developers to scale semantic search and generative AI applications more cost-effectively. For a primer on vector quantization, check out our previous blog post.

Enhanced developer experience with native quantization in Atlas Vector Search

Effective quantization methods, specifically scalar and binary quantization, can now be applied automatically in Atlas Vector Search. This makes it easier and more cost-effective for developers to use Atlas Vector Search for a wide range of applications, particularly those requiring more than a million vectors. With the new "quantization" index definition parameter, developers can choose to use full-fidelity vectors by specifying "none," or they can quantize vector embeddings by specifying the desired quantization type: "scalar" or "binary" (Figure 1). This native quantization capability supports vector embeddings from any model provider as well as MongoDB's BinData float32 vector subtype.

Figure 1: New index definition parameters for specifying automatic quantization type in Atlas Vector Search

Scalar quantization, which converts each floating-point value into an integer, is generally used when it's crucial to maintain search accuracy on par with full-precision vectors. Binary quantization, which converts each floating-point value into a single bit of 0 or 1, is better suited to scenarios where storage and memory efficiency are paramount and a slight reduction in search accuracy is acceptable. If you're interested in learning more about this process, check out our documentation.

Binary quantization with rescoring: Balance cost and accuracy

Compared to scalar quantization, binary quantization further reduces memory usage, leading to lower costs and improved scalability, but also to a decline in search accuracy. To mitigate this, when "binary" is chosen for the "quantization" index parameter, Atlas Vector Search adds an automatic rescoring step: it re-ranks a subset of the top binary vector search results using their full-precision counterparts, ensuring that the final results remain highly accurate despite the initial compression. As Figure 2 below shows, adding a rescoring step when working with binary quantized vectors dramatically improves search accuracy.

Figure 2: Combining binary quantization and rescoring retains up to 95% search accuracy

And as Figure 3 shows, in our tests binary quantization reduced processing memory requirements by 96% while retaining up to 95% search accuracy and improving query performance.

Figure 3: Improvements in Atlas Vector Search with the use of vector quantization

It's worth noting that even though the quantized vectors are used for indexing and search, the full-fidelity vectors are still stored on disk to support rescoring. Retaining the full-fidelity vectors also enables developers to perform exact vector search for experimental, high-precision use cases, such as evaluating the search accuracy of quantized vectors produced by different embedding model providers. For more on evaluating the accuracy of quantized vectors, please see our documentation.
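To make the index configuration described above concrete, here is a minimal sketch of creating a Vector Search index with automatic quantization from Python. The connection string, database, collection, field name, and dimension count are placeholder assumptions (an "embedding" field from a 1,024-dimension model), not fixed names:

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://...")  # assumption: your Atlas connection string
collection = client["mydb"]["docs"]        # assumption: your database and collection

# "scalar" compresses each float to an int8, "binary" compresses it to a single
# bit (and enables automatic rescoring at query time), "none" keeps full fidelity.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",      # assumption: field holding your vectors
                "numDimensions": 1024,    # assumption: your model's output dimension
                "similarity": "dotProduct",
                "quantization": "binary",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)
collection.create_search_index(model=index_model)
```

Switching between "scalar", "binary", and "none" is an index-definition change only; the documents themselves keep their full-fidelity vectors, which is what makes rescoring and exact search possible.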
So how can developers make the most of vector quantization? Here are some example use cases that quantized vectors can make more efficient and scalable:

- Massive knowledge bases can be used efficiently and cost-effectively for analysis- and insight-oriented use cases, such as content summarization and sentiment analysis. Unstructured data like customer reviews, articles, audio, and video can be processed and analyzed at much larger scale, at lower cost, and at faster speed.
- Retrieval-augmented generation (RAG) applications perform better. Efficient processing sustains query performance over large knowledge bases, and the cost advantage enables a more scalable, robust RAG system, resulting in better customer and employee experiences.
- Developers can easily A/B test different embedding models using multiple vectors produced from the same source field during prototyping. MongoDB's flexible document model lets developers quickly deploy and compare embedding models' results without rebuilding the index or provisioning an entirely new data model or set of infrastructure.
- The relevance of search results or context for large language models (LLMs) can be improved by incorporating larger volumes of vectors from multiple sources of relevance, such as different source fields (product descriptions, product images, etc.) embedded with the same or different models.

To get started with vector quantization in Atlas Vector Search, see the following developer resources:

- Documentation: Vector Quantization in Atlas Vector Search
- Documentation: How to Measure the Accuracy of Your Query Results (see the sketch after this list)
- Tutorial: How to Use Cohere's Quantized Vectors to Build Cost-effective AI Apps With MongoDB
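In the spirit of the accuracy-measurement guide above, one common check is to compare approximate results served from the quantized index against exact (ENN) results over the full-fidelity vectors and compute their overlap. The sketch below is a minimal illustration of that idea; the index name "vector_index" and field "embedding" carry over from the earlier sketch and are assumptions:

```python
# Minimal sketch: estimate recall@k of quantized ANN search by comparing it
# against an exact ($vectorSearch with "exact": true) run on the same query.

def vector_search_ids(collection, query_vector, exact, k=10):
    stage = {
        "index": "vector_index",     # assumption: index created earlier
        "path": "embedding",         # assumption: vector field name
        "queryVector": query_vector,
        "limit": k,
    }
    if exact:
        stage["exact"] = True        # exact search over full-fidelity vectors
    else:
        stage["numCandidates"] = 20 * k  # ANN over the quantized index
    pipeline = [{"$vectorSearch": stage}, {"$project": {"_id": 1}}]
    return {doc["_id"] for doc in collection.aggregate(pipeline)}

def recall_at_k(collection, query_vector, k=10):
    ann = vector_search_ids(collection, query_vector, exact=False, k=k)
    enn = vector_search_ids(collection, query_vector, exact=True, k=k)
    return len(ann & enn) / k  # 1.0 means the quantized index matched exact search
```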

December 12, 2024

Vector Quantization: Scale Search & Generative AI Applications

This post is also available in: Deutsch, Français, Español, Português, Italiano, 한국어, 简体中文.

Update 12/12/2024: The upcoming vector quantization capabilities mentioned at the end of this blog post are now available in public preview:

- Support for ingestion and indexing of binary (int1) quantized vectors gives developers the flexibility to choose and ingest the type of quantized vectors that best fits their requirements.
- Automatic quantization and rescoring provides a native mechanism for scalar quantization and for binary quantization with rescoring, making it easier for developers to implement vector quantization entirely within Atlas Vector Search.

View the documentation to get started.

We are excited to announce a robust set of vector quantization capabilities in MongoDB Atlas Vector Search. These capabilities will reduce vector sizes while preserving performance, enabling developers to build powerful semantic search and generative AI applications with more scale, and at a lower cost. In addition, unlike relational or niche vector databases, MongoDB's flexible document model, coupled with quantized vectors, allows for greater agility in testing and deploying different embedding models quickly and easily. Support for scalar quantized vector ingestion is now generally available and will be followed by several new releases in the coming weeks. Read on to learn how vector quantization works, and visit our documentation to get started!

The challenges of large-scale vector applications

While the use of vectors has opened up a range of new possibilities, such as content summarization and sentiment analysis, natural language chatbots, and image generation, unlocking insights within unstructured data can require storing and searching through billions of vectors, which can quickly become infeasible. Vectors are effectively arrays of floating-point numbers representing unstructured information in a way that computers can understand, and a deployment may hold anywhere from a few hundred to billions of these arrays; as the number of vectors increases, so does the size of the index required to search over them. As a result, large-scale vector-based applications using full-fidelity vectors often have high processing costs and slow query times, hindering their scalability and performance.

Vector quantization for cost-effectiveness, scalability, and performance

Vector quantization, a technique that compresses vectors while preserving their semantic similarity, offers a solution to this challenge. Imagine converting a full-color image into grayscale to reduce storage space on a computer. This involves simplifying each pixel's color information by grouping similar colors into primary color channels, or "quantization bins," and then representing each pixel with a single value from its bin. The binned values are then used to create a new grayscale image that is smaller but retains most of the original detail, as shown in Figure 1.

Figure 1: Illustration of quantizing an RGB image into grayscale

Vector quantization works similarly: it shrinks full-fidelity vectors into fewer bits, significantly reducing memory and storage costs without compromising the important details. Maintaining this balance is critical, as search and AI applications need to deliver relevant insights to be useful. Two effective quantization methods are scalar (converting each floating-point value into an integer) and binary (converting each floating-point value into a single bit of 0 or 1).
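As a rough illustration of the two methods, the NumPy sketch below compresses a toy float32 vector both ways. This is illustrative only, not the exact algorithm Atlas uses internally; the 1,024-dimension vector is a hypothetical example:

```python
import numpy as np

def scalar_quantize(v: np.ndarray) -> np.ndarray:
    """Map each float to one of 256 integer bins over the vector's observed range."""
    lo, hi = float(v.min()), float(v.max())
    scaled = (v - lo) / (hi - lo)                         # normalize to [0, 1]
    return np.round(scaled * 255 - 128).astype(np.int8)   # 1 byte per dimension

def binary_quantize(v: np.ndarray) -> np.ndarray:
    """Keep only the sign of each dimension: a single bit instead of 32."""
    return np.packbits(v > 0)                             # 1 bit per dimension

v = np.random.randn(1024).astype(np.float32)
print(v.nbytes, scalar_quantize(v).nbytes, binary_quantize(v).nbytes)
# 4096 -> 1024 (scalar, ~4x smaller) -> 128 (binary, ~32x smaller)
```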
Current and upcoming quantization capabilities will empower developers to maximize the potential of Atlas Vector Search.

The most impactful benefit of vector quantization is increased scalability and cost savings through reduced computing resources and efficient processing of vectors. And when combined with Search Nodes, MongoDB's dedicated infrastructure for independent scalability through workload isolation and memory-optimized infrastructure for semantic search and generative AI workloads, vector quantization can further reduce costs and improve performance, even at the highest volume and scale, unlocking more use cases.

"Cohere is excited to be one of the first partners to support quantized vector ingestion in MongoDB Atlas," said Nils Reimers, VP of AI Search at Cohere. "Embedding models, such as Cohere Embed v3, help enterprises see more accurate search results based on their own data sources. We're looking forward to providing our joint customers with accurate, cost-effective applications for their needs."

In our tests, compared to full-fidelity vectors, BSON-type vectors (MongoDB's JSON-like binary serialization format for efficient document storage) reduced storage size by 66% (from 41 GB to 14 GB). And as shown in Figures 2 and 3, the tests illustrate significant memory reduction (73% to 96% less) and latency improvements using quantized vectors: scalar quantization preserves recall performance, and binary quantization's recall performance is maintained with rescoring, a process of evaluating a small subset of the quantized outputs against full-fidelity vectors to improve the accuracy of the search results.

Figure 2: Significant storage reduction plus good recall and latency performance with quantization on different embedding models

Figure 3: Remarkable improvement in recall performance for binary quantization when combined with rescoring

In addition, thanks to the reduced cost advantage, vector quantization facilitates more advanced, multiple-vector use cases that would have been too computationally taxing or cost-prohibitive to implement. For example, vector quantization can help users:

- Easily A/B test different embedding models using multiple vectors produced from the same source field during prototyping. MongoDB's document model, coupled with quantized vectors, allows for greater agility at lower costs. The flexible document schema lets developers quickly deploy and compare embedding models' results without the need to rebuild the index or provision an entirely new data model or set of infrastructure.
- Further improve the relevance of search results or context for large language models (LLMs) by incorporating vectors from multiple sources of relevance, such as different source fields (product descriptions, product images, etc.) embedded with the same or different models.

How to get started, and what's next

Now, with support for the ingestion of scalar quantized vectors, developers can import and work with quantized vectors from their embedding model providers of choice (such as Cohere, Nomic, Jina, Mixedbread, and others) directly in Atlas Vector Search, as sketched below. Read the documentation and tutorial to get started.
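As a minimal sketch of that ingestion flow, the snippet below requests scalar (int8) quantized embeddings from Cohere and stores them in MongoDB. It assumes a recent PyMongo that exposes the BSON vector helpers (Binary.from_vector and BinaryVectorDtype) and the Cohere Python SDK; the model name, field names, connection string, and API key are placeholders:

```python
import cohere                                    # assumption: Cohere Python SDK installed
from bson.binary import Binary, BinaryVectorDtype
from pymongo import MongoClient

co = cohere.Client("YOUR_COHERE_API_KEY")        # placeholder key
collection = MongoClient("mongodb+srv://...")["mydb"]["docs"]  # placeholder URI

text = "MongoDB Atlas Vector Search supports quantized vector ingestion."
resp = co.embed(
    texts=[text],
    model="embed-english-v3.0",
    input_type="search_document",
    embedding_types=["int8"],                    # ask the model for scalar quantized output
)

collection.insert_one({
    "text": text,
    # Store as the BinData vector subtype: far more compact than an array of doubles
    "embedding": Binary.from_vector(resp.embeddings.int8[0], BinaryVectorDtype.INT8),
})
```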
And in the coming weeks, additional vector quantization features will equip developers with a comprehensive toolset for building and optimizing applications with quantized vectors:

- Support for ingestion of binary quantized vectors will enable further reduction of storage space, allowing for greater cost savings and giving developers the flexibility to choose the type of quantized vectors that best fits their requirements.
- Automatic quantization and rescoring will provide native capabilities for scalar quantization as well as binary quantization with rescoring in Atlas Vector Search, making it easier for developers to take full advantage of vector quantization within the platform.

With support for quantized vectors in MongoDB Atlas Vector Search, you can build scalable, high-performing semantic search and generative AI applications with flexibility and cost-effectiveness. Check out the documentation and tutorial to get started, and head over to our quick-start guide to start using Atlas Vector Search today.
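As a closing illustration of why the rescoring mechanism announced above recovers recall, here is a toy NumPy sketch of the two-pass idea: a cheap first pass ranks candidates by Hamming distance over 1-bit codes, then a second pass re-ranks only the oversampled top candidates with their full-fidelity vectors. This is a conceptual illustration under stated assumptions, not Atlas's implementation; all names and the oversampling factor are placeholders:

```python
import numpy as np

def hamming_distances(codes: np.ndarray, query_code: np.ndarray) -> np.ndarray:
    """Popcount of XOR over packed 1-bit codes, one distance per row."""
    return np.unpackbits(codes ^ query_code, axis=1).sum(axis=1)

def search_with_rescoring(full, codes, query, k=10, oversample=5):
    query_code = np.packbits(query > 0)
    # Pass 1: coarse ranking on binary codes (32x smaller than float32 in memory)
    candidates = np.argsort(hamming_distances(codes, query_code))[: k * oversample]
    # Pass 2: rescore only the candidates with exact dot products on full vectors
    scores = full[candidates] @ query
    return candidates[np.argsort(-scores)][:k]

rng = np.random.default_rng(0)
full = rng.standard_normal((100_000, 256)).astype(np.float32)  # full-fidelity vectors
codes = np.packbits(full > 0, axis=1)                          # 1 bit per dimension
query = rng.standard_normal(256).astype(np.float32)
print(search_with_rescoring(full, codes, query, k=10))
```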

October 7, 2024

向量量化:扩展搜索和生成式人工智能应用程序

Update 12/12/2024: The upcoming vector quantization capabilities mentioned at the end of this blog post are now available in public preview: Support for ingestion and indexing of binary (int1) quantized vectors: gives developers the flexibility to choose and ingest the type of quantized vectors that best fits their requirements. Automatic quantization and rescoring: provides a native mechanism for scalar quantization and binary quantization with rescoring, making it easier for developers to implement vector quantization entirely within Atlas Vector Search. View the documentation to get started. 我们很高兴地宣布 MongoDB Atlas Vector Search 将提供一组强大的向量量化功能。这些功能在保持性能的同时还将减小向量大小,使开发者能够以更大的规模和更低的成本构建强大的语义搜索和生成式人工智能应用程序。此外,与关系型或生态位向量数据库不同,MongoDB 灵活的文档模型与量化向量相结合,可以轻松快捷地测试和部署不同嵌入模型,同时提高灵活性。 对标量量化向量注入的支持现已普遍推出,未来几周还将发布几个新版本。继续阅读以了解向量量化的工作原理, 访问我们的文档即可开始 ! 大规模向量应用程序的挑战 虽然向量的使用开辟了一系列 新的可能性 ,如内容摘要和情感分析、自然语言聊天机器人和图像生成,但要从非结构化数据中获得洞察,可能需要存储和搜索数十亿个向量,这很快就会变得不可行。 向量实际上是浮点数数组,以计算机可以理解的方式表示非结构化信息(从几百到数十亿数组不等),随着向量数量的增加,搜索向量所需的索引大小也随之增加。因此,使用全保真向量的大规模向量应用程序通常具有较高的处理成本,并且查询速度慢,从而影响了其可扩展性和性能。 向量量化可提升成本效益、可扩展性和性能 向量量化是一种可保留语义相似性的向量压缩技术,为这一挑战提供了解决方案。想象一下,将全彩图像转换为灰度图像,就能减少计算机上的存储空间。这需要将相似的颜色归入原色通道或“量化区间”,以简化每个像素的颜色信息,然后用其区间中的单个值来表示每个像素。然后使用已划分区间的值创建新的灰度图像,新图像的尺寸更小,但保留了大部分原始细节,如图 1 所示。 图 1:将 RGB 图像量化为灰度图像的示意图 向量量化的工作原理与此类似,缩小全保真向量的位数可以显著降低内存和存储成本,而不会影响重要细节。保持这种平衡至关重要,因为搜索和 AI 应用程序需要提供相关的洞察才能发挥作用。 有效的量化方法有两种:标量量化(将浮点转换为整数)和二进制量化(将浮点转换为一位 0 或 1)。现有的和即将推出的量化功能将助力开发者充分挖掘 Atlas Vector Search 的潜力。 向量量化最显著的优势是通过减少计算资源和高效处理向量提升了可扩展性并节省了成本。 与搜索节点 (MongoDB 的专用基础架构,可通过工作负载隔离性实现独立可扩展性,针对语义搜索和生成式人工智能工作负载进行了内存优化)相结合时,向量量化可进一步降低成本并提高性能,即使在最大容量和规模下也能解锁更多使用案例。 "Cohere 很高兴成为首批支持 MongoDB Atlas 量化向量注入的合作伙伴之一,”Cohere 人工智能搜索副总裁 Nils Reimers 表示。“像 Cohere Embed v3 这样的嵌入模型可帮助企业根据自己的数据源查看更准确的搜索结果。我们期待为我们的共同客户提供准确、经济实惠的应用程序,以满足他们的需求。” 在我们的测试中,与全保真向量相比, BSON 型向量 (MongoDB 的类 JSON 二进制序列化格式,用于高效文档存储)将存储空间减少了 66%(从 41 GB 减少到 14 GB)。如图 2 和图 3 所示,测试表明,使用量化向量可以显著减少内存(减少 73% 到 96%),延迟也有所改善,其中标量量化保留了召回性能,二进制量化的召回性能通过重新评分来维持(重新评分是根据全保真向量对一小部分量化输出进行评估的过程,可提高搜索结果的准确性)。 图 2:通过不同嵌入模型上的量化,存储空间显著减少,召回和延迟性能良好 图 3:与重新评分相结合时,二进制量化的召回性能显著提高 此外,由于成本方面的优势,向量量化有利于实现更先进的多向量使用案例,这类使用案例由于计算负担太重或成本太高而难以实现。例如,向量量化可以帮助用户: 在原型设计期间,使用从同一源字段生成的多个向量,轻松地对不同嵌入模型进行 A/B 测试。MongoDB 的文档模 型与量化向量相结合,能够以更低的成本实现更高的灵活性。灵活的文档模式支持开发者快速部署和比较嵌入模型的结果,而无需重建索引或预配全新的数据模型或基础架构。 通过合并来自多个相关源的向量,例如嵌入在相同或不同模型中的不同源字段(产品描述、产品图像等),进一步提高大型语言模型 (LLM) 搜索结果或上下文的相关性。 如何开始,以及下一步 现在,凭借对标量量化向量注入的支持,开发者可以直接在 Atlas Vector Search 中导入和使用量化向量,这些量化向量来自他们所选择的嵌入模型提供商(如 Cohere、Nomic、Jina、Mixedbread 等)。阅读 文档 和 教程 即可开始。 未来几周还会推出其他向量量化功能,开发者可借助这套全面的工具集,使用量化向量来构建和优化应用程序: 支持注入二进制量化向量,将进一步减少存储空间,从而节省更多成本,开发者能够灵活选择最符合其要求的量化向量类型。 自动量化和重新评分将为标量量化提供原生功能,以及在 Atlas Vector Search 中通过重新评分进行二进制量化的功能,开发者可以更轻松地充分利用平台中的向量量化功能。 MongoDB Atlas Vector Search 支持量化向量,您可以灵活构建可扩展的高性能语义搜索和生成式人工智能应用程序,并实现成本效益。查看这些资源获取入门 文档 和 教程 。 立即查看我们的 快速入门指南 ,开始使用 Atlas Vector Search。

October 7, 2024

Vektorquantisierung: Scale-Suche und Generative-KI-Anwendungen

Update 12/12/2024: The upcoming vector quantization capabilities mentioned at the end of this blog post are now available in public preview: Support for ingestion and indexing of binary (int1) quantized vectors: gives developers the flexibility to choose and ingest the type of quantized vectors that best fits their requirements. Automatic quantization and rescoring: provides a native mechanism for scalar quantization and binary quantization with rescoring, making it easier for developers to implement vector quantization entirely within Atlas Vector Search. View the documentation to get started. Wir freuen uns, einen robusten Satz von Vektorquantisierungsfunktionen in MongoDB Atlas Vector Search ankündigen zu können. Diese Funktionen reduzieren die Vektorgrößen bei gleichbleibender Leistung und ermöglichen Entwicklern die Erstellung leistungsstarker Anwendungen für semantische Suche und Generative KI in größerem Maßstab – und zu geringeren Kosten. Darüber hinaus ermöglicht das flexible Dokumentmodell von MongoDB – gekoppelt mit quantisierten Vektoren – im Gegensatz zu relationalen oder Nischen-Vektordatenbanken eine größere Flexibilität beim schnellen und einfachen Testen und Bereitstellen verschiedener Einbettungsmodelle. Die Unterstützung für die Aufnahme skalarer quantisierter Vektoren ist jetzt allgemein verfügbar und wird in den kommenden Wochen durch mehrere neue Versionen ergänzt. Lesen Sie weiter, um zu erfahren, wie die Vektorquantisierung funktioniert, und besuchen Sie unsere Dokumentation , um loszulegen! Die Herausforderungen großer Vektoranwendungen Während die Verwendung von Vektoren eine Reihe neuer Möglichkeiten eröffnet hat, wie Inhaltszusammenfassung und Stimmungsanalyse, Chatbots mit natürlicher Sprache und Bilderzeugung, kann das Gewinnen von Erkenntnissen aus unstrukturierten Daten das Speichern und Durchsuchen von Milliarden von Vektoren erfordern, was schnell unpraktikabel werden kann. Vektoren sind im Grunde Arrays von Gleitkommazahlen, die unstrukturierte Informationen auf eine für Computer verständliche Weise darstellen (die Bandbreite reicht von einigen Hundert bis hin zu Milliarden von Arrays). Und mit der Anzahl der Vektoren steigt auch die Indexgröße, die für die Suche in ihnen erforderlich ist. Infolgedessen haben große vektorbasierte Anwendungen, die Full-Fidelity-Vektoren verwenden, oft hohe Verarbeitungskosten und langsame Abfragezeiten, was ihre Skalierbarkeit und Leistung beeinträchtigt. Vektorquantisierung für Kosteneffizienz, Skalierbarkeit und Leistung Die Vektorquantisierung, eine Technik, die Vektoren komprimiert und dabei ihre semantische Ähnlichkeit beibehält, bietet eine Lösung für diese Herausforderung. Stellen Sie sich vor, ein Vollfarbbild wird in Graustufen umgewandelt, um Speicherplatz auf einem Computer zu sparen. Dabei werden die Farbinformationen jedes Pixels vereinfacht, indem ähnliche Farben in Primärfarbkanäle oder „Quantisierungs-Bins“ gruppiert und dann jeder Pixel mit einem einzelnen Wert aus seinem Bin dargestellt wird. Die in Bins eingeteilten Werte werden dann verwendet, um ein neues Graustufenbild mit kleinerer Größe zu erstellen, bei dem jedoch die meisten ursprünglichen Details erhalten bleiben (siehe Abbildung 1). Abbildung 1. Darstellung der Quantisierung eines RGB-Bildes in Graustufen Die Vektorquantisierung funktioniert ähnlich, indem sie Full-Fidelity-Vektoren auf weniger Bits verkleinert, um die Speicher- und Speicherkosten erheblich zu reduzieren, ohne die wichtigen Details zu beeinträchtigen. 
Die Aufrechterhaltung dieses Gleichgewichts ist von entscheidender Bedeutung, da Such- und KI-Anwendungen relevante Erkenntnisse liefern müssen, um nützlich zu sein. Zwei effektive Quantisierungsmethoden sind skalar (Umwandeln einer Gleitkommazahl in eine Ganzzahl) und binär (Umwandeln einer Gleitkommazahl in ein einzelnes Bit von 0 oder 1). Aktuelle und zukünftige Quantisierungsfunktionen ermöglichen Entwicklern, das Potenzial von Atlas Vector Search optimal zu nutzen. Die wirkungsvollsten Vorteile der Vektorquantisierung sind die erhöhte Skalierbarkeit und Kosteneinsparungen durch reduzierte Rechenressourcen und eine effiziente Verarbeitung von Vektoren. Und in Kombination mit Search Nodes – der dedizierten Infrastruktur von MongoDB für unabhängige Skalierbarkeit durch Workload-Isolierung und speicheroptimierte Infrastruktur für Semantische-Such- und Generative-KI-Workloads – kann die Vektorquantisierung die Kosten weiter senken und die Leistung verbessern, selbst bei höchstem Volumen und Skalierung, um mehr Anwendungsfälle zu erschließen. „Cohere freut sich, einer der ersten Partner zu sein, der die Aufnahme quantisierter Vektoren in MongoDB Atlas unterstützt“, sagte Nils Reimers, VP of AI Search bei Cohere. „Einbettungsmodelle wie Cohere Embed v3 helfen Unternehmen, genauere Suchergebnisse auf der Grundlage ihrer eigenen Datenquellen zu erhalten. Wir freuen uns darauf, unseren gemeinsamen Kunden präzise und kostengünstige Anwendungen für ihre Anforderungen bereitzustellen.“ In unseren Tests reduzierten BSON-Vektoren – MongoDBs JSON-ähnliches binäres Serialisierungsformat für eine effiziente Dokumentenspeicherung – die Speichergröße im Vergleich zu Vektoren mit voller Genauigkeit um 66 % (von 41 GB auf 14 GB). Wie aus den Abbildungen 2 und 3 hervorgeht, zeigen die Tests eine erhebliche Verringerung des Speicherbedarfs (73 % bis 96 % weniger) und eine Verbesserung der Latenzzeit durch quantisierte Vektoren, wobei die Abrufleistung bei skalarer Quantisierung erhalten bleibt und die Abrufleistung bei binärer Quantisierung durch Neubewertung beibehalten wird – ein Prozess, bei dem eine kleine Teilmenge der quantisierten Ausgaben gegen Vektoren mit voller Genauigkeit bewertet wird, um die Genauigkeit der Suchergebnisse zu verbessern. Abbildung 2: Signifikante Speicherreduzierung und gute Recall- sowie Latenzleistung durch Quantisierung bei verschiedenen Einbettungsmodellen Abbildung 3: Bemerkenswerte Verbesserung der Recall-Leistung bei der binären Quantisierung durch Kombination mit Rescoring Darüber hinaus ermöglicht die Vektorquantisierung dank der geringeren Kosten fortschrittlichere Anwendungsfälle mit mehreren Vektoren, deren Implementierung zu rechenintensiv oder zu kostspielig gewesen wäre. Die Vektorquantisierung kann Benutzern beispielsweise bei Folgendem helfen: Führen Sie beim Prototyping problemlos A/B-Tests verschiedener Einbettungsmodelle durch, indem Sie mehrere Vektoren verwenden, die aus demselben Quellfeld erstellt wurden. Das Dokumentmodell von MongoDB ermöglicht – gekoppelt mit quantisierten Vektoren – mehr Agilität bei geringeren Kosten. Das flexible Dokumentschema ermöglicht Entwicklern eine schnelle Bereitstellung und den Vergleich von Ergebnissen eingebetteter Modelle, ohne den Index neu erstellen oder ein völlig neues Datenmodell bzw. eine neue Infrastruktur bereitstellen zu müssen. Verbessern Sie die Relevanz von Suchergebnissen oder Kontext für Large Language Models (LLMs) weiter, indem Sie Vektoren aus mehreren relevanten Quellen integrieren, z. B. 
different source fields (product descriptions, product images, and so on) embedded with the same or different models.

How to get started, and what's next

With support for scalar quantized vector ingestion, developers can now import and work with quantized vectors from the embedding model providers of their choice (such as Cohere, Nomic, Jina, Mixedbread, and others) directly in Atlas Vector Search. Read the documentation and tutorial to get started. And in the coming weeks, additional vector quantization capabilities will give developers a comprehensive toolset for building and optimizing applications with quantized vectors: Support for binary quantized vector ingestion further reduces storage space, delivering greater cost savings and giving developers the flexibility to choose the type of quantized vectors that best fits their requirements. Automatic quantization and rescoring provide native capabilities for scalar quantization, as well as binary quantization with rescoring, in Atlas Vector Search, making it easier for developers to take full advantage of vector quantization within the platform.

With support for quantized vectors in MongoDB Atlas Vector Search, you can build scalable, high-performance semantic search and generative AI applications with flexibility and at lower cost. Check out the documentation and tutorial to get started. Head over to our quick-start guide to get started with Atlas Vector Search today.
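As a rough sketch of what scalar quantized vector ingestion can look like from a driver, the snippet below stores a pre-quantized int8 embedding using PyMongo's BSON vector helper and then defines a vector index over the field. The connection string, database, collection, dimensions, and vector values are hypothetical placeholders; see the documentation and tutorial referenced above for the authoritative steps.

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel
from bson.binary import Binary, BinaryVectorDtype

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
collection = client["catalog"]["products"]

# Store a scalar (int8) quantized embedding as MongoDB's BinData vector subtype.
int8_embedding = [12, -57, 33, 91, -8]  # hypothetical pre-quantized values
collection.insert_one({
    "name": "trail running shoe",
    "embedding": Binary.from_vector(int8_embedding, BinaryVectorDtype.INT8),
})

# Define a vector search index on the field; numDimensions must match the
# embedding model's output width (5 only to match the toy vector above).
index_model = SearchIndexModel(
    name="vector_index",
    type="vectorSearch",
    definition={"fields": [{
        "type": "vector",
        "path": "embedding",
        "numDimensions": 5,
        "similarity": "dotProduct",
    }]},
)
collection.create_search_index(model=index_model)
```

With the automatic quantization described in the update note above, full-fidelity float32 vectors can instead be ingested as-is and quantized by Atlas at index time.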

October 7, 2024


Top Use Cases for Text, Vector, and Hybrid Search

Search is how we discover new things. Whether you're looking for a pair of new shoes, the latest medical advice, or insights into corporate data, search provides the means to unlock the truth. Search habits, and the accompanying end-user expectations, have evolved along with changes to the search experiences offered by consumer apps like Google and Amazon. The days of the standard 10 blue links may well be behind us, as new paradigms like vector search and generative AI (gen AI) have upended long-held search norms. But are all forms of search created equal, or should we be seeking out the right "flavor" of search for specific jobs? In this blog post, we will define and dig into various flavors of search, including text search, vector and AI-powered search, and hybrid search, and discuss when to use each, including sample use cases where one type of search might be superior to others.

Information retrieval revolutionized with text search

The concept of text search has been baked into user behavior from the early days of the web, with the rudimentary text box entry and 10 blue link results based on text relevance to the initial query. This behavior and its associated business model have produced trillions in revenue and become one of the fiercest battlegrounds across the internet. Text search allows users to quickly find specific information within a large set of data by entering keywords or phrases. When a query is entered, the text search engine scans through indexed documents to locate and retrieve the most relevant results based on the keywords. Text search is a good solution for queries requiring exact matches, where the overarching meaning isn't as critical. Some of the most common uses include:

Catalog and content search: Using the search bar to find specific products or content based on keywords from customer inquiries. For example, a customer searching for "size 10 men trainers" or "installation guide" can instantly find the exact items they're looking for, like how Nextar tapped into Atlas Search to enable physical retailers to create online catalogs.

In-application search: This is well-suited for organizations with straightforward offerings that want to make it easier for users to locate key resources but don't require advanced features like semantic retrieval or contextual re-ranking. For instance, if a user searches for "songs key of G," they can quickly receive relevant materials. This streamlines asset retrieval, letting users focus on the task they are trying to achieve, and boosts overall satisfaction. For a company like Yousician, Atlas Search enabled their 20 million monthly active users to tackle their music lessons with ease.

Customer 360: Unifying data from different sources to create a single, holistic view. Consolidated information such as user preferences, purchase history, and interaction data can be used to enhance business visibility and simplify the management, retrieval, and aggregation of user data. Consider a support agent searching for all information related to customer "John Doe": they can quickly access relevant attributes and interaction history, ensuring more accurate and efficient service. Helvetia achieved success after migrating to MongoDB and using Atlas Search to deliver a single, 360-degree real-time view across all customer touchpoints and insurance products.
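To illustrate, here is a minimal sketch of a keyword query of this kind using the Atlas Search $search aggregation stage from PyMongo; the connection string, index name, collection, and field paths are hypothetical.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
products = client["store"]["products"]

# Full-text keyword search: match documents whose title or description
# contains the query terms, ranked by text relevance.
results = products.aggregate([
    {"$search": {
        "index": "default",  # hypothetical Atlas Search index name
        "text": {"query": "size 10 men trainers", "path": ["title", "description"]},
    }},
    {"$limit": 10},
    {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
])
for doc in results:
    print(doc["title"], doc["score"])
```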
AI and a new paradigm with vector search

With advances in technology, vector search has emerged to help solve the challenge of providing relevant results even when users may not know what they're looking for. Vector search allows you to take any type of media or content, convert it into a vector using machine learning algorithms, and then search to find results similar to the target term. The similarity aspect is determined by converting your data into numerical high-dimensional vectors and then calculating the distance between them to determine relevance: the closer the vectors, the higher the relevance. There is a wide range of practical, powerful use cases powered by vector search, notably semantic search and retrieval-augmented generation (RAG) for gen AI.

Semantic search focuses on meaning and prioritizes user intent by deciphering not just what users type but why they're searching, in order to provide more accurate and context-oriented search results. Some examples of semantic search include:

Content/knowledge base search: Vast amounts of organizational data, structured and unstructured, with hidden insights can benefit significantly from semantic search. Questions like "What's our remote work policy?" can return accurate results even when the source materials don't contain the "remote" keyword but instead mention "return to office," "hybrid," or other related terms. A real-world example of content search is the National Film and Sound Archive of Australia, which uses Atlas Vector Search to power semantic search across petabytes of text, audio, and visual content in its collections.

Recommendation engines: Understanding users' interests and intent is a strong competitive advantage, like how Netflix provides a personalized selection of shows and movies based on your watch history, or how Amazon recommends products based on your purchase history. This is particularly powerful in e-commerce, media and entertainment, financial services, and product- or service-oriented industries where the customer experience directly influences the bottom line. A success story is Delivery Hero, which leverages vector search-powered real-time recommendations to increase customer satisfaction and revenue.

Anomaly detection: Identifying and preventing fraud, security breaches, and other system anomalies is paramount for all organizations. By grouping similar vectors and using vector search to identify outliers, potential threats can be detected early, enabling timely responses. Companies like VISO TRUST and Extrac are among the innovators building their core offerings on semantic search for security and risk management.

With the rise of large language models (LLMs), vector search is becoming essential in gen AI application development. It augments LLMs by providing domain-specific context beyond what the LLMs "know," ensuring the relevance and accuracy of the gen AI output. In this case, the semantic search outputs are used to enhance RAG: by providing relevant information from a vector database, vector search helps the RAG model generate responses that are more contextually relevant, and by grounding the generated text in factual information, it helps reduce hallucinations and improve the accuracy of the response. Common RAG applications include chatbots and virtual assistants, which provide users with relevant responses and carry out tasks based on the user query, delivering an enhanced user experience.
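As a sketch of the corresponding semantic query, the snippet below runs an approximate $vectorSearch against a hypothetical knowledge base collection; the embed() helper stands in for whatever embedding model produced the stored vectors and is not a real API.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
articles = client["kb"]["articles"]

def embed(text: str) -> list[float]:
    """Hypothetical helper: call your embedding model provider here."""
    raise NotImplementedError

# Semantic search: nearest stored embeddings to the query's embedding,
# so "return to office" content can match a "remote work" question.
results = articles.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",   # hypothetical vector index name
        "path": "embedding",
        "queryVector": embed("What's our remote work policy?"),
        "numCandidates": 200,      # ANN candidate pool to consider
        "limit": 10,
    }},
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
])
for doc in results:
    print(doc["title"], doc["score"])
```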
Two real-world examples of such chatbot implementations come from our customers Okta and Kovai. Another popular application is using RAG to help generate content like articles, blog posts, scripts, code, and more, based on user prompts or data. This significantly accelerates content production, allowing organizations including Novo Nordisk and Scalestack to save time and produce content at scale, at an accuracy level that was not possible without RAG. Beyond RAG, an emerging use of vector search is in agentic systems: architectures encompassing one or more AI agents with autonomous decision-making capabilities that can access and use various system components and resources to achieve defined objectives while adapting to environmental feedback. Vector search enables efficient and semantically meaningful information retrieval in these systems, facilitating relevant context for LLMs, optimized tool selection, semantic understanding, and improved relevance ranking.

Hybrid search: The best of both worlds

Hybrid search combines the strengths of text search with the advanced capabilities of vector search to deliver more accurate and relevant search results. Hybrid search shines in scenarios where there's a need for both precision (where text search excels) and recall (where vector search excels), and where user queries vary from simple to complex, including both keyword and natural language queries. Hybrid search delivers a more comprehensive, flexible information retrieval process, helping RAG models access a wider range of relevant information. For example, in a customer support context, hybrid search can ensure that the RAG model retrieves not only documents containing exact keywords but also semantically similar content, resulting in more informative and helpful responses (a common way to fuse the two result sets is sketched below). Hybrid search can also help reduce information overload by prioritizing the most relevant results. This allows RAG models to focus on processing and understanding the most critical information, leading to faster, more accurate responses and an improved user experience.

Powering your AI and search applications with MongoDB

As your organization continues to innovate in the rapidly evolving technology ecosystem, building robust AI and search applications that support customer, employee, and stakeholder experiences can deliver powerful competitive advantages. With MongoDB, you can efficiently deploy full-text search, vector search, and hybrid search capabilities. Start building today: simplify your developer experience while increasing impact with MongoDB's fully managed, secure vector database, integrated with a vast AI partner ecosystem, including all major cloud providers, generative AI model providers, and system integrators. Head over to our quick-start guide to get started with Atlas Vector Search today.
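One common way to combine the two result sets, sketched here as a client-side helper under the assumption that the $search and $vectorSearch pipelines shown earlier are run separately, is reciprocal rank fusion: each document earns 1/(k + rank) from every list it appears in, so items ranked well by either method rise to the top.

```python
def reciprocal_rank_fusion(ranked_lists: list[list], k: int = 60) -> list:
    """Merge ranked ID lists: each appearance contributes 1 / (k + rank).
    k = 60 is the conventional damping constant for RRF."""
    scores: dict = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical _id lists returned by the text and vector pipelines:
text_ids = ["a", "b", "c", "d"]
vector_ids = ["c", "a", "e", "f"]
print(reciprocal_rank_fusion([text_ids, vector_ids]))  # "a" and "c" rank first
```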

September 16, 2024

Find Hidden Insights in Vector Databases: Semantic Clustering

Vector databases, a powerful class of databases designed to optimize the storage, processing, and retrieval of large-volume, multi-dimensional data, have become increasingly instrumental to generative AI (gen AI) applications, with Forrester predicting a 200% increase in the adoption of vector databases in 2024. But their power extends far beyond these applications. Semantic vector clustering, a technique applied within vector databases, can unlock hidden knowledge within your organization's data, democratizing insights across teams. View the tutorial to get started.

Mining diverse data for hidden knowledge

Imagine your organization's data as a library of diverse knowledge: a treasure trove of information waiting to be unearthed. Traditionally, uncovering valuable insights from data has often relied on asking the right questions, which can be a challenge for developers, data scientists, and business leaders alike. They might spend vast amounts of time sifting through limited, siloed datasets, potentially missing hidden gems buried within the organization's vast data troves. Simply put, without knowing the right questions to ask, these valuable insights often remain undiscovered, leading to missed opportunities or losses.

Enter vector databases and semantic vector clustering. A vector database is designed to store and manage unstructured data efficiently. Within a vector database, semantic vector clustering is a technique for organizing information by grouping vectors with similar meaning together. Text analysis, sentiment analysis, knowledge classification, and uncovering semantic connections between data sets: these are just a few examples of how semantic vector clustering empowers organizations to vastly improve data mining.

Semantic vector clustering offers a multifaceted approach to organizational improvement. By analyzing text data, it can illuminate customer and employee sentiments, behaviors, and preferences, informing strategic decisions, enhancing customer service, and optimizing employee satisfaction. Furthermore, it revolutionizes knowledge management by categorizing information into easily accessible clusters, boosting collaboration and efficiency. Finally, by bridging data silos and uncovering hidden relationships, semantic vector clustering facilitates informed decision-making and breaks down organizational barriers.

For example, a business can gain significant insights from the customer interaction data it routinely keeps, classifies, or summarizes. Those data points (texts, numbers, images, videos, etc.) can be vectorized, and semantic vector clustering applied to identify the most prominent customer patterns (the densest vector clusters) in those interactions, classifications, or summaries. From the identified patterns, the business can take actions or make informed decisions that it wouldn't otherwise have been able to make.

The power of semantic vector clustering

So, how does semantic vector clustering achieve all this?

Discover semantic structures: Clustering groups similar LLM-embedded vector sets together, allowing for fast retrieval of themes. Beyond clustering regular vectors (individual data points or concepts), clustering RAG vectors (summarizations of themes and concepts) can provide superior LLM contexts compared to basic semantic search.

Reduce data complexity via clustering: Data points are grouped based on overall similarity, effectively reducing the complexity of the data.
This reveals patterns and summarizes key features, making it easier to grasp the bigger picture. Imagine organizing the library by theme or genre, making it easier to navigate vast amounts of information.

Semantic auto-aggregation: Here is the coolest part. Groups of vectors can be classified into hierarchies by semantically "auto-aggregating" them: the data itself figures out these groups and self-organizes, without a set of pre-built questions. Imagine a library with an efficient automated catalog system whose sections organize themselves by thematic connections, allowing researchers to find what they need quickly and easily. This lets you identify patterns across the vast, semantically diverse data within your organization.

Unlock hidden insights in your vector database

The semantic clustering of vector embeddings is a powerful tool for going beneath the surface of data and identifying meanings that otherwise would not have been discovered. By unlocking hidden relationships and patterns, you can extract valuable insights that drive better decision-making, enhance customer experiences, and improve overall business efficiency, all enabled through MongoDB's secure, unified, and fully managed vector database capabilities. Check out our tutorial to learn how to get started. Head over to our quick-start guide to get started with Atlas Vector Search today. Add vector search to your arsenal for more accurate and cost-efficient RAG applications by enrolling in the MongoDB and DeepLearning.AI course "Prompt Compression and Query Optimization" for free today.
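As a minimal sketch of semantic vector clustering, assuming embeddings are already stored alongside summaries in a MongoDB collection, the snippet below pulls them into memory and groups them with k-means from scikit-learn. The collection, field names, and the choice of k are hypothetical; the tutorial linked above covers the full approach.

```python
import numpy as np
from pymongo import MongoClient
from sklearn.cluster import KMeans

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
docs = list(client["support"]["tickets"].find({}, {"summary": 1, "embedding": 1}))

# Cluster the embedding vectors; k chosen by inspection for this sketch.
X = np.array([d["embedding"] for d in docs])
labels = KMeans(n_clusters=8, random_state=42).fit_predict(X)

# The densest clusters surface the most prominent customer patterns.
for label in np.argsort(-np.bincount(labels))[:3]:
    members = [d["summary"] for d, l in zip(docs, labels) if l == label]
    print(f"cluster {label} ({len(members)} tickets):", members[:3])
```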

August 19, 2024

Exact Nearest Neighbor Vector Search for Precise Retrieval

With its ability to efficiently handle high-dimensional, unstructured data, vector search delivers relevant results even when users don't know what they're looking for, using machine learning models to find similar results across any data type. Rapidly emerging as a key technology for modern applications, vector search empowers developers to build next-generation search and generative AI applications faster and easier. MongoDB Atlas Vector Search goes beyond approximate nearest neighbor (ANN) methods with the introduction of exact nearest neighbor (ENN) vector search. This capability guarantees retrieval of the absolute closest vectors to your query, eliminating the accuracy limitations inherent in ANN. In sum, ENN vector search can help you unlock a new level of precision for your search and generative AI applications, improving benchmarking and moving to production faster.

When exact nearest neighbor (ENN) vector search benefits developers

While ANN shines in searching across large datasets, ENN vector search offers advantages in specific scenarios:

Small-scale vector data: For datasets under 10,000 vectors, the linear time complexity of ENN vector search makes it a viable option, especially considering the added development complexity of tuning ANN parameters.

Recall benchmarking of ANN queries: ANN queries are fast, particularly as the scale of your indexed vectors increases, but it may not be easy to know whether the documents retrieved by vector relevance correspond to the guaranteed closest vectors in your index. Using ENN provides that exact result set for comparison with your approximate result set, using Jaccard similarity or other rank-aware recall metrics (a sketch of this comparison appears at the end of this section). This gives you much greater confidence that your ANN queries are accurate, since you can build quantitative benchmarks as your data evolves.

Multi-tenant architectures: Imagine a scenario with millions of vectors categorized by tenant. You might search for the closest vectors within a specific tenant (identified by a tenant ID). In cases where the overall vector collection is large (in the millions) but the number of vectors per tenant is small (a few thousand), ANN's accuracy suffers when applying highly selective filters. ENN vector search thrives in this multi-tenant scenario, delivering precise results even with small result sets.

Example use cases

The small dataset size allows for exhaustive search within a reasonable timeframe, making the exact nearest neighbor approach a viable option for finding the most similar data points and improving accuracy confidence in a number of use cases, such as:

Multi-tenant data service: You might be building a business providing an agentic service that understands your customers' data and takes actions on their behalf. When retrieving relevant proprietary data for that agent, it is critical that the right metadata filter be applied and that ENN be executed to retrieve only the sets of documents corresponding to the appropriate tenant IDs.

Proof of concept development: For instance, a new recommendation engine might have a limited library compared to established ones. Here, ENN vector search can be used to recommend products to a small set of early adopters. Since the data is limited, an exhaustive search becomes practical, ensuring the user gets the most relevant recommendations from the available options.
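To ground the benchmarking use case above, here is one way to compare an ANN result set against the ENN ground truth using Jaccard similarity. The connection string, index name, field path, and the 10x candidate multiplier are illustrative assumptions; the ENN form of $vectorSearch shown here is discussed in the next section.

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
collection = client["kb"]["articles"]

def top_ids(query_vector: list[float], exact: bool, limit: int = 100) -> set:
    """Return the _ids of the top `limit` hits, via ENN or ANN."""
    stage = {
        "index": "vector_index",  # hypothetical vector index name
        "path": "embedding",
        "queryVector": query_vector,
        "limit": limit,
    }
    if exact:
        stage["exact"] = True                # exhaustive ENN search
    else:
        stage["numCandidates"] = limit * 10  # ANN candidate pool
    docs = collection.aggregate([{"$vectorSearch": stage}, {"$project": {"_id": 1}}])
    return {d["_id"] for d in docs}

def jaccard(a: set, b: set) -> float:
    """Set overlap: |A intersect B| / |A union B|."""
    return len(a & b) / len(a | b)

query_vector = [0.12, -0.57, 0.33]  # hypothetical query embedding
recall = jaccard(top_ids(query_vector, exact=False), top_ids(query_vector, exact=True))
print(f"Jaccard(ANN, ENN ground truth) = {recall:.3f}")
```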
How ENN vector search works on MongoDB Atlas

The ENN vector search feature in Atlas integrates seamlessly with the existing $vectorSearch stage within your Atlas aggregation pipelines. Its key characteristics include:

Guaranteed accuracy: Unlike ANN, ENN always returns the closest vectors to your query, adhering to the specified limit.

Eventual consistency: Like approximate vector search, ENN vector search follows an eventual consistency model.

Simplified configuration: Unlike approximate vector search, where tuning numCandidates is crucial, ENN vector search only requires specifying the desired limit of returned vectors.

Scalable recall evaluation: Atlas allows querying a large number of indexed vectors, facilitating the calculation of comprehensive recall sets for effective evaluation.

Fast query execution: ENN vector search query execution can maintain sub-second latency for unfiltered queries of up to 10,000 documents. It can also provide low-latency responses for highly selective filters that narrow a broad set of documents down to 10,000 documents or fewer, ordered by vector relevance.

Build more with ENN vector search

ENN vector search can be a powerful tool when building a proof of concept for retrieval-augmented generation (RAG), semantic search, or recommendation systems powered by vector search. It simplifies the developer experience by minimizing overhead complexity and latency while giving you the flexibility to implement and benchmark precise retrieval (see the multi-tenant sketch below). To explore more use cases and build applications faster, start experimenting with ENN vector search. Head over to our quick-start guide to get started with Atlas Vector Search today.
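And here is a sketch of the multi-tenant scenario: a filter field in the index definition scopes candidates to one tenant, and the exact option requests ENN so the closest vectors within that tenant are guaranteed, with no numCandidates tuning. All names, dimensions, and the query vector are hypothetical placeholders.

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

client = MongoClient("mongodb+srv://<user>:<password>@<cluster>/")  # placeholder URI
collection = client["agents"]["documents"]

# One-time setup: index the vector field plus a filter field for tenant_id.
collection.create_search_index(model=SearchIndexModel(
    name="vector_index",
    type="vectorSearch",
    definition={"fields": [
        {"type": "vector", "path": "embedding",
         "numDimensions": 1024, "similarity": "cosine"},
        {"type": "filter", "path": "tenant_id"},
    ]},
))

# ENN query scoped to a single tenant: precise even though the per-tenant
# candidate set is small relative to the full collection.
results = collection.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": [0.1] * 1024,  # hypothetical query embedding
        "filter": {"tenant_id": "tenant-42"},
        "exact": True,
        "limit": 10,
    }},
])
for doc in results:
    print(doc["_id"])
```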

June 20, 2024