Vector Quantization: Scaling Search and Generative AI Applications

Mai Nguyen and Henry Weller
October 7, 2024 | Updated: January 14, 2025
#genAI #Vector Search

Update 12/12/2024: The upcoming vector quantization capabilities mentioned at the end of this blog post are now available in public preview:

Support for ingestion and indexing of binary (int1) quantized vectors: gives developers the flexibility to choose and ingest the type of quantized vectors that best fits their requirements.

Automatic quantization and rescoring: provides a native mechanism for scalar quantization and binary quantization with rescoring, making it easier for developers to implement vector quantization entirely within Atlas Vector Search.

View the documentation to get started.

We are excited to announce a robust set of advanced vector quantization capabilities in MongoDB Atlas Vector Search. These capabilities reduce vector sizes while preserving performance, so developers can build powerful semantic search and generative AI applications at greater scale and lower cost. In addition, unlike relational or niche vector databases, MongoDB's flexible document model, coupled with quantized vectors, allows for greater agility in testing and deploying different embedding models.

Support for scalar quantized vector ingestion is now available, and more capabilities will be announced in the coming weeks. Read on to learn how vector quantization works, and visit our documentation to get started!

The challenges of large-scale vector applications

While the use of vectors has opened up a range of new possibilities, such as content summarization and sentiment analysis, natural language chatbots, and image generation, unlocking insights within unstructured data can require storing and searching through billions of vectors, which quickly becomes a difficult task.

Vectors are effectively arrays of floating-point numbers that represent unstructured information in a form computers can understand; a collection can range from a few hundred to billions of such arrays. As the number of vectors increases, so does the size of the index required to search over them. As a result, large-scale vector applications that rely on full-fidelity vectors often have high processing costs and slow query times, which hinders their scalability and performance.
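As a rough, back-of-the-envelope illustration of why index size grows with vector count, the sketch below estimates the raw payload of float32 vectors. The function name, the 1024-dimension figure, and the 3-million-vector workload are assumptions chosen for illustration; a real index also carries graph and metadata overhead that this estimate ignores.

```python
BYTES_PER_FLOAT32 = 4  # one full-fidelity dimension stored as a 32-bit float


def float32_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vector payload alone, ignoring any index overhead."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


# Hypothetical workload: 3 million vectors with 1024 dimensions each.
raw_bytes = float32_vector_bytes(3_000_000, 1024)
print(f"{raw_bytes / 1e9:.1f} GB")  # 12.3 GB
```

Doubling either the number of vectors or the number of dimensions doubles this figure, which is why the cost of full-fidelity vectors climbs so quickly at scale.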

Vector quantization for cost-effectiveness, scalability, and performance

Vector quantization, a technique that compresses vectors while preserving their semantic similarity, offers a solution to this challenge. Imagine converting a full-color image into grayscale to reduce storage space on a computer. This involves simplifying each pixel's color information by grouping similar colors into primary color channels, or "quantization bins," and then representing each pixel with a single value from its bin. The binned values are then used to create a new, smaller grayscale image that retains most of the original details (see Figure 1).

Figure 1. Illustration of quantizing an RGB image into grayscale

This image is an illustration of quantizing an RGB image into grayscale. On the left side is a photo of a puppy in normal color. In the middle is that same photo in RGB examples. And then on the right is a grayscale version of the photo.

Vector quantization works similarly. It shrinks full-fidelity vectors into fewer bits, which significantly reduces memory and storage costs while retaining the essential information. Maintaining this balance is critical, because search and AI applications need to deliver relevant results to be useful.

Two of the most effective methods are scalar quantization (converting each floating-point value into an integer) and binary quantization (converting each floating-point value into a single bit, 0 or 1). Current and upcoming quantization capabilities will empower developers to get the most out of Atlas Vector Search.
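To make the two methods concrete, here is a minimal pure-Python sketch. It is not how Atlas Vector Search implements quantization: the function names are hypothetical, the scalar version assumes a simple per-vector min/max range mapped onto 256 int8 bins (production systems typically calibrate ranges over many vectors), and the binary version assumes a sign threshold at zero.

```python
from typing import List


def scalar_quantize(vector: List[float]) -> List[int]:
    """Map each float onto one of 256 evenly spaced bins, as int8 values in [-128, 127]."""
    lo, hi = min(vector), max(vector)
    span = (hi - lo) or 1.0  # avoid division by zero for constant vectors
    return [round((x - lo) / span * 255) - 128 for x in vector]


def binary_quantize(vector: List[float]) -> List[int]:
    """Keep one bit per dimension: 1 for positive values, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vector]


def pack_bits(bits: List[int]) -> bytes:
    """Pack bits into bytes, most significant bit first, zero-padding the last byte."""
    padded = bits + [0] * (-len(bits) % 8)
    return bytes(
        int("".join(map(str, padded[i:i + 8])), 2)
        for i in range(0, len(padded), 8)
    )


embedding = [0.12, -0.48, 0.91, -0.03, 0.00, 0.37, -0.76, 0.55]

print(scalar_quantize(embedding))                   # eight integers between -128 and 127
print(binary_quantize(embedding))                   # [1, 0, 1, 0, 0, 1, 0, 1]
print(pack_bits(binary_quantize(embedding)).hex())  # a5
```

In this sketch, eight float32 dimensions (32 bytes) shrink to 8 bytes with scalar quantization and to a single byte with binary quantization, at the price of coarser values.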

The key benefit is greater scalability and lower costs, thanks to reduced computing resources and efficient processing of vectors. When combined with Search Nodes, MongoDB's dedicated, memory-optimized infrastructure that scales semantic search and generative AI workloads independently through workload isolation, vector quantization can further reduce costs and improve performance, even at the highest volume and scale. This opens up more use cases for developers.

"We are excited to be one of the first partners to support quantized vector ingestion in MongoDB Atlas," said Nils Reimers, VP of AI Search at Cohere. "Embedding models, such as Cohere Embed v3, help enterprises see more accurate search results based on their own data sources. We're looking forward to providing our joint customers with accurate, cost-effective applications for their needs."

In our tests, compared with full-fidelity vectors, BSON-type vectors (BSON is MongoDB's JSON-like binary serialization format for efficient document storage) reduced storage size by 66% (from 41 GB to 14 GB). As Figures 2 and 3 show, the tests also demonstrate a significant memory reduction (73% to 96%) and latency improvements with quantized vectors. Scalar quantization preserves recall performance, and binary quantization maintains it when combined with rescoring, a process that re-evaluates a small subset of the quantized results against full-fidelity vectors to improve the accuracy of the search results.
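The rescoring idea can be sketched in a few lines of Python. This is an illustrative toy, not the Atlas Vector Search implementation: the function names, the sample documents, and the exhaustive Hamming-distance scan are all assumptions (a real system would search an approximate nearest neighbor index and store the binary codes at index time).

```python
from typing import Dict, List


def to_bits(vector: List[float]) -> List[int]:
    """Binary-quantize a vector: 1 for positive values, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_with_rescoring(
    query: List[float],
    docs: Dict[str, List[float]],
    num_candidates: int,
    limit: int,
) -> List[str]:
    # Computed on the fly here; a real system would keep these codes in its index.
    codes = {doc_id: to_bits(vec) for doc_id, vec in docs.items()}
    query_bits = to_bits(query)
    # Stage 1: cheap candidate selection using only the binary codes.
    candidates = sorted(docs, key=lambda d: hamming(query_bits, codes[d]))[:num_candidates]
    # Stage 2: rescore just those candidates against the full-fidelity vectors.
    rescored = sorted(candidates, key=lambda d: dot(query, docs[d]), reverse=True)
    return rescored[:limit]


docs = {
    "a": [0.9, 0.1, -0.2, 0.3],
    "b": [0.1, 0.9, -0.8, 0.1],
    "c": [-0.5, 0.4, 0.6, -0.7],
    "d": [0.8, 0.2, -0.1, 0.4],
}
query = [0.7, 0.2, -0.3, 0.5]

print(search_with_rescoring(query, docs, num_candidates=3, limit=2))  # ['a', 'd']
```

Documents a, b, and d share the same binary code as the query, so the binary stage alone cannot tell them apart; rescoring the three candidates with the original floats is what promotes d ahead of b.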

Figure 2. Significant storage reduction and good recall and latency performance with quantization on different embedding models

This image is a table displaying storage size and latency times for different amounts of documents and test groups. The test is divided into three groups, which are Full-Fidelity Vectors, Scalar Quantization, and Binary Quantization. Then, there are two different groups for the number of total documents, one being 200k docs on OpenAI embedding models, and the other being 3 million docs on Cohere embedding model. For the data, the full-fidelity vectors test on 200k docs had a vector index size of 1.2 GB and a latency of 13ms, and a 12GB vector index size and 26ms latency on the 3 million docs test. The Scalar Quantization test had a vector index size of .32 GB and 11ms latency on the 200k docs test, and a 3.2 GB vector index size and 19ms latency on the 3 million docs test. Finally, the binary quantization had a .05 GB vector index size on the 200k docs test (a 96% reduction from other tests) along with a 12ms latency, and then a .5 GB vector index size on 3 million docs test, representing a 96% reduction from the Full-Fidelity Vectors test.

Figure 3. Marked improvement in recall performance for binary quantization when combined with rescoring

This image is a graph of improvement in recall performance for binary quantization when combining with rescoring. The Y axis of the graph represents average recall over 50 queries, while the X axis represents num candidates. There are 4 lines on the graph, each representing a different type of queries. The line representing binary, in red, starts near 0,0 and stays below 0.6 on the graph across all num candidates, putting it as the lowest line on the graph. The float ANN line, in blue, starts near the top of the Y axis at 0 num candidates and moves in a level line across the graph, same goes for the scalar line, in orange, which comes in just below the float ANN. The binary + rescoring line starts towards the bottom of the Y axis at 0 num candidates, but gradually increases the more the graph moves right.

In addition, thanks to its reduced cost, vector quantization facilitates more advanced, multi-vector use cases that would otherwise have been too cumbersome or too expensive to implement. For example, vector quantization can help users:

A/B test different embedding models using multiple vectors produced from the same source field during prototyping. MongoDB's document model, coupled with quantized vectors, allows for greater agility at lower cost. Thanks to the flexible document schema, developers can quickly deploy and compare the results of embedding models without having to rebuild the index or provision an entirely new data model or set of infrastructure.
Improve the relevance of search results or of the context for large language models (LLMs) by incorporating vectors from multiple relevant sources, such as different source fields (product descriptions, product images, and so on) embedded with the same or different models.

How to get started

Now, with support for the ingestion of scalar quantized vectors, developers can import and work with quantized vectors from their embedding model providers of choice (such as Cohere, Nomic, Jina, and Mixedbread), directly in Atlas Vector Search. Read the documentation and watch the tutorial to get started.

In the coming weeks, additional vector quantization features will equip developers with a comprehensive toolset for building and optimizing applications with quantized vectors:

Support for the ingestion of binary quantized vectors will enable a further reduction of storage space, allowing for greater cost savings and giving developers the flexibility to choose the type of quantized vectors that best fits their requirements.

Automatic quantization and rescoring will provide native capabilities for scalar quantization as well as binary quantization with rescoring in Atlas Vector Search, making it easier for developers to take full advantage of vector quantization within the platform.
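As a hedged illustration of what automatic quantization might look like from the developer's side, the sketch below builds a vector index definition as a Python dictionary. The field path `embedding` and the dimension count are hypothetical, and the `quantization` option reflects our understanding of the Atlas Vector Search index format; check the current documentation for the exact field names and accepted values before relying on it.

```python
import json

# A vector search index definition, expressed as a plain Python dict.
index_definition = {
    "fields": [
        {
            "type": "vector",
            "path": "embedding",        # hypothetical name of the field holding the vectors
            "numDimensions": 1024,      # assumed; must match the embedding model's output
            "similarity": "dotProduct",
            "quantization": "scalar",   # assumed option; "binary" would request binary quantization
        }
    ]
}

print(json.dumps(index_definition, indent=2))
```

With a definition along these lines, the application keeps ingesting full-fidelity vectors and leaves the quantization, and any rescoring, to the search index.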

With support for quantized vectors in MongoDB Atlas Vector Search, you can build scalable, high-performing, flexible, and cost-effective semantic search and generative AI applications. Check out these resources to get started.

Head over to our quick-start guide to get started with Atlas Vector Search today.
