Vector Quantization: Scale Search and Generative AI Applications

Mai Nguyen and Henry Weller
October 7, 2024 | Updated: January 14, 2025
#genAI #Vector Search

Update 12/12/2024: The upcoming vector quantization capabilities mentioned at the end of this blog post are now available in public preview:

Support for ingestion and indexing of binary (int1) quantized vectors: gives developers the flexibility to choose and ingest the type of quantized vectors that best fits their requirements.

Automatic quantization and rescoring: provides a native mechanism for scalar quantization and binary quantization with rescoring, making it easier for developers to implement vector quantization entirely within Atlas Vector Search.

View the documentation to get started.

We are excited to announce a robust set of vector quantization capabilities in MongoDB Atlas Vector Search. These capabilities reduce vector sizes while preserving performance, enabling developers to build powerful semantic search and generative AI applications at greater scale and at lower cost. In addition, unlike relational or niche vector databases, MongoDB's flexible document model, coupled with quantized vectors, allows greater agility in testing and deploying different embedding models quickly and easily.

Support for scalar quantized vector ingestion is now generally available and will be followed by several new releases in the coming weeks. Read on to learn how vector quantization works, and visit our documentation to get started!

The challenges of large-scale vector applications

While the use of vectors has opened up a range of new possibilities, such as content summarization and sentiment analysis, natural language chatbots, and image generation, unlocking insights from unstructured data can require storing and searching through billions of vectors, which can quickly become infeasible.

Vectors are effectively arrays of floating-point numbers that represent unstructured information in a way computers can understand (ranging from a few hundred to billions of arrays). As the number of vectors grows, so does the index size required to search over them. As a result, large-scale vector-based applications that use full-fidelity vectors often have high processing costs and slow query times, which hampers their scalability and performance.

Vector quantization for cost-effectiveness, scalability, and performance

Vector quantization, a technique that compresses vectors while preserving their semantic similarity, offers a solution to this challenge. Imagine converting a full-color image into grayscale to save storage space on a computer. This involves simplifying each pixel's color information by grouping similar colors into primary color channels, or "quantization bins," and then representing each pixel with a single value from its bin. The binned values are then used to create a new grayscale image that is smaller in size but retains most of the original detail (see Figure 1).

Figure 1. Illustration of quantizing an RGB image into grayscale

This image is an illustration of quantizing an RGB image into grayscale. On the left side is a photo of a puppy in normal color. In the middle is that same photo in RGB examples. And then on the right is a grayscale version of the photo.

Vector quantization works similarly, shrinking full-fidelity vectors into fewer bits to significantly reduce memory and storage costs without compromising the important details. Maintaining this balance is critical, because search and AI applications need to deliver relevant insights to be useful.

Two effective quantization methods are scalar (converting a floating-point value into an integer) and binary (converting a floating-point value into a single bit of 0 or 1). Current and upcoming quantization capabilities will enable developers to make the most of Atlas Vector Search.
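To make the two methods concrete, here is a minimal sketch in plain Python. It assumes a simple linear min/max mapping to signed 8-bit integers for scalar quantization and a fixed threshold of 0.0 for binary quantization. Both choices, and the function names, are illustrative assumptions and not a description of how Atlas Vector Search or any embedding provider quantizes internally.

```python
def scalar_quantize(vector, lo=None, hi=None):
    """Map each float to an integer in [-128, 127] with a linear min/max scale.

    lo and hi default to the vector's own min and max (an assumption made
    for this sketch; real systems usually calibrate over many vectors).
    """
    lo = min(vector) if lo is None else lo
    hi = max(vector) if hi is None else hi
    if hi <= lo:
        # Degenerate range: every value falls into the same bin.
        return [0] * len(vector)
    scale = 255.0 / (hi - lo)
    quantized = []
    for x in vector:
        clamped = min(max(x, lo), hi)  # keep out-of-range values in bounds
        quantized.append(int(round((clamped - lo) * scale)) - 128)
    return quantized


def binary_quantize(vector, threshold=0.0):
    """Map each float to a single bit: 1 if above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in vector]


embedding = [-1.0, -0.5, 0.0, 0.5, 1.0]
int8_vector = scalar_quantize(embedding)  # one byte per dimension
bit_vector = binary_quantize(embedding)   # one bit per dimension
print(int8_vector)
print(bit_vector)
```

Scalar quantization keeps a coarse version of each value's magnitude, while binary quantization keeps only which side of the threshold it falls on, which is why the latter saves more space but loses more detail.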

The most impactful benefits of vector quantization are increased scalability and cost savings through reduced computing resources and efficient processing of vectors. And when combined with Search Nodes, MongoDB's dedicated infrastructure for independent scalability through workload isolation and memory-optimized infrastructure for semantic search and generative AI workloads, vector quantization can further reduce costs and improve performance, even at the highest volume and scale, to unlock more use cases.

"Cohere is excited to be one of the first partners to support quantized vector ingestion in MongoDB Atlas," said Nils Reimers, VP of AI Search at Cohere. "Embedding models such as Cohere Embed v3 help enterprises get more accurate search results based on their own data sources. We look forward to providing our joint customers with accurate, cost-effective applications for their needs."

In our tests, BSON-type vectors (BSON is MongoDB's JSON-like binary serialization format for efficient document storage) reduced storage size by 66% (from 41 GB to 14 GB) compared to full-fidelity vectors. As Figures 2 and 3 show, the tests illustrate a significant memory reduction (73% to 96% less) and latency improvements with quantized vectors. Scalar quantization preserves recall performance, and binary quantization maintains recall performance through rescoring, a process that evaluates a small subset of the quantized outputs against full-fidelity vectors to improve the accuracy of the search results.

Figure 2: Significant storage reduction and good recall and latency performance with quantization on different embedding models

This image is a table displaying storage size and latency times for different amounts of documents and test groups. The test is divided into three groups, which are Full-Fidelity Vectors, Scalar Quantization, and Binary Quantization. Then, there are two different groups for the number of total documents, one being 200k docs on OpenAI embedding models, and the other being 3 million docs on Cohere embedding model. For the data, the full-fidelity vectors test on 200k docs had a vector index size of 1.2 GB and a latency of 13ms, and a 12GB vector index size and 26ms latency on the 3 million docs test. The Scalar Quantization test had a vector index size of .32 GB and 11ms latency on the 200k docs test, and a 3.2 GB vector index size and 19ms latency on the 3 million docs test. Finally, the binary quantization had a .05 GB vector index size on the 200k docs test (a 96% reduction from other tests) along with a 12ms latency, and then a .5 GB vector index size on 3 million docs test, representing a 96% reduction from the Full-Fidelity Vectors test.
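The percentages quoted in the text can be checked directly from the sizes reported above. The short sketch below uses only the numbers given in this post; the helper name and the dictionary layout are illustrative.

```python
def reduction_percent(before_gb, after_gb):
    """Percentage by which a size shrank going from before_gb to after_gb."""
    return (before_gb - after_gb) / before_gb * 100


# Vector index sizes in GB, as reported in the Figure 2 description.
index_sizes = {
    "200k docs": {"full": 1.2, "scalar": 0.32, "binary": 0.05},
    "3M docs": {"full": 12.0, "scalar": 3.2, "binary": 0.5},
}

for dataset, sizes in index_sizes.items():
    for method in ("scalar", "binary"):
        saved = reduction_percent(sizes["full"], sizes[method])
        print(f"{dataset}, {method}: {saved:.0f}% smaller than full fidelity")

# BSON storage figure quoted in the text: 41 GB down to 14 GB.
print(f"BSON storage: {reduction_percent(41, 14):.0f}% smaller")
```

Scalar quantization works out to roughly 73% and binary quantization to roughly 96% on both datasets, which matches the 73% to 96% range stated earlier.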

Figure 3: Notable improvement in recall performance for binary quantization when combined with rescoring

This image is a graph of improvement in recall performance for binary quantization when combining with rescoring. The Y axis of the graph represents average recall over 50 queries, while the X axis represents num candidates. There are 4 lines on the graph, each representing a different type of queries. The line representing binary, in red, starts near 0,0 and stays below 0.6 on the graph across all num candidates, putting it as the lowest line on the graph. The float ANN line, in blue, starts near the top of the Y axis at 0 num candidates and moves in a level line across the graph, same goes for the scalar line, in orange, which comes in just below the float ANN. The binary + rescoring line starts towards the bottom of the Y axis at 0 num candidates, but gradually increases the more the graph moves right.
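The idea behind rescoring can be sketched as a two-stage search: shortlist candidates cheaply using only the binary codes, then re-rank that shortlist with the full-fidelity vectors. The sketch below is a brute-force illustration in plain Python. The Hamming-distance shortlist, the dot-product rescoring, and all function names are assumptions made for the example and do not describe the internals of Atlas Vector Search.

```python
def to_bits(vector):
    """Binary-quantize a vector: 1 for positive values, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_distance(bits_a, bits_b):
    """Number of positions where two bit vectors differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def dot(u, v):
    """Full-fidelity similarity score used for rescoring."""
    return sum(a * b for a, b in zip(u, v))


def search_with_rescoring(query, vectors, num_candidates, limit):
    """Return indexes of the best matches after binary search plus rescoring."""
    codes = [to_bits(v) for v in vectors]  # would be precomputed in practice
    query_bits = to_bits(query)
    # Stage 1: cheap shortlist using only the binary codes.
    by_hamming = sorted(
        range(len(vectors)),
        key=lambda i: hamming_distance(query_bits, codes[i]),
    )
    candidates = by_hamming[:num_candidates]
    # Stage 2: re-rank the shortlist using the original float vectors.
    rescored = sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)
    return rescored[:limit]


vectors = [
    [0.1, 0.9, 0.3],
    [0.9, 0.1, 0.2],   # closest to the query at full fidelity
    [-0.5, 0.4, 0.1],
    [0.2, 0.2, 0.9],
]
query = [0.8, 0.2, 0.1]
print(search_with_rescoring(query, vectors, num_candidates=3, limit=1))
```

In this toy data, three vectors share the same binary code as the query, so the binary stage alone cannot tell them apart. Rescoring the shortlist recovers the vector that is actually closest, and a larger candidate count gives the second stage more chances to do so.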

In addition, thanks to the reduced cost, vector quantization facilitates more advanced, multiple-vector use cases that would have been too computationally taxing or cost-prohibitive to implement. For example, vector quantization can help users:

Easily A/B test different embedding models during prototyping by using multiple vectors produced from the same source field. MongoDB's document model, coupled with quantized vectors, allows for greater agility at lower cost. The flexible document schema lets developers quickly deploy and compare the results of embedding models without needing to rebuild the index or provision an entirely new data model or infrastructure.
Further improve the relevance of search results or context for large language models (LLMs) by incorporating vectors from multiple sources of relevance, such as different source fields (product descriptions, product images, etc.) embedded with the same or different models.
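As a rough illustration of the multiple-vector pattern, the sketch below shows one document that carries embeddings from two models, along with a helper that builds a `$vectorSearch` aggregation stage for whichever field is being compared. The field names, the index name, the tiny vectors, and the helper itself are hypothetical. The stage keys (`index`, `path`, `queryVector`, `numCandidates`, `limit`) follow the `$vectorSearch` stage as I understand it, so check them against the documentation before relying on them.

```python
# One document, two embeddings of the same source field (hypothetical names).
product = {
    "_id": 1,
    "description": "Waterproof hiking boots",
    "embedding_model_a": [0.12, -0.03, 0.44],
    "embedding_model_b": [0.08, 0.21, -0.17, 0.05],
}


def vector_search_stage(path, query_vector, index_name="vector_index",
                        num_candidates=100, limit=10):
    """Build a $vectorSearch stage that targets one embedding field."""
    return {
        "$vectorSearch": {
            "index": index_name,            # hypothetical index name
            "path": path,                   # which embedding field to search
            "queryVector": query_vector,    # must come from the same model
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }


# The same user query, embedded once per model, searched against each field.
stage_a = vector_search_stage("embedding_model_a", [0.10, 0.00, 0.40])
stage_b = vector_search_stage("embedding_model_b", [0.10, 0.20, -0.10, 0.00])
print(stage_a["$vectorSearch"]["path"], stage_b["$vectorSearch"]["path"])
```

Each stage could then be passed as the first element of an aggregation pipeline, and the two result sets compared side by side to judge which model serves the use case better.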

How to get started, and what's next

With support for the ingestion of scalar quantized vectors, developers can now import and work with quantized vectors from their embedding model providers of choice (such as Cohere, Nomic, Jina, Mixedbread, and others) directly in Atlas Vector Search. Read the documentation and tutorial to get started.

And in the coming weeks, additional vector quantization features will equip developers with a comprehensive toolset for building and optimizing applications with quantized vectors:

Support for ingestion of binary quantized vectors will further reduce storage space, leading to greater cost savings and giving developers the flexibility to choose the type of quantized vectors that best fits their requirements.

Automatic quantization and rescoring will provide native capabilities for scalar quantization as well as binary quantization with rescoring in Atlas Vector Search, making it easier for developers to take full advantage of vector quantization within the platform.

With support for quantized vectors in MongoDB Atlas Vector Search, you can build scalable and high-performing semantic search and generative AI applications with flexibility and cost-effectiveness. Check out the documentation and tutorial to get started.

Head over to our quick-start guide to get started with Atlas Vector Search today.
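The two features above can be pictured with a small sketch. The first helper packs a binary quantized vector into bytes, eight dimensions per byte, to show where the extra storage savings come from. The second builds a vector index definition that asks for automatic quantization. The most-significant-bit-first packing order, the helper names, the `embedding` path, and the 1024-dimension example are assumptions for illustration. The `quantization` field and its `none`, `scalar`, and `binary` values reflect my understanding of the Atlas Vector Search index definition, so confirm them in the documentation.

```python
def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 dimensions per byte, MSB first.

    The bit order is an assumption for this sketch, not a statement about
    the format Atlas expects.
    """
    padded = list(bits) + [0] * (-len(bits) % 8)  # pad to a whole byte
    out = bytearray()
    for i in range(0, len(padded), 8):
        byte = 0
        for bit in padded[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def vector_index_definition(path, num_dimensions, quantization="scalar",
                            similarity="cosine"):
    """Build an index definition that requests automatic quantization."""
    if quantization not in {"none", "scalar", "binary"}:
        raise ValueError(f"unsupported quantization: {quantization}")
    return {
        "fields": [{
            "type": "vector",
            "path": path,                    # hypothetical field name
            "numDimensions": num_dimensions,
            "similarity": similarity,
            "quantization": quantization,
        }]
    }


dimensions = 1024
float32_size = dimensions * 4                       # 4 bytes per float32 value
binary_size = len(pack_bits([1, 0] * (dimensions // 2)))
print(float32_size, binary_size)
definition = vector_index_definition("embedding", dimensions, "binary")
```

At one bit per dimension, the raw payload of a 1024-dimension vector drops from 4096 bytes to 128 bytes, which is in line with the roughly 96% reduction reported for binary quantization earlier in this post.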
