Binary Quantization & Rescoring: 96% Less Memory, Faster Search
We are excited to share that several new vector quantization capabilities are now available in public preview in MongoDB Atlas Vector Search: support for binary quantized vector ingestion, automatic scalar quantization, and automatic binary quantization and rescoring.
Together with our recently released support for scalar quantized vector ingestion, these capabilities will empower developers to scale semantic search and generative AI applications more cost-effectively. For a primer on vector quantization, check out our previous blog post.
Enhanced developer experience with native quantization in Atlas Vector Search
Effective quantization methods—specifically scalar and binary quantization—can now be done automatically in Atlas Vector Search. This makes it easier and more cost-effective for developers to use Atlas Vector Search to unlock a wide range of applications, particularly those requiring over a million vectors.
With the new “quantization” index definition parameter, developers can choose to use full-fidelity vectors by specifying “none,” or they can quantize vector embeddings by specifying the desired quantization type, “scalar” or “binary” (Figure 1). This native quantization capability supports vector embeddings from any model provider as well as MongoDB’s BinData float32 vector subtype.
Figure 1: Configuring the “quantization” parameter in an Atlas Vector Search index definition
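Below is a minimal sketch of creating such an index with the Python driver. The connection string, database, collection, field name (`embedding`), and dimension count are placeholders for illustration, and the snippet assumes a recent pymongo release that supports `create_search_index` with the `vectorSearch` index type.

```python
from pymongo import MongoClient
from pymongo.operations import SearchIndexModel

# Placeholder connection details; substitute your own cluster, database, and collection.
client = MongoClient("<connection-string>")
collection = client["inventory"]["products"]

# Vector Search index definition using the new "quantization" parameter.
# Set it to "none", "scalar", or "binary" depending on the accuracy/memory
# trade-off you need; numDimensions must match your embedding model's output.
index_model = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "path": "embedding",
                "numDimensions": 1536,
                "similarity": "cosine",
                "quantization": "binary",
            }
        ]
    },
    name="vector_index",
    type="vectorSearch",
)

collection.create_search_index(model=index_model)
```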
Scalar quantization, which converts each floating-point value into an integer, is generally used when it's crucial to maintain search accuracy on par with full-precision vectors. Meanwhile, binary quantization, which converts each floating-point value into a single bit (0 or 1), is more suitable for scenarios where storage and memory efficiency are paramount and a slight reduction in search accuracy is acceptable. If you’re interested in learning more about this process, check out our documentation.
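As a rough illustration of the difference, the toy sketch below maps each component of a vector to a small integer range for scalar quantization and keeps only its sign for binary quantization. This is not how Atlas Vector Search implements quantization internally; it is only meant to show the idea.

```python
import numpy as np

# A toy, illustrative example (not Atlas Vector Search's internal implementation).
embedding = np.array([0.12, -0.83, 0.45, 0.02], dtype=np.float32)

# Scalar quantization: map each float into an 8-bit integer within the
# observed min/max range, shrinking each component from 4 bytes to 1 byte.
lo, hi = embedding.min(), embedding.max()
scalar_q = np.round((embedding - lo) / (hi - lo) * 255).astype(np.uint8)

# Binary quantization: keep only the sign of each component, shrinking
# each component from 32 bits to a single bit.
binary_q = (embedding > 0).astype(np.uint8)

print(scalar_q)  # e.g. [189   0 255 169]
print(binary_q)  # [1 0 1 1]
```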
Binary quantization with rescoring: Balance cost and accuracy
Compared to scalar quantization, binary quantization further reduces memory usage, leading to lower costs and improved scalability, but it also reduces search accuracy. To mitigate this, when “binary” is chosen in the “quantization” index parameter, Atlas Vector Search adds an automatic rescoring step: a subset of the top binary vector search results is re-ranked using their full-precision counterparts, ensuring that the final search results are highly accurate despite the initial vector compression.
Empirical evidence demonstrates that incorporating a rescoring step when working with binary quantized vectors can dramatically enhance search accuracy, as shown in Figure 2 below.
And as Figure 3 shows, in our tests, binary quantization reduced processing memory requirements by 96% while retaining up to 95% search accuracy and improving query performance.
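Conceptually, the rescoring flow looks like the following NumPy sketch: a coarse search over binary codes produces an oversampled candidate set, and the candidates are then re-ranked with full-precision similarity. The dataset, dimensions, and candidate counts are arbitrary, and this is an illustration of the idea rather than Atlas Vector Search's internal implementation.

```python
import numpy as np

# Illustrative data: `vectors` stands in for stored embeddings, `query` for a query embedding.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 256)).astype(np.float32)
query = rng.normal(size=256).astype(np.float32)

# Index-time: binarize every stored vector (one sign bit per component).
binary_db = (vectors > 0)

# Query-time step 1: coarse search with Hamming distance over the binary codes.
binary_q = (query > 0)
hamming = (binary_db != binary_q).sum(axis=1)
candidates = np.argsort(hamming)[:100]          # oversampled candidate set

# Query-time step 2: rescore the candidates with full-precision cosine similarity.
cand_vecs = vectors[candidates]
scores = cand_vecs @ query / (
    np.linalg.norm(cand_vecs, axis=1) * np.linalg.norm(query)
)
top10 = candidates[np.argsort(-scores)[:10]]    # final, rescored results
```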
It’s worth noting that even though the quantized vectors are used for indexing and search, the full-fidelity vectors are still stored on disk to support rescoring. Furthermore, retaining the full-fidelity vectors enables developers to perform exact vector search for experimental, high-precision use cases, such as evaluating the search accuracy of quantized vectors produced by different embedding model providers. For more on evaluating the accuracy of quantized vectors, please see our documentation.
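For example, a recall check might run the same query twice, once as an approximate search over the quantized index and once as an exact search over the full-fidelity vectors, and compare the overlap. The sketch below assumes the `collection` and index from the earlier example; the query vector is a placeholder that you would replace with your embedding model's output.

```python
query_vector = [0.0] * 1536  # placeholder; use the output of your embedding model

# Approximate search over the quantized index, with candidate oversampling.
ann_results = list(collection.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": query_vector,
        "numCandidates": 200,
        "limit": 10,
    }}
]))

# Exact search over the full-fidelity vectors (no numCandidates when exact is true).
enn_results = list(collection.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",
        "path": "embedding",
        "queryVector": query_vector,
        "exact": True,
        "limit": 10,
    }}
]))

# Recall of the approximate search against the exact baseline.
ann_ids = {doc["_id"] for doc in ann_results}
enn_ids = {doc["_id"] for doc in enn_results}
recall = len(ann_ids & enn_ids) / len(enn_ids)
```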
So how can developers make the most of vector quantization? Here are some example use cases that can be made more efficient and scaled effectively with quantized vectors:
- Massive knowledge bases can be used efficiently and cost-effectively for analysis and insight-oriented use cases, such as content summarization and sentiment analysis. Unstructured data like customer reviews, articles, audio, and videos can be processed and analyzed at a much larger scale, at a lower cost and faster speed.
- Using quantized vectors can enhance the performance of retrieval-augmented generation (RAG) applications. The efficient processing can support query performance from large knowledge bases, and the cost-effectiveness advantage can enable a more scalable, robust RAG system, which can result in better customer and employee experience.
- Developers can easily A/B test different embedding models using multiple vectors produced from the same source field during prototyping. MongoDB’s flexible document model lets developers quickly deploy and compare embedding models’ results without the need to rebuild the index or provision an entirely new data model or set of infrastructure (see the sketch after this list).
- The relevance of search results or context for large language models (LLMs) can be improved by incorporating larger volumes of vectors from multiple sources of relevance, such as different source fields (product descriptions, product images, etc.) embedded within the same or different models.
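For the A/B testing scenario above, one possible shape is to store a vector from each candidate model on the same document and index both fields in a single Vector Search index. The document fields, dimensions, and quantization choices below are hypothetical, and `collection` refers to the earlier sketch.

```python
from pymongo.operations import SearchIndexModel

# Hypothetical document holding two embeddings of the same source field,
# one per candidate model; values and dimensions are placeholders.
doc = {
    "description": "Waterproof hiking boots with ankle support",
    "embedding_model_a": [0.12, -0.83, 0.45],   # vector from candidate model A
    "embedding_model_b": [0.07, 0.91, -0.22],   # vector from candidate model B
}
collection.insert_one(doc)

# A single index can cover both vector fields, each with its own dimension
# count and quantization setting, so the models can be compared side by side.
collection.create_search_index(model=SearchIndexModel(
    definition={"fields": [
        {"type": "vector", "path": "embedding_model_a",
         "numDimensions": 3, "similarity": "cosine", "quantization": "scalar"},
        {"type": "vector", "path": "embedding_model_b",
         "numDimensions": 3, "similarity": "cosine", "quantization": "binary"},
    ]},
    name="ab_test_index",
    type="vectorSearch",
))
```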
To get started with vector quantization in Atlas Vector Search, see the following developer resources: