MongoDB Powers M-DAQ’s Anti-Money Laundering Compliance Platform
Founded and headquartered in Singapore, M-DAQ Global is a fintech powerhouse providing seamless cross-border transactions for businesses worldwide. M-DAQ’s comprehensive suite of foreign exchange, collections, and payments solutions helps organizations of all sizes navigate the complexities of global trade, offering FX clarity, certainty, and payment mobility. M-DAQ also offers AI-powered services like Know Your Business (KYB), onboarding, and advanced risk management tools. Amid ever-evolving regulatory requirements, these services enable businesses to transact across borders with ease while staying compliant.

One of M-DAQ’s most innovative solutions, CheckGPT, is an AI-powered platform designed to streamline Anti-Money Laundering (AML) compliance. It was built on MongoDB Atlas, which provides a strong foundation for multitenant data storage: each client has a dedicated database, effectively preventing any data co-mingling.

Traditional AML processes often involve tedious, time-consuming tasks, from document review, to background checks, to customer onboarding. By building CheckGPT, M-DAQ aimed to change this paradigm and to use AI to automate (and speed up) these manual processes. Today, CheckGPT allows businesses to process onboarding 30 times faster than traditional human processing. The platform also leverages MongoDB Atlas’s native Vector Search capabilities to power intelligent semantic searches across unstructured data.

The challenge: Managing unstructured, sensitive data, and performing complex searches

One of CheckGPT’s priorities was to improve processes around collecting, summarizing, and analyzing data, while flagging potential risks to customers quickly and accurately. Given the volume and complexity of the data sets its AI platform had to handle, and the strict regulatory landscape the company operates in, it was crucial that M-DAQ chose a robust database.

CheckGPT needed a database that could efficiently and accurately handle unstructured data, and adapt rapidly as the data evolved. The database also had to be highly secure: the AI tool would handle highly sensitive data and would be used by companies operating in highly regulated industries. Finally, CheckGPT needed to perform complex, high-dimensional searches to power a wide range of queries and real-time information analysis.

MongoDB Atlas: A complete platform with unique features

According to M-DAQ, the benefits of MongoDB Atlas’s document model include:

- Flexibility: MongoDB Atlas’s document model accommodates the evolving nature of compliance data, providing the flexibility needed to manage CheckGPT’s dynamic data structures, such as onboarding documents and compliance workflows.
- Security and performance: The MongoDB Atlas platform ensures that data remains secure throughout its lifecycle. M-DAQ was able to implement a multi-tenancy architecture that securely isolates data across its diverse client base. This ensures that the platform can handle varying compliance demands while maintaining exceptional performance, giving M-DAQ’s customers confidence that the AML processes handled by CheckGPT comply with stringent regulatory standards.
- Vector search capabilities: MongoDB Atlas provides a unified development experience. In particular, MongoDB Atlas Vector Search enables real-time searches across vast, high-dimensional datasets.
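To make this pattern concrete, here is a minimal sketch of a tenant-scoped semantic search using the MongoDB Go driver. All names (the tenant database, collection, index, and fields) are hypothetical, and the query embedding is stubbed out; in a real system it would come from an embedding model, and the Atlas Vector Search index would already be defined on the collection.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb+srv://<cluster-uri>"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// One database per tenant, mirroring the multitenant isolation described above.
	coll := client.Database("tenant_acme").Collection("compliance_docs")

	// Stub: in practice this is the embedding of the query text,
	// produced by the same model used to embed the stored documents.
	queryVector := []float32{ /* ... model output ... */ }

	pipeline := mongo.Pipeline{
		{{Key: "$vectorSearch", Value: bson.D{
			{Key: "index", Value: "vector_index"}, // hypothetical Atlas Vector Search index name
			{Key: "path", Value: "embedding"},     // field holding document embeddings
			{Key: "queryVector", Value: queryVector},
			{Key: "numCandidates", Value: 100},
			{Key: "limit", Value: 5},
		}}},
		{{Key: "$project", Value: bson.D{
			{Key: "title", Value: 1},
			{Key: "score", Value: bson.D{{Key: "$meta", Value: "vectorSearchScore"}}},
		}}},
	}

	cursor, err := coll.Aggregate(ctx, pipeline)
	if err != nil {
		log.Fatal(err)
	}
	var results []bson.M
	if err := cursor.All(ctx, &results); err != nil {
		log.Fatal(err)
	}
	fmt.Println(results)
}
```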
These capabilities make it easier to verify documents, conduct background checks, and continuously monitor customer activity, ensuring fast and accurate results during AML processes.

“AI, together with the flexibility of MongoDB, has greatly impacted CheckGPT, enabling us to scale operations and automate complex AML compliance processes,” said Andrew Marchen, General Manager, Payments and Co-founder, Wallex at M-DAQ Global. “This integration significantly reduces onboarding time, which typically took between 4-8 hours to three days depending on the document’s complexity, to less than 10 minutes. With MongoDB, M-DAQ is able to deliver faster and more accurate results while meeting customer needs in a secure and adaptable environment.”

The future of CheckGPT, powered by MongoDB

M-DAQ believes that AI and data-driven technologies will continue to play a central role in automating complex processes. By employing AI, M-DAQ aims to improve operational efficiency, enhance customer experiences, and scale rapidly—while maintaining high service standards.

MongoDB’s flexibility and multi-cloud support will be key as M-DAQ plans to use single/multi-cluster and multi-region capabilities in the future. M-DAQ aims to explore additional features that could enhance CheckGPT’s scalability and performance. For example, the company plans to expand its use of MongoDB for future projects involving automating complex processes like compliance, onboarding, and risk management in 2025.

Learn more about CheckGPT on their site. Visit our product page to learn more about MongoDB Atlas. Get started with MongoDB Atlas Vector Search today with our Atlas Vector Search Quick Start guide.
LangChainGo and MongoDB: Powering RAG Applications in Go
MongoDB is excited to announce our integration with LangChainGo, making it easier to build Go applications powered by large language models (LLMs). This integration streamlines LLM-based application development by combining LangChainGo’s abstractions, which simplify LLM orchestration, with MongoDB’s vector database capabilities and Go’s strengths as a performant, scalable, production-ready language. With robust support for retrieval-augmented generation (RAG) and AI agents, MongoDB enables efficient knowledge retrieval, contextual understanding, and real-time AI-driven workflows. Read on to learn more about this integration and the advantages of using MongoDB as a vector database for AI/ML applications in Go.

LangChainGo: Bringing LangChain to the Go ecosystem

LangChain is an open-source framework that simplifies building LLM-powered applications. It offers tools and abstractions to integrate LLMs with diverse data sources, APIs, and workflows, supporting use cases like chatbots, document processing, and autonomous agents. While LangChain currently supports only Python and JavaScript, the need for a similar solution in the Go ecosystem led to the development of LangChainGo.

LangChainGo is a community-driven, third-party port of the LangChain framework for the Go programming language. It allows Go developers to integrate LLMs directly into their Go applications, bringing the capabilities of the original LangChain framework into the Go ecosystem. LangChainGo enables users to embed data using various services, including OpenAI, Ollama, Mistral, and others. It also supports integration with a variety of vector stores, such as MongoDB.

MongoDB’s role as an operational and vector database

MongoDB excels as a unified data layer for AI applications with native vector search capabilities due to its simplicity, scalability, security, and rich set of features. With Atlas Vector Search built into the core database, there’s no need to sync operational and vector data separately—everything stays in one place, saving time and reducing complexity when you develop AI-powered applications. You can easily combine semantic searches with metadata filters, graph lookups, aggregation pipelines, and even geospatial or lexical search, enabling powerful hybrid queries all within a single platform. MongoDB’s distributed architecture allows vector search to scale independently from the core database, ensuring optimized vector query performance and workload isolation for superior scalability. Plus, with enterprise-grade security and high availability, MongoDB provides the reliability and peace of mind you need to power your AI-driven applications at scale.

MongoDB, Go, and AI/ML

As the Go AI/ML landscape grows, MongoDB continues to drive innovation with its powerful vector search capabilities and LangChainGo integration, empowering developers to build RAG implementations and AI agents. This integration is powered by the MongoDB Go Driver, which supports vector search and allows developers to interact with MongoDB directly from their Go applications, streamlining development and reducing friction.

Figure 1. RAG architecture with MongoDB and LangChainGo.

While Python and JavaScript dominate the AI/ML ecosystem, Go’s AI/ML ecosystem is still emerging—yet its potential is undeniable. Go’s simplicity, scalability, runtime safety, concurrency, and single-binary deployment make it an ideal production-ready language for AI.
With MongoDB’s powerful database and helpful learning resources, developers can seamlessly build next-generation AI solutions in Go. Ready to dive in? Explore the tutorials below to get started!

Getting Started with MongoDB and LangChainGo

MongoDB was added as a vector store in LangChainGo’s v0.1.13 release. It is packaged as mongovector, a component that enables developers to use MongoDB as a powerful vector store in LangChainGo. Usage guidance is provided through the mongovector-vectorstore-example, along with the in-depth tutorials linked below. Dive into this integration to unlock the full potential of Go AI applications with MongoDB.

We’re excited for you to work with LangChainGo. Here are some tutorials to help you get started:

- Get Started with the LangChainGo Integration
- Retrieval-Augmented Generation (RAG) with Atlas Vector Search
- Build a Local RAG Implementation with Atlas Vector Search
- Get started with Atlas Vector Search (select Go from the dropdown menu)
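As a quick taste of the integration, here is a minimal sketch of loading and querying documents through the mongovector store. It assumes the v0.1.13-era API surface (mongovector.New, AddDocuments, SimilaritySearch), an OPENAI_API_KEY in the environment, and an Atlas Vector Search index already defined on the collection; the database and collection names are hypothetical. See the tutorials above for authoritative, end-to-end instructions.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tmc/langchaingo/embeddings"
	"github.com/tmc/langchaingo/llms/openai"
	"github.com/tmc/langchaingo/schema"
	"github.com/tmc/langchaingo/vectorstores/mongovector"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb+srv://<cluster-uri>"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)
	coll := client.Database("rag_demo").Collection("documents")

	// Use an OpenAI model to produce embeddings (requires OPENAI_API_KEY).
	llm, err := openai.New()
	if err != nil {
		log.Fatal(err)
	}
	embedder, err := embeddings.NewEmbedder(llm)
	if err != nil {
		log.Fatal(err)
	}

	// Wrap the collection as a LangChainGo vector store.
	store := mongovector.New(coll, embedder)

	// Embed and store a few documents alongside operational data.
	if _, err := store.AddDocuments(ctx, []schema.Document{
		{PageContent: "MongoDB Atlas Vector Search keeps vectors next to operational data."},
		{PageContent: "LangChainGo brings LangChain-style abstractions to Go."},
	}); err != nil {
		log.Fatal(err)
	}

	// Retrieve the most semantically similar document for a query.
	docs, err := store.SimilaritySearch(ctx, "How do I build RAG in Go?", 1)
	if err != nil {
		log.Fatal(err)
	}
	if len(docs) > 0 {
		fmt.Println(docs[0].PageContent)
	}
}
```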
Announcing the 2025 MongoDB PhD Fellowship Recipients
At MongoDB, we’re committed to fostering collaboration between academia and industry to support emerging research leaders. Now in its second year, the MongoDB PhD Fellowship Program aims to advance cutting-edge research in computer science. Fellows receive financial support, mentorship, and opportunities to engage with MongoDB’s researchers and engineers throughout the year-long fellowship. They are also invited to present their research at MongoDB events.

It’s hardly groundbreaking—but nonetheless true—to say that the world runs on software. As a result, investing in the future of software development is of paramount importance. So MongoDB is excited and honored to help these students push the frontiers of knowledge in their fields, and to contribute to innovations that will redefine the future of technology.

Celebrating the 2025 MongoDB PhD Fellows

This year, the selection process was extremely competitive, and the quality of the applications was excellent. The review panel of MongoDB researchers and engineers was impressed with the applicants’ accomplishments to date, as well as with their ambitious goals for future research.

Without further ado, I’m delighted to announce the recipients of the 2025 MongoDB PhD Fellowship. Congratulations to Xingjian Bai, William Zhang, and Renfei Zhou! These three exceptional scholars stood out for their innovative research and potential to drive significant advancements in their field.

Xingjian Bai, PhD candidate at MIT

Xingjian Bai is a first-year PhD student in Electrical Engineering and Computer Science at MIT, supervised by Associate Professor Kaiming He. He obtained his master’s and bachelor’s degrees in Mathematics and Computer Science from the University of Oxford. His research lies at the intersection of classic algorithms and deep learning, with a focus on physics-inspired generative models and learning-augmented algorithms. More broadly, he is driven by research directions that are scientifically impactful or intellectually stimulating. In his spare time, he enjoys playing tennis and jogging.

“I sincerely appreciate MongoDB’s support for Xingjian and contributions to fundamental research on artificial intelligence, deep learning, and machine learning.” - Kaiming He, Associate Professor of the Department of Electrical Engineering and Computer Science (EECS) at MIT

William Zhang, PhD candidate at Carnegie Mellon University

William Zhang is a third-year PhD student in the Computer Science Department, School of Computer Science, at Carnegie Mellon University. His research focuses on “self-driving” database management systems (DBMSs), specifically machine-learning-based techniques for optimizing their performance. He is advised by Associate Professor Andy Pavlo and is a member of the Database Group (CMU-DB) and Parallel Data Lab.

“Will Zhang’s PhD research at Carnegie Mellon University seeks to solve the problem all developers have struggled with since the 1970s: how to automate tuning and optimizing a database. Will is using an AI-based approach to develop database optimization algorithms that automatically learn how to exploit similarities between tuning options to reduce the complexity of database optimization. If successful, his research will make it easier for anyone to deploy a database and maintain it as it grows over its lifetime. Removing the human burden of maintaining a database is especially important in the modern era of data-intensive AI applications. The Carnegie Mellon Database Group is grateful for MongoDB’s support for Will’s research through their PhD Fellowship program. Working with his mentor at MongoDB as part of the program provides Will with invaluable guidance and insight into the challenges developers face with databases, especially in a cloud setting like MongoDB Atlas.” - Andy Pavlo, Associate Professor of Computer Science at CMU

Renfei Zhou, PhD candidate at Carnegie Mellon University

Renfei Zhou is a first-year PhD student studying theoretical computer science at CMU, co-advised by Assistant Professor William Kuszmaul and U.A. and Helen Whitaker Professor Guy Blelloch. He completed his bachelor’s degree in the Yao Class at Tsinghua University. He mainly works on classical data structures, especially hash tables and succinct data structures. He is also known for his work on fast matrix multiplication.

“Renfei’s research focuses on answering basic questions about how space- and time-efficient data structures can be. This is a research area that has a lot of potential for impact—both on how we, as theoreticians, think about data structures, but also on how data structures are implemented in the real world. Renfei isn’t just a great researcher, he’s also a great collaborator, and his research will almost certainly benefit from the mentorship that he will receive from researchers and engineers at MongoDB.” - William Kuszmaul, Assistant Professor of Computer Science at CMU

Seny Kamara, Head of Research at MongoDB, shared his thoughts on the program’s second year: “The applications we received for the fellowship were outstanding, but Renfei’s, Will’s, and Xingjian’s research stood out for their depth and ambition. Their work tackles important problems in computer science and has the potential to impact both the wider industry as well as MongoDB’s efforts. We are very excited to collaborate with these exceptional students and to support their research.”

We proudly congratulate this year’s winners and thank everyone who took the time to apply! The nomination window for the 2026 MongoDB PhD Fellowship Program will open on September 2, and we invite all PhD students with innovative ideas to apply. For more information about the MongoDB PhD Fellowship Program, the application process, and deadlines for next year’s fellowships, please visit our PhD Fellowship Program page. Join a global community of educators and students, and access a wealth of resources, including free curriculum, specialized training, and certification pathways designed to enhance your teaching and student outcomes.
Secure and Scale Data with MongoDB Atlas on Azure and Google Cloud
MongoDB is committed to simplifying the development of robust, data-driven applications—regardless of where the data resides. Today, we’re announcing two major updates that enhance the security, scalability, and flexibility of MongoDB Atlas across cloud providers.

Private, secure connectivity with Azure Private Link for MongoDB Atlas Data Federation, Atlas Online Archive, and Atlas SQL

Developers building on Microsoft Azure can now establish private, secure connections to MongoDB Atlas Data Federation, MongoDB Atlas Online Archive, and MongoDB Atlas SQL using Azure Private Link, enabling:

- End-to-end security: Reduce exposure to security risks by keeping sensitive data off the public internet.
- Low-latency performance: Ensure faster and more reliable access through direct, private connectivity.
- Scalability: Build applications that scale while maintaining secure, seamless data access.

Imagine a financial services company that needs to run complex risk analysis across multiple data sources, including live transactional databases and archived records. With MongoDB Atlas Data Federation and Azure Private Link, the company can securely query and aggregate this data without exposing it to the public internet, helping it achieve compliance with strict regulatory standards.

Similarly, an e-commerce company managing high volumes of customer orders and inventory updates can use MongoDB Atlas Online Archive to seamlessly move older transaction records to cost-effective storage—all while ensuring real-time analytics dashboards still have instant access to historical trends. With Azure Private Link, these applications benefit from secure, low-latency connections, enabling developers to focus on innovation instead of on managing complex networking and security policies.

General availability of MongoDB Atlas Data Federation and Atlas Online Archive on Google Cloud

MongoDB Atlas Data Federation and Atlas Online Archive are now generally available on Google Cloud. This empowers developers to:

- Query data across sources: Run a single query across live databases, cloud storage, and data lakes without complex extract, transform, and load (ETL) pipelines.
- Optimize storage costs: Automatically move infrequently accessed data to lower-cost storage while keeping it queryable with MongoDB Atlas Online Archive.
- Achieve multi-cloud flexibility: Run applications across Amazon Web Services (AWS), Azure, and Google Cloud without being locked in.

For example, a media streaming service might store frequently accessed content metadata in a high-performance database while archiving older user activity logs in Google Cloud Storage. With MongoDB Atlas Data Federation, the streaming service can analyze both live and archived data in a single query, making it easier to surface personalized recommendations without complex ETL processes.

For a healthcare analytics platform, keeping years’ worth of patient records in a primary database can be expensive. By using MongoDB Atlas Online Archive, the platform can automatically move older records to lower-cost storage—while still enabling fast access to historical patient data for research and reporting.

These updates give developers more control over building and scaling in the cloud. Whether they need secure access on Azure or seamless querying and archiving on Google Cloud, MongoDB Atlas simplifies security, performance, and cost efficiency. These updates are now live!
Log in to your MongoDB Atlas account to start exploring the possibilities today.
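To illustrate what “a single query across live and archived data” looks like in practice, here is a minimal sketch using the Go driver. The connection string, database, collection, and field names are all hypothetical; the key point is that a federated or archive-inclusive endpoint is queried exactly like any other MongoDB deployment, with no ETL step between hot and cold data.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()

	// A federated (or Online Archive) connection string from the Atlas UI;
	// behind it, Atlas fans the query out to the cluster and archive storage.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://<federated-endpoint>"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	orders := client.Database("sales").Collection("orders")

	// An ordinary aggregation spanning both live and archived documents.
	cursor, err := orders.Aggregate(ctx, mongo.Pipeline{
		{{Key: "$match", Value: bson.D{{Key: "status", Value: "shipped"}}}},
		{{Key: "$group", Value: bson.D{
			{Key: "_id", Value: "$region"},
			{Key: "total", Value: bson.D{{Key: "$sum", Value: "$amount"}}},
		}}},
	})
	if err != nil {
		log.Fatal(err)
	}
	var results []bson.M
	if err := cursor.All(ctx, &results); err != nil {
		log.Fatal(err)
	}
	fmt.Println(results)
}
```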
How Cognistx’s SQUARY AI is Redefining Information Access
In a world where information is abundant but often buried, finding precise answers can be tedious and time-consuming. People spend hours a week simply searching for the information they need. Cognistx, an applied AI startup and a member of the MongoDB for Startups program, is on a mission to eliminate this inefficiency. Through its flagship product, SQUARY AI, the company is building tools to make information retrieval faster, more reliable, and radically simpler. As Cognistx seeks to unlock the future of intuitive search with speed, accuracy, and innovation, MongoDB Atlas serves as a reliable backbone for the company’s data operations.

A company journey: From bespoke AI projects to a market-ready solution

Cognistx started its journey with a focus on developing custom AI solutions for clients. Over time, the company identified a common pain point across industries: the need for efficient, high-quality tools to extract actionable insights from large volumes of data. This realization led it to pivot toward a product-based approach, culminating in the development of SQUARY AI—a next-generation intelligent search platform.

SQUARY AI’s first iteration was born out of a bespoke project. The goal was to build a smart search engine capable of extracting answers to open-ended questions across multiple predefined categories. Early on, the team incorporated features like source tracking to improve trustworthiness and support human-assisted reviews, ensuring that the AI’s answers could be verified and trusted. Seeing the broader potential of its technology, Cognistx began using advancements in natural language processing and machine learning, transforming its early work into a stand-alone product designed for diverse industries.

The evolution of SQUARY AI: Using state-of-the-art large language models

Cognistx initially deployed traditional machine learning approaches to power SQUARY AI’s search capabilities, such as conversation contextualization and multihop reasoning (the ability to combine information from multiple sources to form a more complete answer). Before the rise of large language models (LLMs), this was no small feat. Today, SQUARY AI incorporates state-of-the-art LLMs to elevate both speed and precision. The platform uses a combination of retrieval-augmented generation (RAG), custom text-cleaning methods, and advanced vector search techniques.

MongoDB Atlas integrates seamlessly into this ecosystem. MongoDB Atlas Vector Search powers SQUARY AI’s advanced search capabilities and lays the groundwork for even faster and more accurate information retrieval. With MongoDB Atlas, the company can store vectorized data alongside the rest of its operational data. There’s no need to add a separate, stand-alone database to handle vector search. MongoDB Atlas serves as both the operational data store and vector data store.

Cognistx offers multiple branches of SQUARY AI, including:

- SQUARY Chat: Designed for public-facing or intranet deployment, these website chatbots provide instant, 24/7 access to website content, eliminating the need for human agents. SQUARY Chat also empowers website owners with searchable, preprocessed AI insights from user queries. These analytics enable organizations to directly address customer needs, refine marketing strategies, and ensure that their sites contain the most relevant and valuable information for their audiences.
- SQUARY Enterprise: Built with businesses in mind, this enterprise platform helps companies retrieve precise answers from vast and unorganized knowledge bases. Whether it’s assisting employees or streamlining review processes, this tool helps organizations save time, improve team efficiency, and deliver actionable insights.

One of the standout features of SQUARY AI is its AI-driven metrics, which assess system performance and provide insights into user interests and requirements. This is particularly valuable for public-facing website chatbots.

A powerful database: How MongoDB powers SQUARY AI

Cognistx attributes much of its technical success to MongoDB. The company’s history with MongoDB spans years, and its trust in MongoDB’s performance and reliability made the database the obvious choice for powering SQUARY AI.

“MongoDB has been pivotal in our journey,” said Cognistx Data Scientist Ihor Markevych. “The scalable, easy-to-use database has allowed us to focus on innovating and refining SQUARY AI without worrying about infrastructure constraints. With MongoDB’s support, we’ve been able to confidently scale as our product grows, ensuring both performance and reliability.”

The team’s focus when selecting a database was on cost, convenience, and development effort. MongoDB checked all those boxes, said Markevych. The company’s expertise with MongoDB, coupled with years of consistent satisfaction with its performance, made it the obvious choice. With no additional ramp-up effort necessary, the team was able to deploy very quickly.

In addition to MongoDB Atlas Vector Search, the other critical feature of MongoDB is its scalability, which Markevych described as seamless. “Its intuitive structure enables us to monitor usage patterns closely and scale up or down as needed. This flexibility ensures we’re always operating efficiently without overcommitting resources,” Markevych said.

The MongoDB for Startups program has also been instrumental in the company’s success. The program provides early-stage startups with free MongoDB Atlas credits, technical guidance, co-marketing opportunities, and access to a network of partners. With help from MongoDB technical advisors, the Cognistx team is now confidently migrating data from OpenSearch to MongoDB Atlas to achieve better performance at a reduced cost. The free MongoDB Atlas credits enabled the team to experiment with various configurations to optimize the product further. Cognistx also gained access to a large network of like-minded innovators. “The MongoDB for Startups community has provided invaluable networking opportunities, enhancing our visibility and connections within the industry,” Markevych said.

The future: Scaling for more projects

Looking ahead, Cognistx is focusing on making SQUARY AI even more accessible and customizable. Key projects include automating the onboarding process, which will enable users to define and fine-tune system behavior from the start. The company also aims to expand SQUARY AI’s availability across various marketplaces. With a successful launch on AWS Marketplace, the company next hopes to offer its product on WordPress, making it simple for businesses to integrate SQUARY Chat into their websites.

Cognistx is continuing to refine SQUARY AI’s balance between speed, accuracy, and usability. By blending cutting-edge technologies with a user-centric approach, the company is shaping the future of how people access and interact with information.

See it in action

Cognistx isn’t just building a tool; it’s building a movement toward intuitive, efficient, and conversational search. Experience the possibilities for yourself—schedule a demo of SQUARY AI today.
To get started with vector search in MongoDB, visit our MongoDB Atlas Vector Search Quick Start guide.
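As a small illustration of the “one store for operational and vector data” point above, the sketch below writes a document whose embedding lives right next to its operational fields, so the same collection can serve everyday CRUD queries and Atlas Vector Search. All names are hypothetical, and the embedding values are stubbed.

```go
package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb+srv://<cluster-uri>"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	pages := client.Database("search_demo").Collection("pages")

	// Operational fields and the page's embedding live in one document:
	// no second, vector-only database to keep in sync.
	_, err = pages.InsertOne(ctx, bson.D{
		{Key: "url", Value: "https://example.com/faq"},
		{Key: "text", Value: "Our support desk is available 24/7."},
		{Key: "updatedAt", Value: time.Now()},
		{Key: "embedding", Value: []float32{ /* output of an embedding model */ }},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```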
Embracing Open Finance Innovation with MongoDB
The term "open finance" is increasingly a topic of discussion among banks, fintechs, and other financial services providers—and for good reason. Open finance, as the next stage of open banking, expands the scope of data sharing beyond traditional banking to include investments, insurance, pension funds, and more. To deliver these enhanced capabilities, financial service providers need a versatile and flexible data store that can seamlessly manage a wide array of financial data. MongoDB serves as an ideal solution, providing a unified data platform that empowers financial services providers to integrate various data sources, enabling real-time analytics, efficient data retrieval, and scalability. These capabilities are pivotal in enhancing customer experiences, providing users with a comprehensive view of their finances, and empowering them with greater visibility and control over their own data. By adopting MongoDB, financial services can seamlessly adapt to the growing demands of open finance and deliver innovative, data-driven solutions. Open finance's past and future As highlighted in a study conducted by the Cambridge Centre for Alternative Finance 1 , the terms 'open banking' and 'open finance' vary globally. Acknowledging these differences, we'll focus on the model displayed in Figure 1 due to its widespread adoption and relevance in our study. Figure 1. The three waves of innovation in financial services. The development of open finance started with open banking, which intended for banks to promote innovation by allowing customers to share their financial data with third-party service providers (TPP) and allow those TPP—fintech and techfin companies—to initiate transactions on their behalf solely in the context of payments. This proved to be an effective way to promote innovation and thus led to a broader spectrum of financial products adding loans, mortgages, savings, pensions, insurance, investments, and more. Leading to this new directive, commonly referred to as: open finance. If we take a step further—regardless of its final implementation—a third development called open data suggests sharing data beyond the traditional boundaries of the financial services industry (FSI), exponentially increasing the potential for financial services by moving into cross-sector offerings, positioning FSI as a horizontal industry rather than an independent vertical as it was previously known. Who and what plays a role in open finance? Among the different actors across open finance, the most important are: Consumers: End-users empowered to grant or revoke consent to share their data primarily through digital channels. Data holders: These are mainly financial services companies, and thereby consumer data custodians. They are responsible for controlling the data flow across the different third-party providers (TPPs). Data users: Data users are common third-party providers offering their services based on consumers’ data (upon request/consent). Connectivity providers: Trusted intermediaries that facilitate data flow, also known as TSPs in the EU and UK, and Account Aggregators in India. Regulatory authorities: Set standards, oversee processes, and may intervene in open finance implementation. They may vary according to the governance type. The interactions between all these different parties define the pillars for open finance functioning: Technology: Ensures secure data storage and the exposure-consumption of services. Standards: Establishes frameworks for data interchange schemas. 
- Regulations and enforceability: Encompasses security policies and data access controls.
- Participation and trust: Enables traceability and reliability within a regulated ecosystem.

Figure 2. High-level explanation of data sharing in open finance.

Drivers behind open finance: Adoption, impact, and compliance

Open finance seeks to stimulate innovation by promoting competition, safeguarding consumer privacy, and ensuring market stability—ultimately leading to economic growth. Additionally, it has the potential to provide financial institutions with greater access to data and better insights into consumers' preferences, allowing them to tailor their offerings and enhance user experiences. This data sharing between the ecosystem's participants requires a regulated set of rules to ensure data protection, security, and compliance according to each jurisdiction.

As seen in Figure 3 below, there are two broad drivers of open finance adoption: regulation-led and market-driven adoption. Whether organizations adopt open finance depends on factors like market dynamics, digital readiness, and regulatory environment.

Figure 3. An illustrative example of open finance ecosystem maturity.

Even though there is no single, official legal framework specifying how to comply with open finance, countries around the world have crafted their own norms as guiding principles. Recent market research reports reveal how several countries are already implementing open finance solutions, each coming from a different starting point, with its own economic goals and policy objectives. In Europe, the Revised Payment Services Directive (PSD2) combined with the General Data Protection Regulation (GDPR) forms the cornerstone of the regulatory framework; the European Commission published a proposal in June 2023 for a regulation on a framework for Financial Data Access (FiDA) [2], set to go live in 2027 [3]. In the UK, open finance emerged from the need to address the market power held by a few dominant banks. In India, open finance emerged as a solution to promote financial inclusion by enabling identity verification for account opening through the national ID system.

"The aim is to create a single European data space – a genuine single market for data, open to data from across the world – where personal as well as non-personal data, including sensitive business data, are secure and businesses also have easy access to an almost infinite amount of high-quality industrial data, boosting growth and creating value, while minimising the human carbon and environmental footprint." [4]

Build vs. buy: Choosing the right open finance strategy

One of the biggest strategic decisions financial institutions face is whether to build their own open finance solutions in-house or buy from third-party open finance service providers. Both approaches come with trade-offs:

- Building in-house provides full ownership, flexibility, and control over security and compliance. While it requires significant investment in infrastructure, talent, and ongoing maintenance, it ensures a lower total cost of ownership (TCO) in the long run, avoids vendor lock-in, and offers complete traceability—reducing reliance on external providers and eliminating "black box" risks. Institutions that build their own solutions also benefit from customization to fit specific business needs and evolving regulations.
- Buying from a provider accelerates time to market and reduces development costs while ensuring compliance with industry standards.
However, buying introduces potential challenges such as vendor lock-in, limited customization, and integration complexities with existing systems.

For financial institutions that prioritize long-term cost efficiency, compliance control, and adaptability, the build approach offers a strategic advantage—though it comes with its own set of challenges.

What are the challenges and why do they matter?

As open finance continues to evolve, it brings significant opportunities for innovation—but also introduces key challenges that financial institutions and fintech companies must navigate. These challenges impact efficiency, security, and compliance, ultimately influencing how quickly new financial products and services can reach the market.

1. Integration of data from various sources: Open finance relies on aggregating data from multiple institutions, each with different systems, APIs, and data formats. This complexity leads to operational inefficiencies, increased latency, and higher costs associated with data processing and infrastructure maintenance. Without seamless integration, financial services struggle to provide real-time insights and a frictionless user experience.

2. Diverse data types: Financial data comes in various formats—structured, semi-structured, and unstructured—which creates integration challenges. Many legacy systems operate with rigid schemas that don't adapt well to evolving data needs, making it difficult to accommodate new financial products, regulations, and customer demands. Without flexible data structures, innovation slows, and interoperability between systems becomes a persistent issue.

3. Data security: With open finance, vast amounts of sensitive customer data are shared across multiple platforms, increasing the risk of breaches and cyberattacks. A single vulnerability in the ecosystem can lead to data leaks, fraud, and identity theft, eroding customer trust. Security vulnerabilities have financial consequences and can result in legal scrutiny and long-term reputational damage.

4. Regulatory compliance: Navigating a complex and evolving regulatory landscape is a major challenge for open finance players. Compliance with data protection laws, financial regulations, and industry standards—such as GDPR or PSD2—requires constant updates to systems and processes. Failure to comply can lead to legal penalties, substantial fines, and loss of credibility—making it difficult for institutions to operate confidently in a global financial ecosystem.

These challenges directly impact the ability of financial institutions to innovate and launch new products quickly. Integration issues, security concerns, and regulatory complexities contribute to longer development cycles, operational inefficiencies, and increased costs—ultimately slowing the time to market for new financial services. In a highly competitive industry where speed and adaptability are critical, overcoming these challenges is essential for success in open finance.

MongoDB as the open finance data store

To overcome open finance's challenges, a flexible, scalable, secure, and high-performing data store is required. MongoDB is an ideal solution, offering a modern, developer-friendly data platform that accelerates innovation while meeting the critical demands of financial applications.
Seamless integration with RESTful JSON APIs

According to OpenID's 2022 research, most open finance ecosystems adopt RESTful JSON APIs as the standard for data exchange, ensuring interoperability across financial institutions, third-party providers, and regulatory bodies. MongoDB's document-based model natively supports JSON, making it an ideal backend for open banking APIs. This enables financial institutions to ingest, store, and process API data efficiently while ensuring compatibility with existing and emerging industry standards.

Flexible data model for seamless integration

Open finance relies on diverse data types from multiple sources, each with different schemas. Traditional relational databases require rigid schema migrations, often causing downtime and disrupting high-availability services. MongoDB's document-based model—with its flexible schema—offers an easy, intuitive, and developer-friendly solution that eliminates these bottlenecks, allowing financial institutions to adapt data structures dynamically without costly migrations or downtime. This ensures seamless integration of structured, semi-structured, and unstructured data, increasing productivity and performance while remaining cost-effective, and it enables faster iteration, reduced complexity, and continuous scalability.

Enterprise-grade security and compliance

Security and compliance are non-negotiable requirements in open finance, where financial data must be protected against breaches and unauthorized access. MongoDB provides built-in security controls, including encryption, role-based access controls, and auditing. It seamlessly integrates with existing security protocols and compliance standards, ensuring adherence to regulations such as GDPR and PSD2. MongoDB also enforces privileged access controls and continuous monitoring to safeguard sensitive data, as outlined in the MongoDB Trust Center.

Reliability and transactional consistency

Financial applications demand zero downtime and high availability, especially when processing transactions and real-time financial data. MongoDB's replica sets ensure continuous availability, while its support for ACID transactions guarantees data integrity and consistency—critical for handling sensitive financial operations such as payments, lending, and regulatory reporting.

The future of open finance

The evolution of open finance is reshaping the financial industry, enabling seamless data sharing while introducing new challenges in security, compliance, and interoperability. As financial institutions, fintechs, and regulators navigate this shift, the focus remains on balancing innovation with risk management to build a more inclusive and efficient financial ecosystem.

For organizations looking to stay ahead in this landscape, choosing the right technology stack is crucial. MongoDB provides the flexibility, scalability, and security needed to power the next generation of open finance applications—helping financial institutions accelerate innovation while ensuring compliance and data integrity.

In Part 2 of our look at open finance, we'll explore a demo from the Industry Solutions team that leverages MongoDB to implement an open finance strategy that enhances customer experience, streamlines operations, and drives financial accessibility. Stay tuned!

Head over to our GitHub repo to view the demo. Visit our solutions page to learn more about how MongoDB can support financial services.
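To ground the ACID-transaction point above, here is a minimal sketch of a hypothetical funds transfer executed atomically with the Go driver's WithTransaction helper. The account documents and amounts are illustrative only, and multi-document transactions require a replica set or an Atlas cluster.

```go
package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb+srv://<cluster-uri>"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	accounts := client.Database("bank").Collection("accounts")

	session, err := client.StartSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.EndSession(ctx)

	// Both updates commit together or not at all; on a transient error
	// WithTransaction retries the callback automatically.
	_, err = session.WithTransaction(ctx, func(sc mongo.SessionContext) (interface{}, error) {
		if _, err := accounts.UpdateOne(sc,
			bson.D{{Key: "_id", Value: "acct-A"}},
			bson.D{{Key: "$inc", Value: bson.D{{Key: "balance", Value: -100}}}}); err != nil {
			return nil, err
		}
		return accounts.UpdateOne(sc,
			bson.D{{Key: "_id", Value: "acct-B"}},
			bson.D{{Key: "$inc", Value: bson.D{{Key: "balance", Value: 100}}}})
	})
	if err != nil {
		log.Fatal(err)
	}
}
```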
[1] CCAF, The Global State of Open Banking and Open Finance (Cambridge: Cambridge Centre for Alternative Finance, Cambridge Judge Business School, University of Cambridge, 2024).
[2] "The Financial Data Access (FiDA) Regulation," financial-data-access.com, 2024, https://www.financial-data-access.com/
[3] Thierry Maout, "What is Financial Data Access (FiDA), and how to get ready?", July 16, 2024, https://www.didomi.io/blog/financial-data-access-fida?315c2b35_page=2
[4] European Commission (2020), Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions, EUR-Lex.
MongoDB Atlas Expands Cloud Availability to Mexico
¡MongoDB ama a México! The company’s second-largest market in Latin America, Mexico is also one of MongoDB’s top 20 markets globally. With rapid customer adoption across the country, we’re doubling down on our commitment to Mexico—investing in the resources and support our customers need to build and scale with MongoDB. That’s why I’m so thrilled to announce that MongoDB Atlas, MongoDB’s modern, cloud-native database, is now available on Amazon Web Services (AWS), Google Cloud, and Microsoft Azure cloud infrastructure regions in Mexico.

MongoDB Atlas is the most widely available data platform in the world and the only true multi-cloud database on the market. With availability in over 125 cloud regions globally, customers can deploy applications on their choice of cloud, across cloud regions, or across multiple cloud providers, giving them the flexibility to move data seamlessly between cloud providers and to use the unique services of each cloud provider simultaneously.

Innovating faster with MongoDB Atlas

Until now, customers in Mexico have used MongoDB Enterprise Advanced for on-premises deployments, or they’ve run Atlas deployments on cloud infrastructure regions outside of Mexico. But for customers in highly regulated industries—or for those with highly sensitive workloads who are required to keep their data in-country—modernizing their applications in the cloud with the three major cloud providers wasn’t an option.

MongoDB Atlas streamlines and secures enterprises’ data infrastructure by integrating a globally distributed database with built-in search, analytics, and AI-ready capabilities. By eliminating the need for single-purpose databases and complex data pipelines, Atlas will help organizations in Mexico modernize faster, simplify operations, and stay competitive in an AI-driven world. Bottom line: MongoDB was built for change, and its flexibility empowers businesses to innovate with next-generation technologies like AI at the breakneck speed of the market. Now that MongoDB Atlas is available on all three major cloud providers’ local infrastructure regions, more of our customers across Mexico can begin modernizing enterprise-grade applications in the cloud with confidence.

Around the world, tens of thousands of customers are innovating faster in the cloud thanks to MongoDB Atlas. For instance, Bendigo and Adelaide Bank recently partnered with MongoDB to modernize its core banking technology, with MongoDB Atlas as the keystone of an ambitious application modernization initiative. As part of the initiative, the bank reduced the development time required to migrate a core banking application off of a legacy relational database to MongoDB Atlas by up to 90%, at one-tenth the cost of a traditional legacy-to-cloud migration.

Investing in Mexico

In recent years, MongoDB has seen significant customer growth across Mexico. Today, more than 35,000 developers in Mexico list MongoDB as a skill on LinkedIn, and MongoDB has seen meaningful adoption across the financial services, retail, telecommunications, and software development sectors. Our team has been dedicated to helping these customers modernize their applications and accelerate their digital transformation. A key driver of this success is MongoDB’s deep partnerships with AWS, Google Cloud, and Microsoft Azure.
We continue to expand our integrations with services like Amazon Bedrock, Gemini Code Assist, and Azure AI Foundry that empower Mexican customers to build intelligent applications more quickly and with less friction. Our recognition as a Partner of the Year by each of these cloud providers is a testament to the impact of our collaboration and the success of our joint customers.

Since opening MongoDB’s Mexico City office in 2016, we’ve expanded rapidly, hiring approximately 50 employees over the past three years to support rising demand and enhance every stage of the customer journey. As our presence has grown, Mexico has become MongoDB’s corporate headquarters for all Spanish-speaking countries in LATAM. Notably, it is now our second-largest market in the region, with a total addressable market exceeding that of our third, fourth, and fifth-largest LATAM markets combined.

To further support our expanding customer base, we are committed to continued investment in our local team through 2025. Mexico boasts an exceptional talent pool, and we are actively growing our organization across key areas, including sales, customer success, partners, solutions architecture, and professional services. With this expansion, we’re ensuring that businesses in Mexico have not only the most powerful tools to innovate but also the local expertise and partnerships to accelerate their success with confidence. If you’re looking for your next adventure, take a look at our open roles in Mexico.

Connect with us

This spring, MongoDB.local is coming to Mexico City! MongoDB.local is a global series of in-person events, bringing together developers, IT professionals, and technology enthusiasts to explore the latest in MongoDB. Through expert-led talks, hands-on workshops, and networking opportunities, you’ll gain valuable insights and practical skills—all in an engaging and collaborative environment. Join us in Mexico City on May 15 to connect with the community and take your MongoDB expertise to the next level.
Innovating with MongoDB | Customer Successes, March 2025
Hello and welcome! This is the first installment of a new bi-monthly blog series showcasing how companies around the world are using MongoDB to tackle mission-critical challenges. As the leading database for modern applications, MongoDB empowers thousands of organizations to harness the power of their data and to drive creativity and efficiency across industries. This series will shine a light on some of those amazing stories.

From nimble startups to large enterprises, our customers are transforming data management, analytics, and application development with MongoDB’s flexible schema, scalability, and robust cloud services. What do I mean? Picture retailers like Rent the Runway improving customer experiences with real-time analytics, fintech companies such as Koibanx speeding up and securing transaction processes, and healthcare companies like Novo Nordisk optimizing the path to regulatory approvals.

With MongoDB, every developer and organization can fully tap into the potential of their most valuable resource: their data. So please read on—and stay tuned for more in this blog series!—to learn about the ingenuity of the MongoDB customer community, and how they’re pushing the boundaries of what’s possible.

Lombard Odier

Lombard Odier, a Swiss bank with a legacy dating back to 1796, transformed its application architecture with MongoDB to stay at the forefront of financial innovation. Confronted with the challenge of modernizing its systems amidst rapid digital and AI advancements, the bank leveraged MongoDB’s Application Modernization Factory and generative AI to streamline its application upgrades. This initiative resulted in up to 60x faster migration of simple code and slashed regression testing from three days to just three hours. By transitioning over 250 applications to MongoDB, including its pivotal portfolio management system, Lombard Odier significantly reduced technical complexity and empowered its developers to focus on next-generation technologies.

SonyLIV

SonyLIV faced challenges with its over-the-top (OTT) video-streaming platform. Its legacy relational database had poor searchability, complex maintenance, and slow content updates. Critically, it lacked the scalability necessary to support 1.6 million simultaneous users. To power its new CMS, “Blitz,” SonyLIV selected MongoDB Atlas’s flexible document model to improve performance and lower search query latency by 98%. Collaborating with MongoDB Professional Services, SonyLIV optimized API latency using MongoDB Atlas Search and Atlas Online Archive, effectively managing over 500,000 content items and real-time updates. With its new high-performing, modern solution in place, SonyLIV can now deliver flawless customer experiences to the world, faster.

Swisscom

Swisscom, Switzerland’s leading telecom and IT service provider, harnessed MongoDB to enrich its banking sector insights with AI. Faced with the challenge of streamlining access to its extensive library of over 3,500 documents, Swisscom utilized MongoDB Atlas and MongoDB Atlas Vector Search to transform unstructured data into precise, relevant content summaries in seconds. In just four months, Swisscom launched a production-ready platform with improved relevance, concrete answers, and transparency. The project sets a new standard in Swiss banking and showcases Swisscom’s commitment to driving the digital future with advanced AI solutions.
Victoria’s Secret

Victoria’s Secret’s e-commerce platform processes thousands of transactions daily across over 2.5 billion documents on hundreds of on-premises databases. Experiencing high costs and operational constraints with its monolithic architecture, the retailer initially adopted CouchDB but faced challenges like data duplication and limited functionality. In 2023, Victoria’s Secret migrated to MongoDB Atlas on Azure, achieving zero downtime while optimizing performance and scalability. Over four months, the team successfully migrated more than four terabytes of data across 200 databases, reducing CPU core usage by 75% and achieving a 240% improvement in API performance. The move to MongoDB also allowed the retailer to introduce additional products, like MongoDB Atlas Vector Search, resulting in significant operational efficiencies and cost savings.

Video spotlight

Before you go, be sure to watch one of our recent customer videos featuring the Danish pharmaceutical giant Novo Nordisk. Discover how Novo Nordisk leveraged MongoDB and generative AI to reduce the time it takes to produce a Clinical Study Report (CSR) from 12 weeks to 10 minutes.

Want to get inspired by your peers and discover all the ways we empower businesses to innovate for the future? Visit our Customer Success Stories hub to see why these customers, and so many more, build modern applications with MongoDB.
Modernizing Telecom Legacy Applications with MongoDB
The telecommunications industry is currently undergoing a profound transformation, fueled by innovations in 5G networks, the growth of Internet of Things applications, and the rapid rise of AI. To capitalize on these technologies, companies must effectively handle increasing volumes of unstructured data—which now represents up to 90% of all information—while also developing modern applications that are flexible, high-performance, and scalable. However, the telecommunications industry’s traditional reliance on relational databases such as PostgreSQL presents a challenge to modernization: their rigid structures limit adaptability and can lead to decreased performance as table complexity grows.

With this in mind, this blog post explores how telecom companies can modernize their legacy applications by leveraging MongoDB’s modern database and its document model. With MongoDB, telecom companies can take advantage of the latest industry innovations while freeing their developers from the burdens of maintaining legacy systems.

Navigating legacy system challenges

Legacy modernization refers to the process of updating a company’s IT infrastructure to align it with the latest technologies and workflows, ultimately advancing and securing strategic business goals. For telecom companies, this modernization involves overcoming the limitations of their legacy systems, which hinder adaptation to changing market conditions that demand greater system scalability and availability to run real-time operations.

The main drawbacks of legacy technologies like relational databases stem from their design, which wasn’t built to support the data processing capabilities required by modern telecom services. These limitations, illustrated in Figure 1 below, include rigid data schemas, difficulty handling complex data formats, limited scaling ability, and higher operational costs for maintenance.

Figure 1. The limitations of legacy systems.

Expanding on these limitations: relational databases depend on a predefined schema, which becomes difficult to modify once established, as changes entail extensive restructuring efforts. In telecommunications, handling growing data volumes from connected devices and 5G networks can rapidly become burdensome and costly due to frequent CPU, storage, and RAM upgrades. Over time, technology lock-in can further escalate costs by hindering the transition to alternative solutions. Altogether, these factors hold back modernization efforts, urging telecoms to transform their legacy systems with newer technologies.

To overcome these challenges, telecom companies are replacing legacy systems with modern applications that provide greater scalability, enhanced security, and high availability, as shown in Figure 2. However, achieving this transition can be a daunting task for some organizations due to the complexity of current systems, a lack of internal technical expertise, and the hurdles of avoiding downtime. Therefore, before transforming their outdated systems, telecom companies must carefully select the appropriate technologies and formulate a modernization strategy to facilitate the transition.

Figure 2. Characteristics of modern applications.

Getting onboard with MongoDB

Enter MongoDB. The company’s document-oriented database offers a flexible data model that can process any information format, easily adapting to specific application requirements.
MongoDB Atlas—MongoDB’s unified, modern database—delivers a robust cloud environment that efficiently manages growing data volumes through its distributed architecture, ensuring seamless connectivity and enhanced performance. Moreover, as telecom providers prioritize cybersecurity and innovation, MongoDB includes robust security measures—encryption, authentication, authorization, and auditing—to protect sensitive information and ensure regulatory compliance. Additionally, leveraging MongoDB’s document model with built-in Atlas services like Vector Search, Atlas Charts, and Stream Processing allows telecommunications organizations to streamline advanced industry use cases, including single customer view, AI integrations, and real-time analytics.

Figure 3. Core MongoDB features for modernization.

Recognizing these benefits, leading telecom companies like Nokia, Swisscom, and Vodafone have successfully modernized their applications with MongoDB. However, selecting the right technology is only part of the modernization process. To ensure a successful and effective modernization project, organizations should establish a comprehensive modernization strategy. This process typically follows one of the following three paths:

- Data-driven modernization: This approach transfers all data from the legacy system to the new environment and then migrates applications.
- Application-driven modernization (all-or-nothing): This approach executes all reads and writes for new applications in the new data environment from the start, but leaves the business to decide when to retire existing legacy applications.
- Iterative modernization (one-step-at-a-time): This approach blends the previous paths, starting with the modernization of the least complex applications and incrementally moving on to more complex ones.

Read this customer story to learn more about telecoms migrating to MongoDB. With this overview complete, let’s dive into the migration process by examining the iterative modernization of a telecom billing system.

Modernizing a telecom billing system

Telecom billing systems often consist of siloed application stacks segmented by product lines like mobile, cable, and streaming services. This segmentation leads to inefficiencies and overly complex architectures, highlighting the need to simplify these structures. With this in mind, imagine a telecom company that has decided to modernize its entire billing system to boost performance and reduce complexity.

In the initial stage, telecom developers assess the scope of the modernization project, scoring individual applications based on technical sustainability and organizational priorities. Applications with high scores undergo further analysis to estimate the re-platforming effort required. A cross-functional team then selects the first component to migrate to MongoDB, initiating the billing system modernization. This journey follows the steps outlined in Figure 4:

Figure 4. The modernization process.

1. First, developers analyze legacy systems by examining the codebase and the underlying architecture of the chosen billing system.
2. Then, developers create end-to-end tests to ensure the application functions correctly when deployed.
3. Later, developers design an architecture that incorporates managerial expectations of the desired application.
4. Next, developers rewrite and recode the legacy application to align with the document model (a hypothetical billing-document sketch appears at the end of this post) and develop APIs for MongoDB interaction.
Throughout this process, developers can leverage MongoDB Relational Migrator to streamline the transition. Relational Migrator helps developers with data mapping and modeling, SQL object conversion, application code generation, and data migration, corresponding to the design, rewrite, and migration steps above. Additionally, telecom companies can accelerate modernization initiatives by leveraging MongoDB Professional Services for dedicated, tailored end-to-end migration support. Our experts work closely with you to provide customized assistance, from targeted technical support and development resources to strategic guidance throughout the entire project.

Building on this initial project, telecom companies can progressively address more complex applications, refining their approach to support a long-term modernization strategy.

Next steps

By revamping legacy applications with MongoDB, telecom companies can improve their operations and gain a competitive edge through modern technology. This shift allows telcos to apply the latest innovations and frees developers from the burden of maintaining legacy systems.

Start your journey to migrate core telecom applications to MongoDB Atlas by visiting our telecommunications solutions page. To learn more about upgrading telco legacy systems with MongoDB, explore the following resources:

Visit our professional services page to learn more about MongoDB Consulting

YouTube: Relational Migrator Explained in 3 Minutes

White paper: Unleash Telco Transformation with an Operational Data Layer

White paper: Modernization: What's Taking So Long?
Building Gen AI with MongoDB & AI Partners | February 2025
February was big for MongoDB, and, more importantly, for anyone looking to build AI applications that deliver highly accurate, relevant information (in other words, for everyone building AI apps). MongoDB announced the acquisition of Voyage AI, a pioneer in state-of-the-art embedding and reranking models that power next-generation AI applications.

Because generative AI is by nature probabilistic, models can "hallucinate" and generate false or misleading information. This can lead to serious risks, especially in use cases and industries (e.g., financial services) where accurate information is paramount. To address this, organizations building AI apps need high-quality retrieval; they need to trust that the most relevant information is extracted from their data with precision. Voyage AI's advanced embedding and reranking models enable applications to extract meaning from highly specialized and domain-specific text and unstructured data. With roots at Stanford and MIT, Voyage AI's world-class team is trusted by AI innovators like Anthropic, LangChain, Harvey, and Replit.

Integrating Voyage AI's technology with MongoDB will enable organizations to easily build trustworthy, AI-powered applications by offering highly accurate and relevant information retrieval deeply integrated with operational data. The sketch below illustrates what such an embed-retrieve-rerank pipeline can look like.
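For illustration only (this is a sketch, not an official integration sample): a minimal retrieval pipeline that embeds a query with the voyageai Python client, retrieves candidates with Atlas Vector Search, and reranks them. It assumes a collection named docs whose embedding field is covered by an Atlas Vector Search index called vector_index; the connection string, index, field, and model names are placeholders you would adapt:

```python
import voyageai
from pymongo import MongoClient

vo = voyageai.Client(api_key="<VOYAGE_API_KEY>")
coll = MongoClient("mongodb+srv://<user>:<password>@cluster0.example.mongodb.net")["app"]["docs"]

query = "What are the wire-transfer reporting thresholds?"

# 1. Embed the query text.
qvec = vo.embed([query], model="voyage-3", input_type="query").embeddings[0]

# 2. Retrieve candidate documents with Atlas Vector Search.
candidates = list(coll.aggregate([
    {"$vectorSearch": {
        "index": "vector_index",   # assumed index name
        "path": "embedding",       # assumed vector field
        "queryVector": qvec,
        "numCandidates": 100,
        "limit": 20,
    }},
    {"$project": {"_id": 0, "text": 1}},
]))

# 3. Rerank the candidates so the most relevant ones come first.
reranked = vo.rerank(query, [c["text"] for c in candidates], model="rerank-2", top_k=5)
for r in reranked.results:
    print(round(r.relevance_score, 3), r.document[:80])
```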
For more, check out MongoDB CEO Dev Ittycheria's blog post about Voyage AI, and what this means for developers and businesses (in short, delivering high-quality results at scale). Onward!

P.S. If you're in Vegas for HumanX this week, stop by booth 412 to say hi to MongoDB!

Welcoming new AI and tech partners

The Voyage AI news was hardly the only exciting development last month. In February 2025, MongoDB welcomed three new AI and tech partners that offer product integrations with MongoDB. Read on to learn more about each new partner!

CopilotKit

Seattle-based CopilotKit provides open-source infrastructure for in-app AI copilots, helping organizations build production-ready copilots and agents effortlessly. "We're excited to be partnering with MongoDB to help companies build best-in-class copilots that leverage RAG and take action based on internal data," said Uli Barkai, Co-Founder and Chief Marketing Officer at CopilotKit. "MongoDB made it dead simple to build a scalable vector database with operational data. This collaboration enables developers to easily ship production-grade RAG applications."

Varonis

Varonis is the leader in data security, protecting data wherever it lives, across SaaS, IaaS, and hybrid cloud environments. Varonis' cloud-native Data Security Platform continuously discovers and classifies critical data, removes exposures, and detects advanced threats with AI-powered automation. "Varonis's mission is to protect data wherever it lives," said David Bass, Executive Vice President of Engineering and Chief Technology Officer at Varonis. "We are thrilled to further advance our mission by offering AI-powered data security and compliance for MongoDB, the database of choice for high-performance application and AI development. With this integration, joint customers can automatically discover and classify sensitive data, detect abnormal activities, secure AI data pipelines, and prevent data leaks."

Xlrt

Xlrt is an automated insight-generation platform that enables financial institutions to create innovative financial credit products at scale by simplifying the financial spreading process. "We are excited to partner with MongoDB Atlas to transform AI-driven financial workflows," said Rupesh Chaudhuri, Chief Operating Officer and Co-Founder of Xlrt. "XLRT.ai leverages agentic AI, combining graph-based contextualization, vector search, and LLMs to redefine data-driven decision-making. With MongoDB's robust NoSQL and vector search capabilities, we're delivering unparalleled efficiency, accuracy, and scalability in automating financial processes."

To learn more about building AI-powered apps with MongoDB, check out our AI Learning Hub and stop by our Partner Ecosystem Catalog to read about our integrations with MongoDB's ever-evolving AI partner ecosystem. And visit the MongoDB AI Applications Program (MAAP) page to learn how MongoDB and the MAAP ecosystem help organizations build applications with advanced AI capabilities.
ORiGAMi: A Machine Learning Architecture for the Document Model
The document model has proven to be the optimal paradigm for modern application schemas. At MongoDB, we've long understood that semi-structured data formats like JSON offer superior expressiveness compared to traditional tabular and relational representations. Their flexible schema accommodates dynamic and nested data structures, naturally representing complex relationships between data entities.

However, the machine learning (ML) community has faced persistent challenges when working with semi-structured formats. Traditional ML algorithms, as implemented in popular libraries like scikit-learn and pandas, assume fixed-dimensional tabular data consisting of rows and columns. This fundamental mismatch forces data scientists to manually convert JSON documents into tabular form, a time-consuming process that requires significant domain expertise. Recent advances in natural language processing (NLP) demonstrate the power of Transformers in learning from unstructured data, but their application to semi-structured data has been under-studied.

To bridge this gap, MongoDB's ML research group has developed a novel Transformer-based architecture designed for supervised learning on semi-structured data (e.g., JSON data in a document model database). We call this new architecture ORiGAMi (Object Representation through Generative, Autoregressive Modelling), and we're excited to make it available to the community at github.com/mongodb-labs/origami . It includes components that make training a Transformer model feasible on datasets with as few as 200 labeled samples. By combining this data efficiency with the flexibility of Transformers, ORiGAMi enables prediction directly from semi-structured documents, without the cumbersome flattening and manual feature extraction that tabular representations require. You can read more about our model on arXiv .

Technical innovation

The key insight behind ORiGAMi lies in its tokenization strategy: documents are transformed into sequences of key-value pairs and special structural tokens that encode nested types like arrays and subdocuments. A simplified sketch of what such a sequence might look like is shown below.
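For intuition only (the exact token vocabulary used by ORiGAMi may differ; see the paper and repository for the real scheme), a document like {"user": "sarah", "tags": ["a", "b"]} might be serialized along these lines, with structural tokens marking the boundaries of the document and of each array:

```python
# Hypothetical serialization of {"user": "sarah", "tags": ["a", "b"]}.
# DOC_START/DOC_END and ARRAY_START/ARRAY_END are illustrative structural
# tokens; KEY(...) and VAL(...) stand in for key and value tokens.
tokens = [
    "DOC_START",
    "KEY(user)", "VAL(sarah)",
    "KEY(tags)", "ARRAY_START", "VAL(a)", "VAL(b)", "ARRAY_END",
    "DOC_END",
]
```

Framed this way, predicting a field's value becomes next-token prediction: condition the model on the tokens seen so far and generate the tokens that complete the target key's value.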
These token sequences serve as input to a Transformer model trained to predict the next token given a portion of the document, similar to how large language models (LLMs) are trained on text tokens. What's more, our modifications to the standard Transformer architecture include guardrails that ensure the model only generates valid, well-formed documents, and a novel position-encoding strategy that respects the order invariance of key-value pairs in JSON. These modifications also allow for much smaller models than LLMs, which can therefore be trained on consumer hardware in minutes to hours, depending on dataset size and complexity, versus days to weeks for LLMs.

By reformulating classification as a next-token prediction task, ORiGAMi can predict any field within a document, including complex types like arrays and nested subdocuments. This unified approach eliminates the need for separate models or preprocessing pipelines for different prediction tasks.

Example use case

Our initial focus has been supervised learning: training models on labeled data to make predictions on unseen documents. Let's explore a practical example of user segmentation. Consider a collection where each document represents a user profile, containing both simple fields and complex nested structures:

```json
{
  "_id": "user_7842",
  "email": "sarah.chen@example.com",
  "signup_date": "2024-01-15",
  "device_history": [
    { "device": "mobile_ios", "first_seen": "2024-01-15", "last_seen": "2024-02-11" },
    { "device": "desktop_chrome", "first_seen": "2024-01-16", "last_seen": "2024-02-10" }
  ],
  "subscription": {
    "plan": "pro",
    "billing_cycle": "annual",
    "features_used": ["analytics", "api_access", "team_sharing"],
    "usage_metrics": {
      "storage_gb": 45.2,
      "api_calls_per_day": 1250,
      "active_projects": 8
    }
  },
  "user_segment": "enterprise_power_user" // <-- target field
}
```

Suppose you want to automatically classify users into segments like "enterprise_power_user", "smb_growth", or "early_stage_startup" based on their behavior and characteristics. Some documents in your collection already have correct labels, perhaps assigned through manual analysis or customer interviews. Traditional ML approaches would require flattening this rich document structure, leading to very sparse tables and potentially losing important hierarchical relationships. With ORiGAMi, you can:

Train directly on the raw documents with existing labels

Preserve the full context of nested structures and arrays

Make predictions for the "user_segment" field on new users immediately after signup

Update predictions as user behavior evolves, without rebuilding feature pipelines

Getting started with ORiGAMi

We're excited to be open-sourcing ORiGAMi ( github.com/mongodb-labs/origami ), and you can read more about our model on arXiv . We've also included a command-line interface that lets users make predictions without writing any code. Training a model is as simple as pointing ORiGAMi at your MongoDB collection:

```
origami train <mongo-uri> -d app -c users
```

Once trained, you can generate predictions and seamlessly integrate them back into your MongoDB workflow. For example, to predict user segments for new signups (from the analytics.signups collection) and write the resulting predictions back to MongoDB in an analytics.predicted collection:

```
origami predict <mongo-uri> -d analytics -c signups --target user_segment --json | mongoimport -d analytics -c predicted
```

For those looking to dive deeper, we've also included several Jupyter notebooks in the repository that demonstrate advanced features and customization options; model performance can often be improved by adjusting the hyperparameters.

We're just scratching the surface of what's possible with document-native machine learning, and we have many more use cases in mind. We invite you to explore the repository, contribute to the project, and share how you use ORiGAMi to solve real-world problems. Head over to the ORiGAMi GitHub repo , play around with it, and tell us about new ways of applying it and problems it's well-suited to solving.
ZEE5: A Masterclass in Migrating Microservices to MongoDB Atlas
ZEE5 is a leading Indian over-the-top (OTT) video-streaming platform that delivers content via Internet-connected devices. The platform offers a wide variety of content, including movies, TV shows, web series, and original programming, across multiple genres and languages. Owned by Zee Entertainment Enterprises Limited , ZEE5 produces over 260 hours of content daily and serves a monthly active user base of more than 119.5 million users across 190 countries.

ZEE5's operations and customer satisfaction depend on a backend infrastructure robust and scalable enough to handle immense traffic and complex workflows. To future-proof its infrastructure and maintain its competitive edge, the company needed to streamline operations and enhance its database management capabilities. This included migrating its entire OTT platform, comprising more than 100 microservices and more than 80 databases, to Google Cloud.

Pramod Prakash, Senior Vice President of Engineering at ZEE5, took the stage at MongoDB.local Bangalore in 2024 to share insights into how ZEE5 managed this migration without hindering performance or disrupting its services. "It was a massive project which required a very carefully orchestrated migration plan," said Prakash.

Massive migration, zero downtime: Challenge accepted

ZEE5's team embarked on an ambitious journey to migrate more than 40 of its 100+ microservices to MongoDB Atlas . These were previously running on the Community Edition of MongoDB and on other NoSQL databases.

One of the challenges of this migration was ensuring continuous data flow for the platform's 119.5 million streaming users. To do so, Prakash and his team created multiple environments using a change data capture tool , ensuring continuous replication of data so the user experience would not be impacted. "We had to build four environments: dev, QA [Quality Assurance], UAT [User Acceptance Testing], and production," explained Prakash. "We needed to keep testing and verifying each environment, and then finally enter the production phase when we migrated the data and moved the traffic."

To minimize any data loss, the approach involved migrating production data twice: first for testing, and then for the final cutover. ZEE5 used MongoDB Atlas' integrated tools mongosync and mongomirror , which helped achieve an essential goal: avoiding any downtime. "We migrated this entire mammoth application with zero downtime!" said Prakash. "We have not stopped ZEE5's operations at all." A sketch of how a mongosync-style continuous sync is typically driven appears below.
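For context on the tooling: the following is a minimal sketch of the documented mongosync workflow, not ZEE5's actual configuration (the connection strings are placeholders). mongosync keeps the destination cluster continuously in sync with the source until application traffic is cut over:

```
# Start mongosync, pointing it at the source and destination clusters.
mongosync \
  --cluster0 "mongodb://source.example.net:27017" \
  --cluster1 "mongodb+srv://user:password@destination.mongodb.net"

# mongosync exposes a local HTTP API (port 27182 by default);
# kick off continuous replication from cluster0 to cluster1.
curl -X POST http://localhost:27182/api/v1/start \
  -H "Content-Type: application/json" \
  -d '{"source": "cluster0", "destination": "cluster1"}'
```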
"The second important thing is the performance: you want to be 100% sure that the entire scale and peak traffic will work seamlessly within the new cloud environment," added Prakash.

ZEE5 relied on support from MongoDB Professional Services (PS). The PS team helped architect and plan the entire migration strategy, and accompanied Prakash's team step by step to ensure there would be no unexpected disruptions. The production environment was built and tested rigorously before the final migration to ensure seamless performance at peak traffic levels. "We iterated until we were 100% sure that the new environment was ready to take up ZEE5's peak traffic. Functionally, it was all perfect," said Prakash.

The power of the Atlas platform

According to Prakash, the power of MongoDB Atlas lies in the fact that it is a fully managed platform. "There is no maintenance overhead at all," he said. "All upgrades happen automatically without any downtime. We are also leveraging auto-scaling capabilities and point-in-time recovery." All of this enables efficient handling of varying traffic loads without manual intervention. Additionally, data recovery capabilities are enhanced, and, most importantly, the engineering team can prioritize application development over operational maintenance.

As of February 2025, MongoDB Atlas supports seven key use cases at ZEE5: payments, subscriptions, plans and coupons, video engineering, Zee Music (users' preferences and playlists), content metadata, and the platform's communication engine (SMS and email notifications).

Looking ahead, ZEE5 is working on more use cases powered by MongoDB. For example, the company plans to fully migrate its master data source for content metadata to MongoDB Atlas, and it is also considering using MongoDB Atlas to support and enhance its search and recommendation capabilities.

Interested in learning how MongoDB is powering other companies' applications? Head over to our customer case studies hub to read the latest stories. Visit our product page to learn more about MongoDB Atlas .