How to Model Data in Document Databases for Read and Write Performance

Steve Jurczak
October 20, 2022
#MongoDB World

MongoDB is often seen as a good choice for storing unstructured data. The ability to persist data in MongoDB without declaring its type or designing a schema for it is one of the reasons many of our customers choose us. But the idea that MongoDB is a "schemaless" database is not accurate. Although a document database does allow you to store data without defining what it is, the shape of that data matters if you plan to do more than simply retrieve whole documents by key.

The Need to Model for NoSQL

At this year's MongoDB World event, Daniel Coupal, staff developer advocate at MongoDB, explained that you need to model data in a document database because of constraints that must be taken into account when you're persisting data in MongoDB. Constraints include things like network and hard disk speed, the maximum size of a document (16 MB for a BSON document in MongoDB), and features that you don't have now but might add later.

"If you look at the stack of an application, you have the application that talks to MongoDB that talks to whatever layer you add — it could be the cloud or a physical machine — those constraints are going to map to some of those layers," Coupal said. "It's imperative to know the features of the products you use." MongoDB offers features like transactions, field-level encryption, data federation, and archives, which require that you model data differently.

In relational databases, data modeling is fairly straightforward because of third normal form (3NF), the standard schema design approach for relational databases. Normalizing to 3NF leads to essentially one solution for a given set of data. With the document model and MongoDB, however, you have several options for data modeling. You can nest everything under a single collection, and possibly wind up with duplicate sets of data (and, therefore, data consistency issues, because every copy has to be kept in sync), or you can use separate collections for different datasets and avoid duplicate data altogether. Ultimately, according to Coupal, "the optimal grouping of objects into collections is determined by the workload."
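
To make the duplication trade-off concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for documents. The `orders` collection, its field names, and the `update_customer_email` helper are hypothetical and were not part of the session; the point is only that a value copied into many documents has to be updated in every one of them.

```python
# Hypothetical "orders" collection: each order embeds a copy of the customer.
orders = [
    {"_id": 1, "customer": {"id": "c1", "email": "old@example.com"}, "total": 40},
    {"_id": 2, "customer": {"id": "c1", "email": "old@example.com"}, "total": 15},
    {"_id": 3, "customer": {"id": "c2", "email": "b@example.com"}, "total": 99},
]


def update_customer_email(orders, customer_id, new_email):
    """Update every embedded copy and return how many documents were touched."""
    touched = 0
    for order in orders:
        if order["customer"]["id"] == customer_id:
            order["customer"]["email"] = new_email
            touched += 1
    return touched


# One logical change fans out to every order that duplicates the customer.
touched = update_customer_email(orders, "c1", "new@example.com")
emails = {o["customer"]["email"] for o in orders if o["customer"]["id"] == "c1"}
```

If any copy were missed, reads would return different emails for the same customer. Whether that risk is acceptable depends on how often the duplicated value changes compared with how often it is read.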

During the session, Coupal provided a breakdown of data modeling methodology that involves a three-phase process starting with the workload, proceeding to relationships, and moving to patterns for optimization purposes. "In a lot of the solutions we're trying to build with NoSQL, performance is a top requirement," he said. He also cited better performance as one of the big reasons why people switch from SQL to MongoDB.

What are the Data Access Patterns?

In essence, with MongoDB, the way you plan to access the data determines the way you store it in the database. Data that is accessed together should be stored together.

In the session, Coupal also presented an analogy that contrasts data modeling in relational databases with the document model in MongoDB. With the relational model, if you have a car and you want to model it, you take each part of the car individually and place it in its own table. Then, when you want to use the car, you have to reassemble it part by part (and table by table) before you can drive it. With MongoDB, you take the car and put it in the garage (the equivalent of a collection). When you want to use it, you take it out. That's it. "We do only one read on the disk to get everything we need together," Coupal said.
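
The car analogy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the session: the `cars`, `engines`, and `wheels` "tables", the `garage` collection, and both helper functions are hypothetical, and in-memory lists and dictionaries stand in for real storage.

```python
# Relational style: each kind of part lives in its own "table" (list of rows).
cars = [{"car_id": 1, "model": "Roadster"}]
engines = [{"car_id": 1, "cylinders": 4}]
wheels = [{"car_id": 1, "position": p} for p in ("FL", "FR", "RL", "RR")]


def assemble_car(car_id):
    """Rebuild the car by visiting every table, one lookup per table."""
    car = next(c for c in cars if c["car_id"] == car_id)
    engine = next(e for e in engines if e["car_id"] == car_id)
    positions = [w["position"] for w in wheels if w["car_id"] == car_id]
    return {
        "_id": car_id,
        "model": car["model"],
        "engine": {"cylinders": engine["cylinders"]},
        "wheels": positions,
    }


# Document style: the whole car is stored as one document in the "garage".
garage = {
    1: {
        "_id": 1,
        "model": "Roadster",
        "engine": {"cylinders": 4},
        "wheels": ["FL", "FR", "RL", "RR"],
    }
}


def fetch_car(car_id):
    """A single lookup returns the car, ready to use."""
    return garage[car_id]
```

Both functions return the same car. The difference is that `assemble_car` touches three structures while `fetch_car` touches one.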

Techniques for Data Modeling in MongoDB

Coupal also explained two data modeling techniques: embedding and referencing. Embedding combines what would normally be two tables in a relational database into a single document by storing the "many" side as an array. "The array is the expression of the one-to-many relationship," Coupal said. Referencing keeps related data in separate documents that point to each other by identifier, and it is useful when the "many" side of the relationship is very large. Although MongoDB does support multi-document transactions, in almost all cases it's better to keep data that changes together in a single document, because a write to a single document is atomic and gives more efficient read-write performance.
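
As a rough illustration of the two techniques, the sketch below models the same one-to-many relationship both ways using plain Python data. The blog post and comment documents, their field names, and the `comments_for` helper are hypothetical examples, not taken from the session.

```python
# Embedding: the "many" side is an array inside the parent document.
post_embedded = {
    "_id": "p1",
    "title": "Modeling data",
    "comments": [
        {"author": "ana", "text": "Nice overview"},
        {"author": "raj", "text": "Very helpful"},
    ],
}

# Referencing: the "many" side lives in its own collection, and each
# child document carries the identifier of its parent.
posts = {"p1": {"_id": "p1", "title": "Modeling data"}}
comments = [
    {"_id": "c1", "post_id": "p1", "author": "ana", "text": "Nice overview"},
    {"_id": "c2", "post_id": "p1", "author": "raj", "text": "Very helpful"},
    {"_id": "c3", "post_id": "p2", "author": "li", "text": "Unrelated post"},
]


def comments_for(post_id):
    """A second lookup is needed to collect the children of a post."""
    return [c for c in comments if c["post_id"] == post_id]


embedded_texts = [c["text"] for c in post_embedded["comments"]]
referenced_texts = [c["text"] for c in comments_for("p1")]
```

With embedding, reading the post also reads its comments. With referencing, the post document stays small no matter how many comments accumulate, at the cost of the extra lookup.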

As we know, developers are the ones who are most likely to understand the data access patterns for their applications. Properly designed schemas can increase performance for a given set of hardware by reducing computation, I/O, and contention. What really differentiates MongoDB from relational databases is the ability to co-locate related data in the atomic unit of storage so multiple values for an attribute can exist within a single record rather than being broken up into rows and stored independently. A document database with a properly designed schema lets you filter and retrieve data with minimal computational overhead and in a single I/O operation. This approach can make finding and retrieving data far faster and less expensive.
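
Here is a small sketch of what "multiple values for an attribute in a single record" can look like. The `products` data and the `find_by_size` helper are hypothetical; the helper imitates in plain Python the way a MongoDB equality filter on an array field, such as `{"sizes": "M"}`, matches documents whose array contains that value.

```python
# Each product keeps all of its sizes in one array field instead of
# spreading them across rows in a separate table.
products = [
    {"_id": 1, "name": "tee", "sizes": ["S", "M", "L"], "price": 20},
    {"_id": 2, "name": "hoodie", "sizes": ["L", "XL"], "price": 45},
    {"_id": 3, "name": "cap", "sizes": ["M"], "price": 12},
]


def find_by_size(products, size):
    """Return the products whose 'sizes' array contains the given size."""
    return [p for p in products if size in p["sizes"]]


# The filter and the retrieval both work on whole documents, so no join
# is needed to learn which other sizes a matching product comes in.
names = [p["name"] for p in find_by_size(products, "M")]
```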

To see the complete session from MongoDB World 2022, which includes a list of 12 data modeling patterns and techniques for evolving schema in MongoDB, watch The Principles of Data Modeling for MongoDB.
