Paginations 2.0: Why I Would Choose MongoDB
I've been writing and designing large-scale, multi-user applications with database backends since 1995, as lead architect for intelligence management systems, text mining, and analytics platforms, and as a consultant working in retail and investment banking, mobile games, connected-car IoT projects, and country-scale document management. It's fair to say I've seen how a lot of applications are put together.
Now it's also reasonable to assume that as I work for MongoDB, I have some bias, but MongoDB isn't my first database, or even my first document database, and so I do have a fairly broad perspective. I'd like to share with you three features of MongoDB that would make it my first choice for almost all large, multi-user database applications.
The document model is a fundamental aspect of MongoDB. All databases store records: information about things that have named attributes and values for those attributes. Some attributes might have multiple values. In a tabular database, we break the record into multiple rows with a single scalar value for each attribute and have a way to relate those rows together to access the record.
The difference in a document database is that when we have multiple values for an attribute, we can retain those as part of a single record, storing, accessing, and manipulating them together. We can also group attributes together to compare and refer to them as a group. For example, all the parts of an address can be accessed as a single address field or individually.
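To make the contrast concrete, here is a minimal sketch in plain Python of the same record in both shapes. The customer, the field names, and the values are all invented for illustration; nothing here talks to a database.

```python
# One record as a single document: the multi-valued attribute and the
# grouped address attributes stay inside the record itself.
customer = {
    "_id": "cust-1001",  # hypothetical identifier
    "name": "Ada Example",
    "emails": ["ada@example.com", "ada@work.example.com"],  # many values, one attribute
    "address": {  # attributes grouped under one field
        "street": "1 Main Street",
        "city": "Springfield",
        "postcode": "AB1 2CD",
    },
}

# The same record in a tabular layout: scalar values only, spread over
# several sets of rows that are related back together by customer_id.
customer_rows = [{"customer_id": "cust-1001", "name": "Ada Example"}]
email_rows = [
    {"customer_id": "cust-1001", "email": "ada@example.com"},
    {"customer_id": "cust-1001", "email": "ada@work.example.com"},
]
address_rows = [
    {
        "customer_id": "cust-1001",
        "street": "1 Main Street",
        "city": "Springfield",
        "postcode": "AB1 2CD",
    }
]

# The address can be read as one group or one part at a time.
whole_address = customer["address"]
city = customer["address"]["city"]
```

In the document form, reading the whole record is one lookup; in the tabular form, it means gathering rows from three places.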
Why does this matter? Well, being able to store an entire record co-located on disk and in memory has some huge advantages.
By having these larger, atomic objects to work with, there are two oft-quoted benefits, making life easier for OO developers and reducing the computational overhead of accessing the whole record, but these miss a third, even more important benefit.
With the correct schema, documents reduce each database write operation to a single atomic change to one piece of data. This has two huge and related benefits.
By only requiring one piece of data to be examined for its current state and changed to a new state at a time, the period of time where the database state is unresolved is reduced to almost nothing. Effectively, there is no interaction between multiple writes to the database and none have to wait for another to complete, at least not beyond a single change to a single document.
If we have to use traditional transactions, whether in an RDBMS or MongoDB, to perform a change, then all records concerned remain effectively locked until the transaction is complete. This greatly widens the window for contention and delay. Using the document model instead, you can remove almost all of that contention from your database and achieve far higher 'transactional' throughput in a multi-user system.
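As an illustration of a write that checks the current state and moves to the new state in one step, here is a minimal sketch of reserving stock. The product schema (`in_stock`, `reserved`) and the function names are assumptions made up for this example; `update_one` and `modified_count` are the names the Python driver (pymongo) uses, and the `collection` argument is assumed to be a pymongo collection or anything with a compatible method.

```python
def reservation_spec(sku, quantity):
    """Build the filter and update for reserving stock in one atomic write."""
    # The filter verifies the current state: it only matches the product
    # document if enough stock is still available.
    query = {"_id": sku, "in_stock": {"$gte": quantity}}
    # The update moves that one document to its new state.
    update = {"$inc": {"in_stock": -quantity, "reserved": quantity}}
    return query, update


def reserve_stock(collection, sku, quantity):
    """Return True if the stock was reserved, False if there was not enough.

    `collection` is assumed to behave like a pymongo Collection.
    """
    query, update = reservation_spec(sku, quantity)
    result = collection.update_one(query, update)
    # If no document matched the filter, nothing was modified.
    return result.modified_count == 1
```

Because the check and the change happen in the same single-document operation, two users reserving the last item cannot both succeed, and neither needs a multi-document transaction to find that out.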
The second part of this is that when each write to the database can be treated as an independent operation, it makes it easy to horizontally scale the database to support large workloads as the state of a document on one server has no impact on your ability to change a document on another. Every operation can be parallelised.
Doing this does require you to design your schema correctly, though. Document databases are far from schemaless (a term MongoDB has not used for many years). In truth, the document model makes schema design even more important than it is in an RDBMS.
The second reason I would choose to use MongoDB is that high availability is at the heart of the database. MongoDB is designed so that a server can be taken offline instantly, at any time, with no loss of service or data. This is absolutely fundamental to how all of MongoDB is designed. It doesn't rely on specialist hardware, third-party software, or add-ons. It allows for replacement of servers, operating systems, and even database versions invisibly to the end user, and even mostly to the developer. This goes equally for Atlas, where MongoDB can provide a multi-cloud database service at any scale that is resilient to the loss of an entire cloud provider, whether it’s Azure, Google, or Amazon. This level of uptime is unprecedented.
So, if I plan to develop a large, multi-user application, I just want to know the database will always be there: zero downtime, zero data loss, end of story.
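From the application's side, most of this shows up as connection settings. The sketch below builds a connection string for a three-member replica set; the host names and the replica set name `rs0` are hypothetical, while `replicaSet`, `w`, and `retryWrites` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode

# Hypothetical hosts: three members of the same replica set.
hosts = [
    "db1.example.net:27017",
    "db2.example.net:27017",
    "db3.example.net:27017",
]

options = {
    "replicaSet": "rs0",    # assumed replica set name
    "w": "majority",        # acknowledge writes once a majority of members have them
    "retryWrites": "true",  # let the driver retry a write once after a transient failure
}

# Listing several members lets the driver discover the set even if one is down.
uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

With settings like these, a driver can follow the replica set to a newly elected primary, which is a large part of why taking a server offline is mostly invisible to the developer.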
The third reason I would choose MongoDB is possibly the most surprising. Not all document databases are the same, and not all of them allow you to realise all the benefits of a document versus relational model; some are simply JSON stores or key/value stores where the value is some form of document.
MongoDB has powerful, specialised update operators capable of doing more than simply replacing a document or a value in the database. With MongoDB, you can, as part of a single atomic operation, verify the state of values in the document, compute the new value for any field based on it and any other fields, sort and truncate arrays when adding to them and, should you require it, automatically create a new document rather than modify an existing one.
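Here is a minimal sketch of what such an update can look like, using a made-up player document that tracks a best score and a short list of top scores. The schema and function names are assumptions for illustration; the operators (`$setOnInsert`, `$inc`, `$max`, and `$push` with `$each`, `$sort`, and `$slice`) and the `upsert` option are standard MongoDB features, and `collection` is assumed to behave like a pymongo collection.

```python
def record_score_spec(player_id, score, keep=3):
    """Build the filter and update for recording one game result."""
    query = {"_id": player_id}
    update = {
        # Applied only when the upsert creates a brand new document.
        "$setOnInsert": {"status": "new"},
        # New values computed from the current ones, on the server.
        "$inc": {"games_played": 1},
        "$max": {"best_score": score},
        # Add to the array, sort it descending, and keep the first `keep` items.
        "$push": {
            "top_scores": {
                "$each": [score],
                "$sort": -1,
                "$slice": keep,
            }
        },
    }
    return query, update


def record_score(collection, player_id, score, keep=3):
    """Apply the update, creating the player document if it does not exist."""
    query, update = record_score_spec(player_id, score, keep)
    return collection.update_one(query, update, upsert=True)
```

Everything in the update is applied to the one document as a single atomic change, so the application never has to read the array, sort it locally, and write it back while hoping nobody else changed it in between.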
It is this "smart" update capability that makes MongoDB capable of being a principal, "transactional" database in large, multi-user systems versus a simple store of document shaped data.
These three features, at the heart of an end-to-end data platform, are what genuinely make MongoDB my personal first choice when I want to build a system to support many users with a snappy user experience, 24 hours a day, 365 days a year.