Traditional relational databases are well-defined, using a schema to describe every functional element, including tables, rows, views, indexes, and relationships. By exerting a high degree of control, the database administrator can improve performance and prevent the capture of low-quality, incomplete, or malformed data. In a SQL database, the Relational Database Management System (RDBMS) enforces the schema whenever data is written.
The trade-off is that data must be heavily formatted and shaped to fit the table structure before it can be saved. Any details the schema does not define are either discarded during the save or stored outside the database entirely.
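To make schema-on-write concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in RDBMS. The `students` table and the `record` dictionary are hypothetical. SQLite is more lenient about column types than most relational databases, but it still rejects a column the schema does not declare, so the extra field has to be dropped before the row can be saved.

```python
import sqlite3

# An in-memory SQLite database stands in for any RDBMS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT NOT NULL, age INTEGER)")

# A hypothetical incoming record with one field the schema does not define.
record = {"name": "Joe", "age": 30, "interests": "football"}

# Writing the undeclared column fails: the schema is enforced at write time.
try:
    conn.execute(
        "INSERT INTO students (name, age, interests) VALUES (?, ?, ?)",
        (record["name"], record["age"], record["interests"]),
    )
    extra_field_rejected = False
except sqlite3.OperationalError:
    extra_field_rejected = True

# To save the record at all, it must be shaped to the declared columns.
columns = {row[1] for row in conn.execute("PRAGMA table_info(students)")}
shaped = {key: value for key, value in record.items() if key in columns}
conn.execute("INSERT INTO students (name, age) VALUES (:name, :age)", shaped)

print(extra_field_rejected)                               # True
print(conn.execute("SELECT * FROM students").fetchall())  # [('Joe', 30)]
print(sorted(set(record) - columns))                      # ['interests']
```

The `interests` value never reaches the database; it would have to be stored somewhere else or lost.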
A schemaless database, like MongoDB, does not impose these up-front constraints, so data can be stored in a form closer to its ‘natural’ shape. Even when sitting on top of a data lake, each document is created with a partial schema to aid retrieval. Any formal schema is applied in the code of your applications; this layer of abstraction protects the raw data in the NoSQL database and allows for rapid transformation as your needs change.
A non-tabular NoSQL database can store any data, formatted or not. Used as the right tool for the job, a schemaless database can therefore help you unlock the value of both your structured and unstructured data.
In a schemaless document database such as MongoDB, information is stored in JSON-style documents. Documents in the same collection can have different sets of fields, and the same field can hold different data types in different documents. So, a collection could look like this:
{ name: "Joe", age: 30, interests: "football" }
{ name: "Kate", age: 25 }
As you can see, documents in a collection normally share a fairly consistent structure, even though the database does not require it: here, only Joe's document has an interests field. MongoDB, although schemaless, does keep some additional structure: the system namespace contains an explicit list of collections and indexes. Collections may be created implicitly or explicitly, whereas indexes (apart from the default index on _id) must be explicitly declared.
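The split between implicit collections and explicit indexes can be modeled with a small in-memory store. This is a minimal sketch under stated assumptions: `ToyDatabase` and its methods are hypothetical names for illustration, not MongoDB's API, and the integer `_id` values are a simplification (MongoDB generates ObjectIds by default). The documents are the Joe and Kate examples from the collection above.

```python
from collections import defaultdict


class ToyDatabase:
    """Hypothetical in-memory document store; NOT MongoDB's API."""

    def __init__(self):
        self.collections = {}            # collection name -> list of documents
        self.indexes = defaultdict(set)  # collection name -> indexed field names

    def create_collection(self, name):
        # Explicit creation: the collection exists before any document is saved.
        self.collections.setdefault(name, [])
        self.indexes[name].add("_id")  # every collection gets a default _id index

    def insert(self, name, document):
        # Implicit creation: the first insert creates the collection if needed.
        self.create_collection(name)
        doc = dict(document)
        doc.setdefault("_id", len(self.collections[name]) + 1)
        self.collections[name].append(doc)
        return doc["_id"]

    def create_index(self, name, field):
        # Secondary indexes are never inferred from the data; they must be declared.
        self.create_collection(name)
        self.indexes[name].add(field)

    def catalog(self):
        # The partial schema: which collections and indexes exist, not document shapes.
        return {name: sorted(self.indexes[name]) for name in sorted(self.collections)}


db = ToyDatabase()
db.insert("people", {"name": "Joe", "age": 30, "interests": "football"})
db.insert("people", {"name": "Kate", "age": 25})
db.create_index("people", "name")
print(db.catalog())  # {'people': ['_id', 'name']}
```

With the real PyMongo driver, the corresponding calls would be `db.people.insert_one(...)` for implicit creation, `db.create_collection("people")` for explicit creation, `db.people.create_index("name")` to declare an index, and `db.list_collection_names()` to read the collection list. Note that the catalog records which collections and indexes exist, but says nothing about which fields each document holds.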
Greater flexibility over data types
By operating without a schema, schemaless databases can store, retrieve, and query any data type, which makes them well suited to big data analytics and similar operations powered by unstructured data. Relational databases, by contrast, apply rigid schema rules to data, limiting what can be stored.
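As a rough illustration of storing and querying mixed data types side by side, the sketch below keeps numbers, strings, lists, and nested objects in one hypothetical `events` collection, using plain Python dictionaries as stand-ins for documents. The `find` and `has_field` helpers are hypothetical; in MongoDB itself, comparable filters would be `{"kind": "click"}` and `{"file": {"$exists": True}}`.

```python
# Documents in one hypothetical "events" collection carry values of many types.
events = [
    {"kind": "click", "x": 120, "y": 48},
    {"kind": "search", "query": "red shoes", "filters": ["size:9", "in_stock"]},
    {"kind": "upload", "file": {"name": "scan.pdf", "bytes": 48213}},
    {"kind": "click", "x": 300, "y": 12},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion (toy equality query)."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


def has_field(collection, field):
    """Return documents that contain a field, whatever type its value has."""
    return [doc for doc in collection if field in doc]


clicks = find(events, kind="click")
print(len(clicks))  # 2
print([type(doc["file"]).__name__ for doc in has_field(events, "file")])  # ['dict']
```

No table definition had to anticipate the nested `file` object or the `filters` list before they were stored.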
No pre-defined database schemas
The lack of a schema means that your NoSQL database can accept any data type, including types you do not use yet. This future-proofs your database, allowing it to grow and change as your data-driven operations change and mature.
No data truncation
A schemaless database makes almost no changes to your data; each item is saved in its own document with a partial schema, leaving the raw information untouched. This means that every detail is always available and nothing is stripped to match the current schema. This is particularly valuable if your analytics needs change in the future.
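One way to picture "no truncation" is to keep every raw document and apply the application's schema only when reading. In this minimal sketch, `raw_store`, `save`, `view_v1`, and `view_v2` are hypothetical names, and a Python list stands in for the database. When the application's needs change, the new view can use a field that was saved all along, with no re-ingestion.

```python
import copy

# The raw store keeps every document exactly as received.
raw_store = []


def save(document):
    # Save a deep copy of the full document; no field is stripped at write time.
    raw_store.append(copy.deepcopy(document))


def view_v1(doc):
    # Today's application schema only needs name and age.
    return {"name": doc.get("name"), "age": doc.get("age")}


def view_v2(doc):
    # A later schema also wants interests, which has been stored all along.
    return {**view_v1(doc), "interests": doc.get("interests")}


save({"name": "Joe", "age": 30, "interests": "football"})
save({"name": "Kate", "age": 25})

print([view_v1(doc) for doc in raw_store])
print([view_v2(doc) for doc in raw_store])
```

Documents that never had the field simply yield `None` in the newer view rather than failing.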
Suitable for real-time analytics functions
Because they can handle unstructured data, applications built on NoSQL databases are better able to process real-time data, such as readings and measurements from IoT sensors. Schemaless databases are also a good fit for machine learning and artificial intelligence workloads, helping to accelerate automated actions in your business.
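Sensor data is a typical case where fields vary by device. The sketch below assumes a hypothetical set of readings in which temperature sensors, humidity sensors, and different firmware versions each report different fields. The hypothetical `average_by_device` function takes any iterable, so it could be fed from a live stream; readings that lack the requested field are skipped rather than rejected.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings from mixed IoT sensors; each device reports different fields.
readings = [
    {"device": "t-1", "ts": 1, "temperature_c": 21.5},
    {"device": "t-1", "ts": 2, "temperature_c": 22.1, "battery": 0.93},
    {"device": "h-7", "ts": 2, "humidity_pct": 40},
    {"device": "t-2", "ts": 3, "temperature_c": 19.0, "firmware": "2.4.1"},
]


def average_by_device(stream, field):
    """Average one field per device in a single pass, skipping readings without it."""
    values = defaultdict(list)
    for reading in stream:
        if field in reading:
            values[reading["device"]].append(reading[field])
    return {device: mean(vals) for device, vals in values.items()}


summary = average_by_device(readings, "temperature_c")
print(summary)
```

Extra fields such as `battery` or `firmware` stay in the stored readings for later analysis even though this calculation ignores them.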
Enhanced scalability and flexibility
With NoSQL, you can use whichever data model is best suited to the job. Graph databases let you explore relationships between data points, while wide-column stores support tables with an exceptionally large number of columns. You can query, report, and model information however you choose. And as your requirements grow, you can keep adding nodes to increase capacity and processing power.
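Adding nodes increases capacity because documents are partitioned across them. The sketch below is a deliberately simple, hypothetical hashed-partitioning scheme (`shard_for` and `distribute` are illustrative names): a stable hash of each document key picks a node, so adding a fourth node lowers the number of documents each node holds.

```python
import hashlib


def shard_for(key, node_count):
    """Pick a node for a document key using a stable hash (toy hashed partitioning)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys, node_count):
    """Assign every key to exactly one node and return the per-node key lists."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[shard_for(key, node_count)].append(key)
    return nodes


student_ids = range(1000)
three_nodes = distribute(student_ids, 3)
four_nodes = distribute(student_ids, 4)
print([len(node) for node in three_nodes])
print([len(node) for node in four_nodes])
```

A design caveat: plain modulo hashing moves most keys whenever the node count changes. Production systems such as MongoDB's sharding instead split data into ranges (chunks) of shard-key values and migrate those ranges between nodes, which avoids reshuffling everything at once.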
When a record is saved to a relational database, anything (particularly metadata) that does not match the schema has to be truncated or removed. Because these details are discarded at write time, they cannot be recovered later.
What does this look like?
A lack of rigid schema makes changes to the database, and data migrations, more transparent and easier to automate. Say you want to add a GPA attribute to the student objects held in your database. You simply add the attribute in your code and resave, and the GPA value is added to the NoSQL document. If you look up an existing student that has not been resaved and reference GPA, it will return null. If you roll back your code, the new GPA fields in the resaved objects are unlikely to cause problems and do not need to be removed, provided your code ignores fields it does not recognize.
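The GPA scenario can be walked through with plain dictionaries standing in for documents. This is a sketch under stated assumptions: `students`, `save_student`, and `render_v1` are hypothetical, and Python's `dict.get` returns `None` where a database lookup would report null. In MongoDB, the resave might be `update_one({"name": "Kate"}, {"$set": {"gpa": 3.7}})`, and a filter such as `{"gpa": None}` matches documents where the field is null or missing.

```python
# Students stored as schemaless documents (plain dicts standing in for the database).
students = {
    "s1": {"name": "Joe", "age": 30},
    "s2": {"name": "Kate", "age": 25},
}


def save_student(student_id, doc):
    students[student_id] = doc


# New code version: add a GPA attribute to one student and resave; no migration step.
save_student("s2", dict(students["s2"], gpa=3.7))

# A student saved before the change simply has no GPA, so the lookup yields None.
print(students["s1"].get("gpa"))  # None
print(students["s2"].get("gpa"))  # 3.7


def render_v1(doc):
    """Old code after a rollback: reads only the fields it knows, so gpa is ignored."""
    return f"{doc['name']} ({doc['age']})"


print([render_v1(doc) for doc in students.values()])  # ['Joe (30)', 'Kate (25)']
```

Because `render_v1` never assumes a fixed set of fields, the leftover `gpa` value is harmless after the rollback.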
A NoSQL database is very different from a traditional relational database, which has a strictly defined schema enforced by the RDBMS. To assist with sorting and retrieval, a NoSQL database does keep a partial schema: each document carries its own field names, and in MongoDB, for instance, all collections and indexes are explicitly listed in the system namespace. A full schema, however, is only applied to your data when your application retrieves it.
Because NoSQL databases are designed to store and query unstructured data, they do not require the same rigid schemas used by relational databases. Although a schema can be applied at the application level, NoSQL databases retain all of your unstructured data in its original raw format. This means that complete granularity is retained, even if you later change your application schema — something that is simply not possible with a traditional SQL database.
As a NoSQL database, MongoDB is considered schemaless because it does not require a rigid, pre-defined schema like a relational database. The database management system (DBMS) maintains a partial schema as data is written, explicitly listing collections and indexes. The applications that use the data stored in MongoDB then enforce their own, much stricter schema as they read documents from the database.
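Application-side enforcement at read time might look like the following minimal sketch. The `Student` dataclass, its `from_document` method, and the sample documents are hypothetical. Each raw document is checked against the types this application expects; fields the application does not know about are ignored, and documents that fail the check are set aside rather than altered, so the raw data stays intact in the database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Student:
    """Hypothetical application-side schema applied when documents are read."""
    name: str
    age: int
    gpa: Optional[float] = None

    @classmethod
    def from_document(cls, doc):
        # Enforce the types this application expects; unknown fields are ignored.
        name, age, gpa = doc.get("name"), doc.get("age"), doc.get("gpa")
        if not isinstance(name, str):
            raise ValueError(f"name must be a string, got {name!r}")
        if not isinstance(age, int) or isinstance(age, bool):
            raise ValueError(f"age must be an integer, got {age!r}")
        if gpa is not None and not isinstance(gpa, (int, float)):
            raise ValueError(f"gpa must be numeric, got {gpa!r}")
        return cls(name=name, age=age, gpa=None if gpa is None else float(gpa))


raw_documents = [
    {"name": "Joe", "age": 30, "interests": "football"},
    {"name": "Kate", "age": 25, "gpa": 3.7},
    {"name": "Sam", "age": "unknown"},
]

valid, rejected = [], []
for doc in raw_documents:
    try:
        valid.append(Student.from_document(doc))
    except ValueError:
        rejected.append(doc)  # Only this read fails; the stored document is unchanged.

print(valid)
print(rejected)
```

A different application, or a later version of this one, could read the same documents through a different model without any change to the stored data.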
Schemaless databases provide the flexibility your business needs to extract maximum value and insights from its data estate. By not applying structure at the point of saving, you prevent details that may become important in the future from being lost.
Test the power of a schemaless database for yourself with a free tier MongoDB Atlas subscription. There’s no credit card needed, so you can get started right away.