JSON Schema Validation and Expressive Query Syntax in MongoDB 3.6
One of MongoDB’s key strengths has always been developer empowerment: by relying on a flexible schema architecture, MongoDB makes it easier and faster for applications to move through the development stages from proof-of-concept to production and iterate over update cycles as requirements evolve.
However, as applications mature and scale, they tend to reach a stable stage where frequent schema changes are no longer critical or must be rolled out in a more controlled fashion, to prevent undesirable data from being inserted into the database. These controls are especially important when multiple applications write into the same database, or when analytics processes rely on predefined data structures to be accurate and useful.
MongoDB 3.2 was the first release to introduce
Document Validation
, one of the features that developers and DBAs who are accustomed to relational databases kept demanding. As MongoDB’s CTO, Eliot Horowitz, highlighted in
Document Validation and What Dynamic Schemas means
:
Along with the rest of the 3.2 "schema when you need it" features, document validation gives MongoDB a new, powerful way to keep data clean
.
These are definitely not the final set of tools we will provide, but is rather an important step in how MongoDB handles schema
.
Announcing JSON Schema Validation support
Building upon MongoDB 3.2’s Document Validation functionality,
MongoDB 3.6
introduces a more powerful way of enforcing schemas in the database, with its support of JSON Schema Validation, a specification which is part of IETF’s emerging
JSON Schema
standard.
JSON Schema Validation extends Document Validation in many different ways, including the ability to enforce schemas inside arrays and prevent unapproved attributes from being added. These are the new features we will focus on in this blog post, as well as the ability to build business validation rules.
Starting with MongoDB 3.6, JSON Schema is the recommended way of enforcing
Schema Validation
. The next section highlights the features and benefits of using JSON Schema Validation.
Switching from Document Validation to JSON Schema Validation
We will start by creating an
orders
collection (based on an example we published in the
Document Validation tutorial blog post
):
db.createCollection("orders", {
validator: {
item: { $type: "string" },
price: { $type: "decimal" }
}
});
With this document validation configuration, we not only make sure that both the
item
and
price
attributes are present in any
order
document, but also that
item
is a string and
price
a decimal (which is the recommended type for all currency and percentage values). Therefore, the following element cannot be inserted (because of the "rogue" *price *attribute):
db.orders.insert({
"_id": 6666,
"item": "jkl",
"price": "rogue",
"quantity": 1 });
However, the following document could be inserted (notice the misspelled "pryce" attribute):
db.orders.insert({
"_id": 6667,
"item": "jkl",
"price": NumberDecimal("15.5"),
"pryce": "rogue" });
Prior to MongoDB 3.6, you could not prevent the addition of misspelled or unauthorized attributes. Let’s see how JSON Schema Validation can prevent this behavior. To do so, we will use a new operator,
$jsonSchema
:
db.runCommand({
collMod: "orders",
validator: {
$jsonSchema: {
bsonType: "object",
required: ["item", "price"],
properties: {
item: {
bsonType: "string"
},
price: {
bsonType: "decimal"
}
}
}
}
});
The JSON Schema above is the exact equivalent of the document validation rule we previously set above on the
orders
collection. Let’s check that our schema has indeed been updated to use the new
$jsonSchema
operator by using the
db.getCollectionInfos()
method in the Mongo shell:
db.getCollectionInfos({name:"orders"})
This command prints out a wealth of information about the
orders
collection. For the sake of readability, below is the section that includes the JSON Schema:
...
"options" : {
"validator" : {
"$jsonSchema" : {
"bsonType" : "object",
"required" : [
"item",
"price"
],
"properties" : {
"item" : {
"bsonType" : "string"
},
"price" : {
"bsonType" : "decimal"
}
}
}
},
"validationLevel" : "strict",
"validationAction" : "error"
}
...
Now, let’s enrich our JSON schema a bit to make better use of its powerful features:
db.runCommand({
collMod: "orders",
validator: {
$jsonSchema: {
bsonType: "object",
additionalProperties: false
,
required: ["item", "price"],
properties: {
_id: {}
,
item: {
bsonType: "string",
description: "'item' must be a string and is required"
},
price: {
bsonType: "decimal",
description: "'price' must be a decimal and is required"
},
quantity: {
bsonType: ["int", "long"]
,
minimum: 1,
maximum: 100,
exclusiveMaximum: true,
description:
"'quantity' must be short or long integer between 1 and 99"
}
}
}
}
});
Let’s go through the additions we made to our schema:
First, note the use of the
additionalProperties:false
attribute: it prevents us from adding any attribute other than those mentioned in the
properties
section. For example, it will no longer be possible to insert data containing a misspelled
pryce
attribute. As a result, the use of
additionalProperties:false
at the root level of the document also makes the declaration of the
_id
property mandatory: whether our insert code explicitly sets it or not, it is a field MongoDB requires and would automatically create, if not present. Thus, we must include it explicitly in the
properties
section of our schema.
Second, we have chosen to declare the
quantity
attribute as either a short or long integer between 1 and 99 (using the
minimum
,
maximum
and
exclusiveMaximum
attributes). Of course, because our schema only allows integers lower than 100, we could simply have set the bsonType property to
int
. But adding
long
as a valid type makes application code more flexible, especially if there might be plans to lift the
maximum
restriction.
Finally, note that the
description
attribute (present in the
item
,
price
, and
quantity
attribute declarations) is entirely optional and has no effect on the schema aside from documenting the schema for the reader.
With the schema above, the following documents can be inserted into our orders collection:
db.orders.insert({
"item": "jkl",
"price": NumberDecimal(15.50),
"quantity": NumberInt(99)
});
db.orders.insert({
"item": "jklm",
"price": NumberDecimal(15.50),
"quantity": NumberLong(99)
});
However, the following documents are no longer considered valid:
db.orders.insert({
"item": "jkl",
"price": NumberDecimal(15.50),
"quantity": NumberInt(100)
});
db.orders.insert({
"item": "jkl",
"price": NumberDecimal(15.50),
"quantity": "98"
});
db.orders.insert({
"item": "jkl",
"pryce": NumberDecimal(15.50),
"quantity": NumberInt(99)
});
You probably noticed that our orders above are seemingly odd: they only contain one single item. More realistically, an order consists of multiple items and a possible JSON structure might be as follows:
{
_id: 10000,
total: NumberDecimal(141),
VAT: 0.20,
totalWithVAT: NumberDecimal(169),
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
unit_price:NumberDecimal(9)
},
{
sku: "MDBTS002",
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
}
With MongoDB 3.6, we can now control the structure of the
lineitems
array, for instance with the following JSON Schema:
db.runCommand({
collMod: "orders",
validator: {
$jsonSchema: {
bsonType: "object",
required: ["lineitems"],
properties: {
lineitems: {
bsonType: ["array"],
minItems: 1,
maxItems:10,
items: {
required: ["unit_price", "sku", "quantity"],
bsonType: "object",
additionalProperties: false,
properties: {
sku: {
bsonType: "string",
description: "'sku' must be a string and is required"
},
name: {
bsonType: "string",
description: "'name' must be a string"
},
unit_price: {
bsonType: "decimal",
description: "'unit_price' must be a decimal and is required"
},
quantity: {
bsonType: ["int", "long"],
minimum: 0,
maximum: 100,
exclusiveMaximum: true,
description:
"'quantity' must be a short or long integer in [0, 100)"
},
}
}
}
}
}
}
});
With the schema above, we enforce that any order inserted or updated in the orders collection contain a
lineitems
array of 1 to 10 documents that all have
sku
,
unit_price
and
quantity
attributes (with quantity required to be an integer).
The schema would prevent inserting the following, badly formed document:
db.orders.insert({
total: NumberDecimal(141),
VAT: NumberDecimal(0.20),
totalWithVAT: NumberDecimal(169),
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
price: NumberDecimal(9) //this should be 'unit_price'
},
{
name: "MDBTS002", //missing a 'sku' property
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
})
But it would allow inserting the following, schema-compliant document:
db.orders.insert({
total: NumberDecimal(141),
VAT: NumberDecimal(0.20),
totalWithVAT: NumberDecimal(169),
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
unit_price: NumberDecimal(9)
},
{
sku: "MDBTS002",
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
})
However, if you pay close attention to the order above, you may notice that it contains a few errors:
The
totalWithVAT
attribute value is incorrect (it should be equal to 141*1.20=169.2)
The
total
attribute value is incorrect (it should be equal to the sum of each line item sub-total, (i.e. 10*9+10*5=140)
Is there any way to enforce that
total
and
totalWithVAT
values be correct using database validation rules, without relying solely on application logic?
Introducing MongoDB expressive query syntax
Adding more complex business validation rules is now possible thanks to the expressive query syntax, a new feature of MongoDB 3.6.
One of the objectives of the expressive query syntax is to bring the power of MongoDB’s
aggregation expressions
to MongoDB’s
query language
. An interesting use case is the ability to compose dynamic validation rules that compute and compare multiple attribute values at runtime. Using the new
$expr
operator, it is possible to validate the value of the
totalWithVAT
attribute with the following validation expression:
$expr: {
$eq: [
"$totalWithVAT",
{$multiply: [
"$total",
{$sum: [1, "$VAT"]}
]}
]
}
The above expression checks that the
totalWithVAT
attribute value is equal to
total * (1+VAT)
. In its compact form, here is how we could use it as a validation rule, alongside our JSON Schema validation:
db.runCommand({
collMod: "orders",
validator: {
$expr:{$eq:[
"$totalWithVAT",
{$multiply:["$total", {$sum:[1,"$VAT"]}]}
]}
,
$jsonSchema: {
bsonType: "object",
required: ["lineitems"],
properties: {
lineitems: {
bsonType: ["array"],
minItems: 1,
maxItems:10,
items: {
required: ["unit_price", "sku", "quantity"],
bsonType: "object",
additionalProperties: false,
properties: {
sku: {
bsonType: "string",
description: "'sku' must be a string and is required"
},
name: {
bsonType: "string",
description: "'name' must be a string"
},
unit_price: {
bsonType: "decimal",
description: "'unit_price' must be a decimal and is required"
},
quantity: {
bsonType: ["int", "long"],
minimum: 0,
maximum: 100,
exclusiveMaximum: true,
description:
"'quantity' must be a short or long integer in [0, 100)"
},
}
}
}
}
}
}
});
With the validator above, the following insert operation is no longer possible:
db.orders.insert({
total: NumberDecimal(141),
VAT: NumberDecimal(0.20),
totalWithVAT: NumberDecimal(169),
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
Unit_price: NumberDecimal(9)
},
{
sku: "MDBTS002",
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
})
Instead, the
totalWithVAT
value must be adjusted according to our new VAT validation rule:
db.orders.insert({
total: NumberDecimal(141),
VAT: NumberDecimal(0.20),
totalWithVAT: NumberDecimal(169.2)
,
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
unit_price: NumberDecimal(9)
},
{
sku: "MDBTS002",
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
})
If we also want to make sure that the
total
value is the sum of each order line item value (i.e.
quantity*unit_price
), the following expression should be used:
$expr: {
$eq: [
"$total",
{$sum: {
$map: {
"input": "$lineitems",
"as": "item",
"in": {
"$multiply": [
"$$item.quantity",
"$$item.unit_price"
]
}
}
}}
]
}
The above expression uses the
$map
operator to compute each line item’s sub-total, then
sums
all these sub-totals, and finally
compares
it to the
total
value. To make sure that both the Total and VAT validation rules are checked, we must combine them using the
$and
operator. Finally, our collection validator can be updated with the following command:
db.runCommand({
collMod: "orders",
validator: {
$expr:{ $and:[
{$eq:[
"$totalWithVAT",
{$multiply:["$total", {$sum:[1,"$VAT"]}]}
]},
{$eq: [
"$total",
{$sum: {$map: {
"input": "$lineitems",
"as": "item",
"in":{"$multiply":["$$item.quantity","$$item.unit_price"]}
}}}
]}
]},
$jsonSchema: {
bsonType: "object",
required: ["lineitems", "total", "VAT", "totalWithVAT"],
properties: {
total: { bsonType: "decimal" },
VAT: { bsonType: "decimal" },
totalWithVAT: { bsonType: "decimal" },
lineitems: {
bsonType: ["array"],
minItems: 1,
maxItems:10,
items: {
required: ["unit_price", "sku", "quantity"],
bsonType: "object",
additionalProperties: false,
properties: {
sku: {bsonType: "string"},
name: {bsonType: "string"},
unit_price: {bsonType: "decimal"},
quantity: {
bsonType: ["int", "long"],
minimum: 0,
maximum: 100,
exclusiveMaximum: true
},
}
}
}
}
}
}
});
Accordingly, we must update the
total
and
totalWithVAT
properties to comply with our updated schema and business validation rules (without changing the
lineitems
array):
db.orders.insert({
total: NumberDecimal(140),
VAT: NumberDecimal(0.20),
totalWithVAT: NumberDecimal(168),
lineitems: [
{
sku: "MDBTS001",
name: "MongoDB Stitch T-shirt",
quantity: NumberInt(10),
unit_price: NumberDecimal(9)
},
{
sku: "MDBTS002",
quantity: NumberInt(5),
unit_price: NumberDecimal(10)
}
]
})
Next steps
With the introduction of JSON Schema Validation in MongoDB 3.6, database administrators are now better equipped to address data governance requirements coming from compliance officers or regulators, while still benefiting from MongoDB’s flexible schema architecture.
Additionally, developers will find the new expressive query syntax useful to keep their application code base simpler by moving business logic from the application layer to the database layer.
If you want to learn more about everything new in MongoDB 3.6, download our
What’s New guide
.
If you want to get deeper on the technical side, visit the
Schema Validation
and
Expressive Query Syntax
pages in our official documentation.
If you want to get more practical, hands-on experience, take a look at this
JSON Schema Validation hands-on lab
. You can try it right away on
MongoDB Atlas
, which supports MongoDB 3.6 since its general availability date.
Last but not least, sign up for our
free MongoDB 3.6 training
from MongoDB University.
About the Author -
Raphael Londner
Raphael Londner is a Principal Developer Advocate at MongoDB, focused on cloud technologies such as Amazon Web Services, Microsoft Azure and Google Cloud Engine. Previously he was a developer advocate at Okta as well as a startup entrepreneur in the identity management space. You can follow him on Twitter at
@rlondner
December 13, 2017