Improved Error Messages for Schema Validation in MongoDB 5.0
Many MongoDB users rely on schema validation to enforce rules governing
the structure and integrity of documents in their collections. But one
of the challenges they faced was quickly understanding why a document
that did not match the schema couldn't be inserted or updated. This is
changing in the upcoming MongoDB 5.0 release.
Schema validation will become significantly easier to use, because
MongoDB will generate descriptive error messages whenever an operation
fails validation. This additional information shows which parts of a
document in an insert or update operation failed to validate against
which parts of a collection's validator, and how. From this information,
you can quickly identify and fix the code errors that are causing
documents not to comply with your validation rules. No more tedious
debugging by slicing your document into pieces to isolate the problem!
If you would like to evaluate this feature and provide us early
feedback, fill in this form to participate in the preview program.
The most popular way to express validation rules is JSON Schema. It is
a widely adopted standard that is also used in REST API specification
and validation. And in MongoDB, you can combine JSON Schema with the
MongoDB Query Language (MQL) to do even more.
In this post, I would like to go over a few examples to reiterate the
capabilities of schema validation and showcase the new detailed error
messages.
First, let's look at the new error message. It is a structured message
in the BSON format, explaining which part of the document didn't match
the rules and which validation rule caused this.
Consider this basic validator, which ensures that the price field does
not accept negative values. In JSON Schema, a property is the equivalent
of what MongoDB calls a field.
```json
{
  "$jsonSchema": {
    "properties": {
      "price": {
        "minimum": 0
      }
    }
  }
}
```
When trying to insert a document with { price: -2 }, the following
error message will be returned:
```javascript
{
  "code": 121,
  "errmsg": "Document failed validation",
  "errInfo": {
    "failingDocumentId": ObjectId("5fe0eb9642c10f01eeca66a9"),
    "details": {
      "operatorName": "$jsonSchema",
      "schemaRulesNotSatisfied": [
        {
          "operatorName": "properties",
          "propertiesNotSatisfied": [
            {
              "propertyName": "price",
              "details": [
                {
                  "operatorName": "minimum",
                  "specifiedAs": {
                    "minimum": 0
                  },
                  "reason": "comparison failed",
                  "consideredValue": -2
                }
              ]
            }
          ]
        }
      ]
    }
  }
}
```
Some of the key fields in the response are:
- failingDocumentId: the _id of the document that was evaluated
- operatorName: the operator used in the validation rule
- propertiesNotSatisfied: the list of fields (properties) that failed validation checks
- propertyName: the field of the document that was evaluated
- specifiedAs: the rule as it was expressed in the validator
- reason: an explanation of how the rule was not satisfied
- consideredValue: the value of the field in the document that was evaluated
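Because the details are nested (rules contain properties, properties contain their own details, and so on), it can help to flatten them when logging failures. The sketch below is plain JavaScript that runs in Node.js without extra packages. summarizeValidationError is a hypothetical helper, not part of MongoDB or its drivers, and it assumes the error object has the code/errmsg/errInfo shape shown above. It only reports nodes that carry a reason, so rules that report missingProperties or additionalProperties instead are not listed.

```javascript
// Hypothetical helper (not part of any MongoDB driver): flattens the nested
// errInfo.details tree of a "Document failed validation" error (code 121)
// into a flat list of { path, operatorName, reason, consideredValue } entries.
// Assumption: the error has the shape shown in the example above.

// Keys that echo the rule or the user's data rather than nested failures.
const SKIPPED_KEYS = new Set(["specifiedAs", "consideredValue", "consideredValues", "duplicatedValue"]);

function summarizeValidationError(error) {
  const failures = [];

  function visit(node, path) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child, path));
      return;
    }
    if (node === null || typeof node !== "object") return;

    // propertyName names the field described at this level of the tree.
    const nextPath = node.propertyName ? [...path, node.propertyName] : path;

    // Only nodes with a "reason" are reported (see the limitation above).
    if (typeof node.reason === "string") {
      failures.push({
        path: nextPath.join("."),
        operatorName: node.operatorName,
        reason: node.reason,
        consideredValue: node.consideredValue,
      });
    }

    // Recurse into nested failure lists, skipping echoed rules and values.
    for (const [key, value] of Object.entries(node)) {
      if (!SKIPPED_KEYS.has(key)) visit(value, nextPath);
    }
  }

  visit(error && error.errInfo ? error.errInfo.details : null, []);
  return failures;
}

// The error from the price example, with ObjectId replaced by a plain string.
const priceError = {
  code: 121,
  errmsg: "Document failed validation",
  errInfo: {
    failingDocumentId: "5fe0eb9642c10f01eeca66a9",
    details: {
      operatorName: "$jsonSchema",
      schemaRulesNotSatisfied: [
        {
          operatorName: "properties",
          propertiesNotSatisfied: [
            {
              propertyName: "price",
              details: [
                {
                  operatorName: "minimum",
                  specifiedAs: { minimum: 0 },
                  reason: "comparison failed",
                  consideredValue: -2,
                },
              ],
            },
          ],
        },
      ],
    },
  },
};

console.log(summarizeValidationError(priceError));
```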
The error may include more fields depending on the specific validation
rule, but the fields listed above are the most common. You will likely
find propertyName and reason to be the most useful fields in the
response.
Now we can look at examples of the different validation rules and see
how the new detailed message helps us identify the reason for a
validation failure.
As an example, we'll use a collection of real estate properties in NYC
managed by a team of real estate agents.
Here is a sample document:
```json
{
  "PID": "EV10010A1",
  "agents": [ { "name": "Ana Blake", "email": "anab@rcgk.com" } ],
  "description": "Spacious 2BR apartment",
  "localization": { "description_es": "Espacioso apartamento de 2 dormitorios" },
  "type": "Residential",
  "address": {
    "street1": "235 E 22nd St",
    "street2": "Apt 42",
    "city": "New York",
    "state": "NY",
    "zip": "10010"
  },
  "originalPrice": 990000,
  "discountedPrice": 980000,
  "geoLocation": [ -73.9826509, 40.737499 ],
  "listedDate": "Wed Dec 11 2020 10:05:10 GMT-0500 (EST)",
  "saleDate": "Wed Dec 21 2020 12:00:04 GMT-0500 (EST)",
  "saleDetails": {
    "price": 970000,
    "buyer": { "id": "24434" },
    "bids": [
      {
        "price": 950000,
        "isWinner": false,
        "bidder": {
          "id": "24432",
          "name": "Sam James",
          "contact": { "email": "sjames@gmail.com" }
        }
      },
      {
        "price": 970000,
        "isWinner": true,
        "bidder": {
          "id": "24434",
          "name": "Joana Miles",
          "contact": { "email": "jm@gmail.com" }
        }
      }
    ]
  }
}
```
Our real estate properties are identified by a property ID (PID) that
has to follow a specific naming format: two uppercase letters, followed
by five digits, then one or more uppercase letters and one or more
digits, like this: WS10011FG4 or EV10010A1.
We can use the JSON Schema pattern operator to express this rule as a
regular expression.
Validator:
```json
{
  "$jsonSchema": {
    "properties": {
      "PID": {
        "bsonType": "string",
        "pattern": "^[A-Z]{2}[0-9]{5}[A-Z]+[0-9]+$"
      }
    }
  }
}
```
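The pattern can be tried out locally before it goes into the validator. This sketch uses JavaScript's RegExp; isValidPid is a hypothetical name, and the sketch assumes that, for an expression this simple, JavaScript and MongoDB's PCRE-based pattern matching agree.

```javascript
// Same expression as the PID validator. Assumption: for this simple
// pattern, JavaScript's RegExp and MongoDB's PCRE matching agree.
const PID_PATTERN = /^[A-Z]{2}[0-9]{5}[A-Z]+[0-9]+$/;

// Hypothetical local check mirroring "bsonType": "string" plus "pattern".
function isValidPid(pid) {
  return typeof pid === "string" && PID_PATTERN.test(pid);
}

console.log(isValidPid("WS10011FG4")); // true
console.log(isValidPid("EV10010A1")); // true
console.log(isValidPid("apt1")); // false
```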
If we try to insert a document with a PID field that doesn't match the
pattern, for example { PID: "apt1" }, we will receive an error. The
error states that the field PID had the value "apt1" and that it did
not match the regular expression, which was specified as
"^[A-Z]{2}[0-9]{5}[A-Z]+[0-9]+$".
```json
{
  ...
  "schemaRulesNotSatisfied": [
    {
      "operatorName": "properties",
      "propertiesNotSatisfied": [
        {
          "propertyName": "PID",
          "details": [
            {
              "operatorName": "pattern",
              "specifiedAs": {
                "pattern": "^[A-Z]{2}[0-9]{5}[A-Z]+[0-9]+$"
              },
              "reason": "regular expression did not match",
              "consideredValue": "apt1"
            }
          ]
        }
      ]
  ...
}
```
The description may be localized into several languages. Currently, our
application supports only Spanish, German, and French, so the
localization object can contain only the fields description_es,
description_de, or description_fr. No other fields are allowed.
We can use the patternProperties operator to describe this requirement
as a regular expression, and indicate that no other fields are expected
here with "additionalProperties": false.
Validator:
```json
{
  "$jsonSchema": {
    "properties": {
      "PID": {...},
      "localization": {
        "additionalProperties": false,
        "patternProperties": {
          "^description_(es|de|fr)$": {
            "bsonType": "string"
          }
        }
      }
    }
  }
}
```
A document like this can be inserted successfully:
```json
{
  "PID": "TS10018A1",
  "type": "Residential",
  "localization": {
    "description_es": "Amplio apartamento de 2 dormitorios",
    "description_de": "Geräumige 2-Zimmer-Wohnung"
  }
}
```
A document like this will fail the validation check:
```json
{
  "PID": "TS10018A1",
  "type": "Residential",
  "localization": {
    "description_cz": "Prostorný byt 2 + kk"
  }
}
```
The error below indicates that the localization field contains an
additional property, description_cz. Because description_cz does not
match the expected pattern, it is considered an additional property.
```json
{
  ...
  "propertiesNotSatisfied": [
    {
      "propertyName": "localization",
      "details": [
        {
          "operatorName": "additionalProperties",
          "specifiedAs": {
            "additionalProperties": false
          },
          "additionalProperties": [
            "description_cz"
          ]
        }
      ]
    }
  ]
  ...
}
```
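To see how the two keywords interact, here is a local sketch of the same check in JavaScript. checkLocalization and LOCALIZATION_KEY are hypothetical names. The sketch uses ^description_(es|de|fr)$, with no + after the group, so that a key such as description_esfr is also treated as an additional property.

```javascript
// Hypothetical local sketch of patternProperties + "additionalProperties": false.
// Assumption: exact suffixes only, so no "+" after the (es|de|fr) group.
const LOCALIZATION_KEY = /^description_(es|de|fr)$/;

function checkLocalization(localization) {
  const keys = Object.keys(localization);
  return {
    // Keys matched by no pattern are rejected because additionalProperties is false.
    additionalProperties: keys.filter((key) => !LOCALIZATION_KEY.test(key)),
    // Keys that match the pattern must also satisfy its sub-schema (bsonType "string").
    nonStringValues: keys.filter(
      (key) => LOCALIZATION_KEY.test(key) && typeof localization[key] !== "string"
    ),
  };
}

console.log(JSON.stringify(checkLocalization({ description_cz: "Prostorný byt 2 + kk" })));
// {"additionalProperties":["description_cz"],"nonStringValues":[]}
```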
Each real estate property in our collection has a type, and we want it
to be one of four types: "Residential," "Commercial," "Industrial," or
"Land." This can be achieved with the enum operator.
Validator:
```json
{
  "$jsonSchema": {
    "properties": {
      "type": {
        "enum": [ "Residential", "Commercial", "Industrial", "Land" ]
      }
    }
  }
}
```
The following document will be considered invalid:
```json
{
  "PID": "TS10018A1",
  "type": "House"
}
```
The error states that the type field failed validation because the
"value was not found in enum."
```json
{
  ...
  "propertiesNotSatisfied": [
    {
      "propertyName": "type",
      "details": [
        {
          "operatorName": "enum",
          "specifiedAs": {
            "enum": [
              "Residential",
              "Commercial",
              "Industrial",
              "Land"
            ]
          },
          "reason": "value was not found in enum",
          "consideredValue": "House"
        }
      ]
    }
  ]
  ...
}
```
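A hypothetical local equivalent, checkEnum, returns an object shaped like the details entry above, or null when the value is allowed. It compares with Array.prototype.includes, which is only a stand-in for MongoDB's BSON value comparison and is adequate for the plain strings used here.

```javascript
// Hypothetical local sketch of the "enum" rule on the type field.
// Assumption: includes() stands in for BSON equality (fine for strings).
const PROPERTY_TYPES = ["Residential", "Commercial", "Industrial", "Land"];

// Returns null when the value is allowed, otherwise an object shaped like
// the "details" entry in the validation error.
function checkEnum(value, allowed) {
  if (allowed.includes(value)) return null;
  return {
    operatorName: "enum",
    specifiedAs: { enum: allowed },
    reason: "value was not found in enum",
    consideredValue: value,
  };
}

console.log(checkEnum("Residential", PROPERTY_TYPES)); // null
console.log(checkEnum("House", PROPERTY_TYPES).reason); // value was not found in enum
```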
Agents who manage each real estate property are stored in the agents
array. Let's make sure there are no duplicate elements in the array, and
that no more than three agents are working on the same property. We can
use uniqueItems and maxItems for this:
```json
{
  "$jsonSchema": {
    "properties": {
      "agents": {
        "bsonType": "array",
        "uniqueItems": true,
        "maxItems": 3
      }
    }
  }
}
```
The following document violates both of the validation rules:
```json
{
  "PID": "TS10018A1",
  "agents": [
    { "name": "Ana Blake" },
    { "name": "Felix Morin" },
    { "name": "Dilan Adams" },
    { "name": "Ana Blake" }
  ]
}
```
The error reports the failure of both rules, "array did not match
specified length" and "found a duplicate item," and it also points to
the value that was duplicated.
```json
{
  ...
  "propertiesNotSatisfied": [
    {
      "propertyName": "agents",
      "details": [
        {
          "operatorName": "maxItems",
          "specifiedAs": { "maxItems": 3 },
          "reason": "array did not match specified length",
          "consideredValue": [
            { "name": "Ana Blake" },
            { "name": "Felix Morin" },
            { "name": "Dilan Adams" },
            { "name": "Ana Blake" }
          ]
        },
        {
          "operatorName": "uniqueItems",
          "specifiedAs": { "uniqueItems": true },
          "reason": "found a duplicate item",
          "consideredValue": [
            { "name": "Ana Blake" },
            { "name": "Felix Morin" },
            { "name": "Dilan Adams" },
            { "name": "Ana Blake" }
          ],
          "duplicatedValue": { "name": "Ana Blake" }
        }
      ]
  ...
}
```
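The sketch below reproduces both checks locally for experimentation. checkAgents is a hypothetical name; it compares array items by their JSON.stringify form, a simplification that is sensitive to key order and is not how the server compares BSON documents.

```javascript
// Hypothetical local sketch of "maxItems" and "uniqueItems" on the agents array.
// Assumption: items are compared by JSON.stringify, which depends on key order.
function checkAgents(agents, maxItems) {
  const failures = [];

  if (agents.length > maxItems) {
    failures.push({ operatorName: "maxItems", reason: "array did not match specified length" });
  }

  const seen = new Set();
  for (const agent of agents) {
    const key = JSON.stringify(agent);
    if (seen.has(key)) {
      // Report the first duplicate found, like duplicatedValue in the error.
      failures.push({ operatorName: "uniqueItems", reason: "found a duplicate item", duplicatedValue: agent });
      break;
    }
    seen.add(key);
  }
  return failures;
}

const agentsWithDuplicate = [
  { name: "Ana Blake" },
  { name: "Felix Morin" },
  { name: "Dilan Adams" },
  { name: "Ana Blake" },
];
console.log(checkAgents(agentsWithDuplicate, 3).map((f) => f.operatorName)); // [ 'maxItems', 'uniqueItems' ]
```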
Now, we want to make sure that contact information is available for
the agents. We need each agent's name and at least one way to contact
them: phone or email. We will use required and anyOf to create this
rule.
Validator:
```json
{
  "$jsonSchema": {
    "properties": {
      "agents": {
        "bsonType": "array",
        "uniqueItems": true,
        "maxItems": 3,
        "items": {
          "bsonType": "object",
          "required": [ "name" ],
          "anyOf": [ { "required": [ "phone" ] }, { "required": [ "email" ] } ]
        }
      }
    }
  }
}
```
The following document will fail validation:
```json
{
  "PID": "TS10018A1",
  "agents": [
    { "name": "Ana Blake", "email": "anab@rcgk.com" },
    { "name": "Felix Morin", "phone": "+12019878749" },
    { "name": "Dilan Adams" }
  ]
}
```
Here the error indicates that the third element of the array
("itemIndex": 2) did not match the rule.
```json
{
  ...
  "propertiesNotSatisfied": [
    {
      "propertyName": "agents",
      "details": [
        {
          "operatorName": "items",
          "reason": "At least one item did not match the sub-schema",
          "itemIndex": 2,
          "details": [
            {
              "operatorName": "anyOf",
              "schemasNotSatisfied": [
                {
                  "index": 0,
                  "details": [
                    {
                      "operatorName": "required",
                      "specifiedAs": { "required": [ "phone" ] },
                      "missingProperties": [ "phone" ]
                    }
                  ]
                },
                {
                  "index": 1,
                  "details": [
                    {
                      "operatorName": "required",
                      "specifiedAs": { "required": [ "email" ] },
                      "missingProperties": [ "email" ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
  ...
}
```
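A local sketch of the items sub-schema helps show why index 2 fails. firstInvalidAgentIndex and agentIsValid are hypothetical names: an agent must be an object with a name field and at least one of phone or email, and the function returns the index of the first agent that fails, or -1. Like JSON Schema's required, it checks only that a field is present, not its value.

```javascript
// Hypothetical local sketch of the "items" sub-schema for agents:
// bsonType "object", required ["name"], anyOf [required phone, required email].
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function agentIsValid(agent) {
  const isObject = agent !== null && typeof agent === "object" && !Array.isArray(agent);
  if (!isObject || !has(agent, "name")) return false; // bsonType + required
  return has(agent, "phone") || has(agent, "email"); // anyOf: one branch must hold
}

// Index of the first agent that fails the sub-schema, or -1 if all pass.
function firstInvalidAgentIndex(agents) {
  return agents.findIndex((agent) => !agentIsValid(agent));
}

const agentsFromDocument = [
  { name: "Ana Blake", email: "anab@rcgk.com" },
  { name: "Felix Morin", phone: "+12019878749" },
  { name: "Dilan Adams" },
];

console.log(firstInvalidAgentIndex(agentsFromDocument)); // 2
```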
Let's create another rule to ensure that if the document contains the
saleDate field, saleDetails is also present, and vice versa: if there is
saleDetails, then saleDate also has to exist.
```json
{
  "$jsonSchema": {
    "dependencies": {
      "saleDate": [ "saleDetails" ],
      "saleDetails": [ "saleDate" ]
    }
  }
}
```
Now, let's try to insert a document with saleDate but no saleDetails:
```javascript
{
  "PID": "TS10018A1",
  "saleDate": new Date("2020-05-01T04:00:00.000Z")
}
```
The error now names the property that has the dependency, saleDate, and
the property missing from its dependencies, saleDetails.
```json
{
  ...
  "details": {
    "operatorName": "$jsonSchema",
    "schemaRulesNotSatisfied": [
      {
        "operatorName": "dependencies",
        "failingDependencies": [
          {
            "conditionalProperty": "saleDate",
            "missingProperties": [ "saleDetails" ]
          }
        ]
      }
    ]
  }
  ...
}
```
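The property form of dependencies reads as: if the key is present, every listed field must be present too. findFailingDependencies and SALE_DEPENDENCIES below are hypothetical names in a local sketch that returns entries shaped like failingDependencies in the error above.

```javascript
// Hypothetical local sketch of JSON Schema "dependencies" (property form).
const SALE_DEPENDENCIES = {
  saleDate: ["saleDetails"],
  saleDetails: ["saleDate"],
};

const hasField = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function findFailingDependencies(doc, deps) {
  const failing = [];
  for (const [conditionalProperty, requiredFields] of Object.entries(deps)) {
    // A dependency only applies when its conditional property is present.
    if (!hasField(doc, conditionalProperty)) continue;
    const missingProperties = requiredFields.filter((field) => !hasField(doc, field));
    if (missingProperties.length > 0) {
      failing.push({ conditionalProperty, missingProperties });
    }
  }
  return failing;
}

const saleWithoutDetails = { PID: "TS10018A1", saleDate: new Date("2020-05-01T04:00:00.000Z") };
console.log(JSON.stringify(findFailingDependencies(saleWithoutDetails, SALE_DEPENDENCIES)));
// [{"conditionalProperty":"saleDate","missingProperties":["saleDetails"]}]
```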
Notice that in JSON Schema, the dependencies field is in the root
object, not inside a specific property. Therefore, in the error
message, the details object will have a different structure:
```json
{ "operatorName": "dependencies", "failingDependencies": [...] }
```
In the previous examples, the JSON Schema rule was inside the
"properties" object, like this:
```json
"$jsonSchema": { "properties": { "price": { "minimum": 0 } } }
```
and the details of the error message contained "operatorName":
"properties" and a "propertyName":
```json
{ "operatorName": "properties",
  "propertiesNotSatisfied": [ { "propertyName": "...", "details": [] } ]
}
```
You can use MongoDB Query Language (MQL) in your validator right next to
JSON Schema to add richer business logic to your rules.
As one example, you can use $expr to check that discountedPrice is less
than originalPrice, just like this:
```json
{
  "$expr": {
    "$lt": [ "$discountedPrice", "$originalPrice" ]
  },
  "$jsonSchema": {...}
}
```
$expr resolves to true or false, and allows you to use aggregation
expressions to create sophisticated business rules.
For a slightly more complex example, let's say we keep an array of bids
in the document of each real estate property, and the boolean field
isWinner indicates whether a particular bid is the winning one.
Sample document:
```json
{
  "PID": "TS10018A1",
  "type": "Residential",
  "saleDetails": {
    "bids": [
      {
        "price": 500000,
        "isWinner": false,
        "bidder": {...}
      },
      {
        "price": 530000,
        "isWinner": true,
        "bidder": {...}
      }
    ]
  }
}
```
Let's make sure that at most one of the bids array elements can be
marked as the winner. The validator will have an expression that
applies a filter to the isWinner values of the bids to keep only the
true ones, and checks that the size of the resulting array is less than
or equal to 1.
Validator:
```json
{
  "$and": [
    {
      "$expr": {
        "$lte": [
          {
            "$size": {
              "$filter": {
                "input": { "$ifNull": [ "$saleDetails.bids.isWinner", [] ] },
                "cond": "$$this"
              }
            }
          },
          1
        ]
      }
    },
    {
      "$expr": {...}
    },
    {
      "$jsonSchema": {...}
    }
  ]
}
```
The $ifNull wrapper lets documents without a saleDetails.bids array pass
this check: for a missing field, $filter returns null, and $size fails
on any argument that is not an array.
Let's try to insert a document with two bids marked "isWinner": true:
```json
{
  "PID": "TS10018A1",
  "type": "Residential",
  "originalPrice": 600000,
  "discountedPrice": 550000,
  "saleDetails": {
    "bids": [
      { "price": 500000, "isWinner": true },
      { "price": 530000, "isWinner": true }
    ]
  }
}
```
The resulting error message indicates which expression evaluated to
false:
```json
{
  ...
  "details": {
    "operatorName": "$expr",
    "specifiedAs": {
      "$expr": {
        "$lte": [
          {
            "$size": {
              "$filter": {
                "input": { "$ifNull": [ "$saleDetails.bids.isWinner", [] ] },
                "cond": "$$this"
              }
            }
          },
          1
        ]
      }
    },
    "reason": "expression did not match",
    "expressionResult": false
  }
  ...
}
```
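The same count can be sketched locally to test sample documents. countWinningBids and hasAtMostOneWinner are hypothetical names. The sketch treats false, null, 0, and missing isWinner values as false, as aggregation expressions do, and it treats a document without saleDetails.bids as having zero winners, which is an assumption of this sketch.

```javascript
// Aggregation expressions treat false, null, 0 and missing values as false.
function isTrueInAggregation(value) {
  return value !== false && value !== null && value !== undefined && value !== 0;
}

// Hypothetical sketch: counts the bids whose isWinner value a $filter with
// cond "$$this" would keep. Assumption: no bids array means zero winners.
function countWinningBids(doc) {
  const bids = doc.saleDetails && Array.isArray(doc.saleDetails.bids) ? doc.saleDetails.bids : [];
  return bids.filter(
    (bid) => bid !== null && typeof bid === "object" && isTrueInAggregation(bid.isWinner)
  ).length;
}

function hasAtMostOneWinner(doc) {
  return countWinningBids(doc) <= 1;
}

const twoWinners = {
  PID: "TS10018A1",
  saleDetails: {
    bids: [
      { price: 500000, isWinner: true },
      { price: 530000, isWinner: true },
    ],
  },
};
console.log(hasAtMostOneWinner(twoWinners)); // false
```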
As the last example, let's see how we can use the geospatial features of
MQL to ensure that all the real estate properties in the collection are
located within the New York City boundaries. Our documents include a
geoLocation field with coordinates. We can use $geoWithin to check that
these coordinates are inside a GeoJSON polygon (the polygon for New York
City in this example is approximate).
Validator:
```json
{
  "geoLocation": {
    "$geoWithin": {
      "$geometry": {
        "type": "Polygon",
        "coordinates": [
          [ [ -73.91326904296874, 40.91091803848203 ],
            [ -74.01626586914062, 40.75297891717686 ],
            [ -74.05677795410156, 40.65563874006115 ],
            [ -74.08561706542969, 40.65199222800328 ],
            [ -74.14329528808594, 40.64417760251725 ],
            [ -74.18724060058594, 40.643656594948524 ],
            [ -74.234619140625, 40.556591288249905 ],
            [ -74.26345825195312, 40.513277131087484 ],
            [ -74.2510986328125, 40.49500373230525 ],
            [ -73.94691467285156, 40.543026009954986 ],
            [ -73.740234375, 40.589449604232975 ],
            [ -73.71826171874999, 40.820045086716505 ],
            [ -73.78829956054686, 40.8870435151357 ],
            [ -73.91326904296874, 40.91091803848203 ] ]
        ]
      }
    }
  },
  "$jsonSchema": {...}
}
```
A document like this will be inserted successfully.
```json
{
  "PID": "TS10018A1",
  "type": "Residential",
  "geoLocation": [ -73.9826509, 40.737499 ],
  "originalPrice": 600000,
  "discountedPrice": 550000,
  "saleDetails": {...}
}
```
The following document will fail.
```json
{
  "PID": "TS10018A1",
  "type": "Residential",
  "geoLocation": [ -73.9826509, 80.737499 ],
  "originalPrice": 600000,
  "discountedPrice": 550000,
  "saleDetails": {...}
}
```
The error will indicate that validation failed on the $geoWithin
operator, and the reason is "none of the considered geometries were
contained within the expression's geometry."
```json
{
  ...
  "details": {
    "operatorName": "$geoWithin",
    "specifiedAs": {
      "geoLocation": {
        "$geoWithin": {...}
      }
    },
    "reason": "none of the considered geometries were contained within the expression's geometry",
    "consideredValues": [ -73.9826509, 80.737499 ]
  }
  ...
}
```
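For a quick local check of sample coordinates, here is a ray-casting point-in-polygon sketch over [longitude, latitude] pairs. isInsidePolygon and NYC_POLYGON are hypothetical names. MongoDB evaluates $geoWithin with a GeoJSON $geometry on a sphere, while this sketch treats the coordinates as a flat plane, so the two can disagree for points very close to an edge.

```javascript
// Hypothetical local sketch: planar ray casting, NOT MongoDB's spherical
// $geoWithin. Points and vertices are [longitude, latitude] pairs.
const NYC_POLYGON = [
  [-73.91326904296874, 40.91091803848203],
  [-74.01626586914062, 40.75297891717686],
  [-74.05677795410156, 40.65563874006115],
  [-74.08561706542969, 40.65199222800328],
  [-74.14329528808594, 40.64417760251725],
  [-74.18724060058594, 40.643656594948524],
  [-74.234619140625, 40.556591288249905],
  [-74.26345825195312, 40.513277131087484],
  [-74.2510986328125, 40.49500373230525],
  [-73.94691467285156, 40.543026009954986],
  [-73.740234375, 40.589449604232975],
  [-73.71826171874999, 40.820045086716505],
  [-73.78829956054686, 40.8870435151357],
  [-73.91326904296874, 40.91091803848203],
];

function isInsidePolygon([x, y], ring) {
  let inside = false;
  // Cast a ray east from the point and count the edges it crosses;
  // an odd number of crossings means the point is inside.
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const straddles = (yi > y) !== (yj > y);
    if (straddles && x < ((xj - xi) * (y - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

console.log(isInsidePolygon([-73.9826509, 40.737499], NYC_POLYGON)); // true
console.log(isInsidePolygon([-73.9826509, 80.737499], NYC_POLYGON)); // false
```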
Schema validation is a great tool to enforce governance over your data
sets. You have the choice to express the validation rules using JSON
Schema, MongoDB Query Language, or both. And now, with the detailed
error messages, it gets even easier to use, and you can have the rules
be as sophisticated as you need, without the risk of costly maintenance.
Questions? Comments? We'd love to connect with you. Join the
conversation on the MongoDB Community
Forums.
Safe Harbor
The development, release, and timing of any features or functionality
described for our products remains at our sole discretion. This
information is merely intended to outline our general product direction
and it should not be relied on in making a purchasing decision nor is
this a commitment, promise or legal obligation to deliver any material,
code, or functionality.