Explore Developer Center's New Chatbot! MongoDB AI Chatbot can be accessed at the top of your navigation to answer all your MongoDB questions.

Join us at AWS re:Invent 2024! Learn how to use MongoDB for AI use cases.
MongoDB Developer
MongoDB
plus
Sign in to follow topics
MongoDB Developer Centerchevron-right
Developer Topicschevron-right
Productschevron-right
MongoDBchevron-right

Document Validation for Polymorphic Collections

Andrew Morgan6 min read • Published Aug 10, 2023 • Updated Oct 01, 2024
MongoDB
Facebook Icontwitter iconlinkedin icon
Rate this article
star-empty
star-empty
star-empty
star-empty
star-empty
In data modeling design reviews with customers, I often propose a schema where different documents in the same collection contain different types of data. This makes it efficient to fetch related documents in a single, indexed query. MongoDB's flexible schema is great for optimizing workloads in this way, but people can be concerned about losing control of what applications write to these collections.
Customers are often concerned about ensuring that only correctly formatted documents make it into a collection, and so I explain MongoDB's schema validation feature. The question then comes: "How does that work with a polymorphic/single-collection schema?" This post is intended to answer that question — and it's simpler than you might think.

The banking application and its data

The application I'm working on manages customer and account details. There's a many-to-many relationship between customers and accounts. The app needs to be able to efficiently query customer data based on the customer id, and account data based on either the id of its customer or the account id.
Here's an example of customer and account documents where my wife and I share a checking account but each have our own savings account:
1{
2 "_id": "kjfgjebgjfbkjb",
3 "customerId": "CUST-123456789",
4 "docType": "customer",
5 "name": {
6 "title": "Mr",
7 "first": "Andrew",
8 "middle": "James",
9 "last": "Morgan"
10 },
11 "address": {
12 "street1": "240 Blackfriars Rd",
13 "city": "London",
14 "postCode": "SE1 8NW",
15 "country": "UK"
16 },
17 "customerSince": ISODate("2005-05-20")
18}
19
20{
21 "_id": "jnafjkkbEFejfleLJ",
22 "customerId": "CUST-987654321",
23 "docType": "customer",
24 "name": {
25 "title": "Mrs",
26 "first": "Anne",
27 "last": "Morgan"
28 },
29 "address": {
30 "street1": "240 Blackfriars Rd",
31 "city": "London",
32 "postCode": "SE1 8NW",
33 "country": "UK"
34 },
35 "customerSince": ISODate("2003-12-01")
36}
37
38{
39 "_id": "dksfmkpGJPowefjdfhs",
40 "accountNumber": "ACC1000000654",
41 "docType": "account",
42 "accountType": "checking",
43 "customerId": [
44 "CUST-123456789",
45 "CUST-987654321"
46 ],
47 "dateOpened": ISODate("2003-12-01"),
48 "balance": NumberDecimal("5067.65")
49}
50
51{
52 "_id": "kliwiiejeqydioepwj",
53 "accountNumber": "ACC1000000432",
54 "docType": "account",
55 "accountType": "savings",
56 "customerId": [
57 "CUST-123456789"
58 ],
59 "dateOpened": ISODate("2005-10-28"),
60 "balance": NumberDecimal("10341.21")
61}
62
63{
64 "_id": "djahspihhfheiphfipewe",
65 "accountNumber": "ACC1000000890",
66 "docType": "account",
67 "accountType": "savings",
68 "customerId": [
69 "CUST-987654321"
70 ],
71 "dateOpened": ISODate("2003-12-15"),
72 "balance": NumberDecimal("10341.89")
73}
As an aside, these are the indexes I added to make those frequent queries I referred to more efficient:
1const indexKeys1 = { accountNumber: 1 };
2const indexKeys2 = { customerId: 1, accountType: 1 };
3const indexOptions1 = { partialFilterExpression: { docType: 'account' }};
4const indexOptions2 = { partialFilterExpression: { docType: 'customer' }};
5
6db.getCollection(collection).createIndex(indexKeys1, indexOptions1);
7db.getCollection(collection).createIndex(indexKeys2, indexOptions2);

Adding schema validation

To quote the docs
Schema validation lets you create validation rules for your fields, such as allowed data types and value ranges.
MongoDB uses a flexible schema model, which means that documents in a collection do not need to have the same fields or data types by default. Once you've established an application schema, you can use schema validation to ensure there are no unintended schema changes or improper data types.
The validation rules are pretty simple to set up, and tools like Hackolade can make it simpler still — even reverse-engineering your existing documents.
It's simple to imagine setting up a JSON schema validation rule for a collection where all documents share the same attributes and types. But what about polymorphic collections? Even in polymorphic collections, there is structure to the documents. Fortunately, the syntax for setting up the validation rules allows for the required optionality.
I have two different types of documents that I want to store in my Accounts collection — customer and account. I included a docType attribute in each document to identify which type of entity it represents.
I start by creating a JSON schema definition for each type of document:
1const customerSchema = {
2 required: ["docType", "customerId", "name", "customerSince"],
3 properties: {
4 docType: { enum: ["customer"] },
5 customerId: { bsonType: "string"},
6 name: {
7 bsonType: "object",
8 required: ["first", "last"],
9 properties: {
10 title: { enum: ["Mr", "Mrs", "Ms", "Dr"]},
11 first: { bsonType: "string" },
12 middle: { bsonType: "string" },
13 last: { bsonType: "string" }
14 }
15 },
16 address: {
17 bsonType: "object",
18 required: ["street1", "city", "postCode", "country"],
19 properties: {
20 street1: { bsonType: "string" },
21 street2: { bsonType: "string" },
22 postCode: { bsonType: "string" },
23 country: { bsonType: "string" }
24 }
25 },
26 customerSince: {
27 bsonType: "date"
28 }
29 }
30};
31
32const accountSchema = {
33 required: ["docType", "accountNumber", "accountType", "customerId", "dateOpened", "balance"],
34 properties: {
35 docType: { enum: ["account"] },
36 accountNumber: { bsonType: "string" },
37 accountType: { enum: ["checking", "savings", "mortgage", "loan"] },
38 customerId: { bsonType: "array" },
39 dateOpened: { bsonType: "date" },
40 balance: { bsonType: "decimal" }
41 }
42};
Those definitions define what attributes should be in the document and what types they should take. Note that fields can be optional — such as name.middle in the customer schema.
It's then a simple matter of using the oneOf JSON schema operator to allow documents that match either of the two schema:
1const schemaValidation = {
2 $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }
3};
4
5db.createCollection(collection, {validator: schemaValidation});
I wanted to go a stage further and add some extra, semantic validations:
  • For customer documents, the customerSince value can't be any earlier than the current time.
  • For account documents, the dateOpened value can't be any earlier than the current time.
  • For savings accounts, the balance can't fall below zero.
These documents represents these checks:
1const badCustomer = {
2 "$expr": { "$gt": ["$customerSince", "$$NOW"] }
3};
4
5const badAccount = {
6 $or: [
7 {
8 accountType: "savings",
9 balance: { $lt: 0}
10 },
11 {
12 "$expr": { "$gt": ["$dateOpened", "$$NOW"]}
13 }
14 ]
15};
16
17const schemaValidation = {
18 "$and": [
19 { $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }},
20 { $nor: [
21 badCustomer,
22 badAccount
23 ]
24 }
25 ]
26};
I updated the collection validation rules to include these new checks:
1const schemaValidation = {
2 "$and": [
3 { $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }},
4 { $nor: [
5 badCustomer,
6 badAccount
7 ]
8 }
9 ]
10};
11
12db.createCollection(collection, {validator: schemaValidation} );
If you want to recreate this in your own MongoDB database, then just paste this into your MongoDB playground in VS Code:
1const cust1 = {
2 "_id": "kjfgjebgjfbkjb",
3 "customerId": "CUST-123456789",
4 "docType": "customer",
5 "name": {
6 "title": "Mr",
7 "first": "Andrew",
8 "middle": "James",
9 "last": "Morgan"
10 },
11 "address": {
12 "street1": "240 Blackfriars Rd",
13 "city": "London",
14 "postCode": "SE1 8NW",
15 "country": "UK"
16 },
17 "customerSince": ISODate("2005-05-20")
18}
19
20const cust2 = {
21 "_id": "jnafjkkbEFejfleLJ",
22 "customerId": "CUST-987654321",
23 "docType": "customer",
24 "name": {
25 "title": "Mrs",
26 "first": "Anne",
27 "last": "Morgan"
28 },
29 "address": {
30 "street1": "240 Blackfriars Rd",
31 "city": "London",
32 "postCode": "SE1 8NW",
33 "country": "UK"
34 },
35 "customerSince": ISODate("2003-12-01")
36}
37
38const futureCustomer = {
39 "_id": "nansfanjnDjknje",
40 "customerId": "CUST-666666666",
41 "docType": "customer",
42 "name": {
43 "title": "Mr",
44 "first": "Wrong",
45 "last": "Un"
46 },
47 "address": {
48 "street1": "240 Blackfriars Rd",
49 "city": "London",
50 "postCode": "SE1 8NW",
51 "country": "UK"
52 },
53 "customerSince": ISODate("2025-05-20")
54}
55
56const acc1 = {
57 "_id": "dksfmkpGJPowefjdfhs",
58 "accountNumber": "ACC1000000654",
59 "docType": "account",
60 "accountType": "checking",
61 "customerId": [
62 "CUST-123456789",
63 "CUST-987654321"
64 ],
65 "dateOpened": ISODate("2003-12-01"),
66 "balance": NumberDecimal("5067.65")
67}
68
69const acc2 = {
70 "_id": "kliwiiejeqydioepwj",
71 "accountNumber": "ACC1000000432",
72 "docType": "account",
73 "accountType": "savings",
74 "customerId": [
75 "CUST-123456789"
76 ],
77 "dateOpened": ISODate("2005-10-28"),
78 "balance": NumberDecimal("10341.21")
79}
80
81const acc3 = {
82 "_id": "djahspihhfheiphfipewe",
83 "accountNumber": "ACC1000000890",
84 "docType": "account",
85 "accountType": "savings",
86 "customerId": [
87 "CUST-987654321"
88 ],
89 "dateOpened": ISODate("2003-12-15"),
90 "balance": NumberDecimal("10341.89")
91}
92
93const futureAccount = {
94 "_id": "kljkdfgjkdsgjklgjdfgkl",
95 "accountNumber": "ACC1000000999",
96 "docType": "account",
97 "accountType": "savings",
98 "customerId": [
99 "CUST-987654333"
100 ],
101 "dateOpened": ISODate("2030-12-15"),
102 "balance": NumberDecimal("10341.89")
103}
104
105const negativeSavings = {
106 "_id": "shkjahsjdkhHK",
107 "accountNumber": "ACC1000000666",
108 "docType": "account",
109 "accountType": "savings",
110 "customerId": [
111 "CUST-9837462376"
112 ],
113 "dateOpened": ISODate("2005-10-28"),
114 "balance": NumberDecimal("-10341.21")
115}
116
117const indexKeys1 = { accountNumber: 1 }
118const indexKeys2 = { customerId: 1, accountType: 1 }
119const indexOptions1 = { partialFilterExpression: { docType: 'account' }}
120const indexOptions2 = { partialFilterExpression: { docType: 'customer' }}
121
122const customerSchema = {
123 required: ["docType", "customerId", "name", "customerSince"],
124 properties: {
125 docType: { enum: ["customer"] },
126 customerId: { bsonType: "string"},
127 name: {
128 bsonType: "object",
129 required: ["first", "last"],
130 properties: {
131 title: { enum: ["Mr", "Mrs", "Ms", "Dr"]},
132 first: { bsonType: "string" },
133 middle: { bsonType: "string" },
134 last: { bsonType: "string" }
135 }
136 },
137 address: {
138 bsonType: "object",
139 required: ["street1", "city", "postCode", "country"],
140 properties: {
141 street1: { bsonType: "string" },
142 street2: { bsonType: "string" },
143 postCode: { bsonType: "string" },
144 country: { bsonType: "string" }
145 }
146 },
147 customerSince: {
148 bsonType: "date"
149 }
150 }
151}
152
153const accountSchema = {
154 required: ["docType", "accountNumber", "accountType", "customerId", "dateOpened", "balance"],
155 properties: {
156 docType: { enum: ["account"] },
157 accountNumber: { bsonType: "string" },
158 accountType: { enum: ["checking", "savings", "mortgage", "loan"] },
159 customerId: { bsonType: "array" },
160 dateOpened: { bsonType: "date" },
161 balance: { bsonType: "decimal" }
162 }
163}
164
165const badCustomer = {
166 "$expr": { "$gt": ["$customerSince", "$$NOW"] }
167}
168
169const badAccount = {
170 $or: [
171 {
172 accountType: "savings",
173 balance: { $lt: 0}
174 },
175 {
176 "$expr": { "$gt": ["$dateOpened", "$$NOW"]}
177 }
178 ]
179}
180
181const schemaValidation = {
182 "$and": [
183 { $jsonSchema: { oneOf: [ customerSchema, accountSchema ] }},
184 { $nor: [
185 badCustomer,
186 badAccount
187 ]
188 }
189 ]
190}
191
192const database = 'MongoBank';
193const collection = 'Accounts';
194
195use(database);
196db.getCollection(collection).drop();
197db.createCollection(collection, {validator: schemaValidation} )
198db.getCollection(collection).replaceOne({"_id": cust1._id}, cust1, {upsert: true});
199db.getCollection(collection).replaceOne({"_id": cust2._id}, cust2, {upsert: true});
200db.getCollection(collection).replaceOne({"_id": acc1._id}, acc1, {upsert: true});
201db.getCollection(collection).replaceOne({"_id": acc2._id}, acc2, {upsert: true});
202db.getCollection(collection).replaceOne({"_id": acc3._id}, acc3, {upsert: true});
203
204// The following 3 operations should fail
205
206db.getCollection(collection).replaceOne({"_id": negativeSavings._id}, negativeSavings, {upsert: true});
207db.getCollection(collection).replaceOne({"_id": futureCustomer._id}, futureCustomer, {upsert: true});
208db.getCollection(collection).replaceOne({"_id": futureAccount._id}, futureAccount, {upsert: true});
209
210db.getCollection(collection).dropIndexes();
211db.getCollection(collection).createIndex(indexKeys1, indexOptions1);
212db.getCollection(collection).createIndex(indexKeys2, indexOptions2);

Conclusion

I hope that this short article has shown how easy it is to use schema validations with MongoDB's polymorphic collections and single-collection design pattern.
I didn't go into much detail about why I chose the data model used in this example. If you want to know more (and you should!), then here are some great resources on data modeling with MongoDB:

Facebook Icontwitter iconlinkedin icon
Rate this article
star-empty
star-empty
star-empty
star-empty
star-empty
Related
Tutorial

Optimizing $lookup Performance Using the Power of Indexing


Aug 30, 2024 | 7 min read
Quickstart

Introduction to the MongoDB Aggregation Framework


Sep 23, 2022 | 5 min read
Tutorial

Storing Large Objects and Files in MongoDB


May 13, 2022 | 1 min read
Tutorial

Subscribe to MongoDB Change Streams Via WebSockets


Sep 23, 2022 | 3 min read
Table of Contents
  • The banking application and its data