Coding With Mark: Abstracting Joins & Subsets in Python
Mark Smith • 11 min read • Published Mar 19, 2024 • Updated Mar 19, 2024
This tutorial will talk about MongoDB design patterns — specifically, the Subset Pattern — and show how you can build an abstraction in your Python data model that hides how data is actually modeled within your database.
This is the third tutorial in a series! Feel free to check out the first tutorial or second tutorial if you like, but it's not necessary if you want to just read on.
This tutorial is loosely based on some episodes of a livestream I host, called "Coding with Mark." I'm streaming on Wednesdays at 2 p.m. GMT (that's 9 a.m. ET or 6 a.m. PT, if you're an early riser!). If that time doesn't work for you, you can always catch up by watching the recordings!
Currently, I'm building an experimental data access layer library that should provide a toolkit for abstracting complex document models from the business logic layer of the application that's using them.
The purpose of docbridge, my Object-Document Mapper, is to abstract the data model used within MongoDB from the data model used by a Python program. With a codebase of any size, you need something like this because otherwise, every time you change your data model (in your database), you need to change the object model (in your code). By having an abstraction layer, you localize all of this mapping into a single area of your codebase, and that's then the only part that needs to change when you change your data model. This ability to change your data model really allows you to take advantage of the flexibility of MongoDB's document model.
In the first tutorial, I showed a very simple abstraction, the FallbackField, that would try various different field names in a document until it found one that existed, and then would return that value. This was a very simple implementation of the Schema Versioning pattern.
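As a refresher, the core of that idea can be sketched in a few lines. This is a minimal illustrative reconstruction, not necessarily the exact docbridge code:

```python
class FallbackField:
    """Descriptor that returns the value of the first field name that
    exists in the wrapped BSON document."""

    def __init__(self, *field_names):
        self.field_names = field_names

    def __get__(self, ob, cls):
        for name in self.field_names:
            # ob._doc is the underlying BSON document, as elsewhere in
            # this series:
            if name in ob._doc:
                return ob._doc[name]
        raise ValueError(f"No field in {self.field_names!r} found in document.")
```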
MongoDB allows you to store arrays in your documents, natively. The values in those arrays can be primitive types, like numbers, strings, and dates, or they can even be subdocuments. But sometimes, those arrays can get too big, and the Subset Pattern describes a technique where the most important subset of the array (often just the first few items) is stored directly in the source document, and any overflow items are stored in other documents and looked up only when necessary.
This solves two design problems: First, we recommend that you don't store more than 200 items in an array, as the more items you have, the slower the database is at traversing the fields in each document. Second, the subset pattern also answers a question that I've seen many times when we've been teaching data modeling: "How do I stop my array from growing so big that the document becomes bigger than the 16MB limit?" While we're on the subject, do avoid your documents getting this big — it usually implies that you could improve your data model, for example, by separating out data into separate documents, or if you're storing lots of binary data, you could keep it outside your database, in an object store.
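If you're unsure how close a document is getting to that limit, you can measure its encoded size with the bson package that ships with PyMongo. A quick, illustrative check:

```python
import bson

doc = {"user_id": "4", "followers": [{"user_name": "@rduncan"}] * 200}

# bson.encode serializes a dict to BSON bytes, so len() tells you how much
# space the document would occupy in MongoDB (hard limit: 16MB).
print(f"{len(bson.encode(doc))} bytes")
```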
Before delving into how to abstract a lookup for the extra array items that aren't embedded in the source document, I'll first implement a wrapper type for a BSON array. This can be used to declare array fields on a `Document` class, instead of the `Field` type that I implemented in previous articles. I'm going to define a `SequenceField` to map a document's array into my access layer's object model. The core functionality of a `SequenceField` is that you can specify a type for the array's items, and when you iterate through the sequence, it will return objects of that type, instead of just yielding whatever is stored in the document.

A concrete example would be a social media API's `UserProfile` class, which would store a list of `Follower` objects. I've created some sample documents with a Python script using Faker. A sample document looks like this:
```
{
  "_id": { "$oid": "657072b56731c9e580e9dd70" },
  "user_id": "4",
  "user_name": "@tanya15",
  "full_name": "Deborah White",
  "birth_date": { "$date": { "$numberLong": "931219200000" } },
  "email": "deanjacob@yahoo.com",
  "bio": "Music conference able doctor degree debate. Participant usually above relate.",
  "follower_count": { "$numberInt": "59" },
  "followers": [
    {
      "_id": { "$oid": "657072b66731c9e580e9dda6" },
      "user_id": "58",
      "user_name": "@rduncan",
      "bio": "Rich beautiful color life. Relationship instead win join enough board successful."
    },
    {
      "_id": { "$oid": "657072b66731c9e580e9dd99" },
      "user_id": "45",
      "user_name": "@paynericky",
      "bio": "Picture day couple democratic morning. Environment manage opportunity option star food she. Occur imagine population single avoid."
    }
    # ... other followers
  ]
}
```
I can model this data using two classes — one for the top-level Profile data, and one for the summary data for that profile's followers (embedded in the array).
```python
class Follower(Document):
    _id = Field(transform=str)
    user_name = Field()

class Profile(Document):
    _id = Field(transform=str)
    followers = SequenceField(type=Follower)
```
If I want to loop through all the followers of a profile instance, each item should be a `Follower` instance:

```python
profile = Profile(SOME_BSON_DATA)
for follower in profile.followers:
    assert isinstance(follower, Follower)
```
This behavior can be implemented in a similar way to the `Field` class, by implementing it as a descriptor, with a `__get__` method that, in this case, yields a `Follower` constructed for each item in the underlying BSON array. The code looks a little like this:

```python
class SequenceField:
    """
    Allows an underlying array to have its elements wrapped in
    Document instances.
    """

    def __init__(
        self,
        type,
        field_name=None,
    ):
        self._type = type
        self.field_name = field_name

    def __set_name__(self, owner, name):
        """
        Called when the enclosing Document subclass (owner) is defined.
        """
        self.name = name  # Store the attribute name.

        # If a field-name mapping hasn't been provided,
        # the BSON field will have the same name as the attribute name.
        if self.field_name is None:
            self.field_name = name

    def __get__(self, ob, cls):
        """
        Called when the SequenceField attribute is accessed on the enclosing
        Document subclass.
        """
        try:
            # Look up the field in the BSON, and return an array where each
            # item is wrapped by the class provided as type in __init__:
            return [
                self._type(item, ob._db)
                for item in ob._doc[self.field_name]
            ]
        except KeyError as ke:
            raise ValueError(
                f"Attribute {self.name!r} is mapped to missing document property {self.field_name!r}."
            ) from ke
```
That's a lot of code, but quite a lot of it is duplicated from `Field`; I'll fix that with some inheritance at some point. The most important part is near the end:

```python
return [
    self._type(item, ob._db)
    for item in ob._doc[self.field_name]
]
```
In the concrete example above, this would resolve to something like this fictional code:
```python
return [
    Follower(item, db=None) for item in profile._doc["followers"]
]
```
The dataset I've created for working with this only stores the first 20 followers within a profile document. The rest are stored in a "followers" collection, and they're bucketed to store up to 20 followers per document, in a field called "followers." The "user_id" field says who the followers belong to. A single document in the "followers" collection looks like this:
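Based on that description, a bucket document has just a "user_id" and a "followers" array. It looks something like this (the specific values here are illustrative, not taken from the real dataset):

```
{
  "_id": { "$oid": "657072b86731c9e580e9ddf2" },
  "user_id": "4",
  "followers": [
    {
      "_id": { "$oid": "657072b76731c9e580e9ddc1" },
      "user_id": "136",
      "user_name": "@wmorgan",
      "bio": "..."
    }
    # ... up to 20 followers per bucket document
  ]
}
```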
The Bucket Pattern is a technique for putting lots of small subdocuments together in a bucket document, which can make it more efficient to retrieve documents that are usually retrieved together, and it can keep index sizes down. The downside is that it makes updating individual subdocuments slightly slower and more complex.
I have a collection where each document contains an array of followers — a "bucket" of followers. But what I want is a query that returns individual follower documents. Let's break down how this query will work:
- I want to look up all the documents for a particular user_id.
- For each item in followers — each item is a follower — I want to yield a single document for that follower.
- I want to restructure each document so that it only contains the follower information, not the bucket information.
This is what I love about aggregation pipelines — once I've come up with those steps, I can often convert each step into an aggregation pipeline stage.
Step 1: Look up all the documents for a particular user:
1 {"$match": {"user_id": "4"}}
Note that this stage has hard-coded the value "4" for the "user_id" field. I'll explain later how dynamic values can be inserted into these queries. This outputs a single document, a bucket, containing many followers, in a field called "followers":
1 { 2 "user_name": "@tanya15", 3 "full_name": "Deborah White", 4 "birth_date": { 5 "$date": "1999-07-06T00:00:00.000Z" 6 }, 7 "email": "deanjacob@yahoo.com", 8 "bio": "Music conference able doctor degree debate. Participant usually above relate.", 9 "user_id": "4", 10 "follower_count": 59, 11 "followers": [ 12 { 13 "_id": { 14 "$oid": "657072b66731c9e580e9dda6" 15 }, 16 "user_id": "58", 17 "user_name": "@rduncan", 18 "bio": "Rich beautiful color life. Relationship instead win join enough board successful." 19 }, 20 { 21 "bio": "Picture day couple democratic morning. Environment manage opportunity option star food she. Occur imagine population single avoid.", 22 "_id": { 23 "$oid": "657072b66731c9e580e9dd99" 24 }, 25 "user_id": "45", 26 "user_name": "@paynericky" 27 }, 28 { 29 "_id": { 30 "$oid": "657072b76731c9e580e9ddba" 31 }, 32 "user_id": "78", 33 "user_name": "@tiffanyhicks", 34 "bio": "Sign writer win. Look television official information laugh. Lay plan effect break expert message during firm." 35 }, 36 . . . 37 ], 38 "_id": { 39 "$oid": "657072b56731c9e580e9dd70" 40 } 41 }
Step 2: Yield a document for each follower — the $unwind stage can do exactly this:
1 {"$unwind": "$followers"}
This instructs MongoDB to return one document for each item in the "followers" array. All of the document contents will be included, but the followers array will be replaced with the single follower subdocument each time. This outputs several documents, each containing a single follower in the "followers" field:
```
# First document:
{
  "bio": "Music conference able doctor degree debate. Participant usually above relate.",
  "follower_count": 59,
  "followers": {
    "_id": {
      "$oid": "657072b66731c9e580e9dda6"
    },
    "user_id": "58",
    "user_name": "@rduncan",
    "bio": "Rich beautiful color life. Relationship instead win join enough board successful."
  },
  "user_id": "4",
  "user_name": "@tanya15",
  "full_name": "Deborah White",
  "birth_date": {
    "$date": "1999-07-06T00:00:00.000Z"
  },
  "email": "deanjacob@yahoo.com",
  "_id": {
    "$oid": "657072b56731c9e580e9dd70"
  }
}

# Second document:
{
  "_id": {
    "$oid": "657072b56731c9e580e9dd70"
  },
  "full_name": "Deborah White",
  "email": "deanjacob@yahoo.com",
  "bio": "Music conference able doctor degree debate. Participant usually above relate.",
  "follower_count": 59,
  "user_id": "4",
  "user_name": "@tanya15",
  "birth_date": {
    "$date": "1999-07-06T00:00:00.000Z"
  },
  "followers": {
    "_id": {
      "$oid": "657072b66731c9e580e9dd99"
    },
    "user_id": "45",
    "user_name": "@paynericky",
    "bio": "Picture day couple democratic morning. Environment manage opportunity option star food she. Occur imagine population single avoid."
  }
}

# . . . More documents follow
```
Step 3: Restructure the document, pulling the "follower" value up to the top-level of the document. There's a special stage for doing this — $replaceRoot:
1 {"$replaceRoot": {"newRoot": "$followers"}},
Adding the stage above results in each document containing a single follower, at the top level:
```
# Document 1:
{
  "_id": {
    "$oid": "657072b66731c9e580e9dda6"
  },
  "user_id": "58",
  "user_name": "@rduncan",
  "bio": "Rich beautiful color life. Relationship instead win join enough board successful."
}

# Document 2:
{
  "_id": {
    "$oid": "657072b66731c9e580e9dd99"
  },
  "user_id": "45",
  "user_name": "@paynericky",
  "bio": "Picture day couple democratic morning. Environment manage opportunity option star food she. Occur imagine population single avoid."
}

# . . . More documents follow
```
Putting it all together, the query looks like this:
```python
[
    {"$match": {"user_id": "4"}},
    {"$unwind": "$followers"},
    {"$replaceRoot": {"newRoot": "$followers"}},
]
```
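If you want to try the pipeline out directly with PyMongo before it gets wrapped by the library, running it looks something like this. The database name here is a hypothetical one for the sample data:

```python
from pymongo import MongoClient

client = MongoClient()  # assumes a local MongoDB instance
db = client["socialmedia"]  # hypothetical database name

pipeline = [
    {"$match": {"user_id": "4"}},
    {"$unwind": "$followers"},
    {"$replaceRoot": {"newRoot": "$followers"}},
]

# aggregate() returns a lazy cursor that yields one follower document at a time:
for follower_doc in db.followers.aggregate(pipeline):
    print(follower_doc["user_name"])
```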
I've explained the query that I want to be run each time I iterate through the followers field in my data abstraction library. Now, I'll show you how to hide this query (or whatever query is required) away in the SequenceField implementation.
I would like to change the behavior of the SequenceField so that it does the following:
- Iterate through the embedded subdocuments and yield each one, wrapped by type (the callable that wraps each subdocument).
- If the user gets to the end of the embedded array, make a query to look up the rest of the followers and yield them one by one, also wrapped by type.
First, I'll change the `__init__` method so that the user can provide two extra parameters:

- The collection that contains the extra documents, `superset_collection`
- The query to run against that collection to return individual documents, `superset_query`

The result looks like this:
```python
from typing import Callable

class SequenceField:
    def __init__(
        self,
        type,
        field_name=None,
        superset_collection=None,
        superset_query: Callable = None,
    ):
        self._type = type
        self.field_name = field_name
        self.superset_collection = superset_collection
        self.superset_query = superset_query
```
The query will have to be provided as a callable, i.e., a function, lambda expression, or method. The reason is that generating the query will usually need access to some of the document's state (in this case, the `user_id`, to construct the query that looks up the correct follower documents). The callable is stored in the `SequenceField` instance, and when the lookup is needed, the field calls the callable, passing it the `Document` that contains the field, so the callable can look up the user's ID in the wrapped `_doc` dictionary.
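As a tiny, hypothetical illustration: if the overflow items were stored one per document, a find-style callable returning a filter dict would be all that's needed. (The followers dataset in this article needs the aggregation pipeline version instead, shown further down.)

```python
# Hypothetical find-style superset query: given the enclosing Document,
# build a filter dict from its state.
superset_query = lambda ob: {"user_id": ob.user_id}
```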
Now that the user can provide enough information to look up the extra followers (the superset), I changed the `__get__` method to perform the lookup when it runs out of embedded followers. To make this simpler to write, I took advantage of laziness. Twice! Here's how:

Laziness Part 1: When you execute a query by calling `find` or `aggregate`, the query is not executed immediately. Instead, the method immediately returns a cursor. Cursors are lazy, which means they don't do anything until you start to use them, by iterating over their contents. As soon as you start to iterate, or loop, over the cursor, it queries the database and starts to yield results.

Laziness Part 2: Most of the functions in the core Python `itertools` module are also lazy, including the `chain` function. `chain` is called with one or more iterables as arguments and only starts to loop through the later arguments when the earlier iterables are exhausted (meaning the code has looped through all of their contents).

These can be combined to create a single iterable that will never request any extra followers from the database, unless the code specifically requests more items after looping through the embedded items:
```python
import itertools

embedded_followers = self._doc["followers"]  # a list
cursor = followers.find({"user_id": "4"})    # 'followers' is a collection; find returns a lazy cursor

# Looping through all_followers will only make a database call if you have
# looped through all of the contents of embedded_followers:
all_followers = itertools.chain(embedded_followers, cursor)
```
The real code is a bit more flexible, because it supports both find and aggregate queries. It recognizes which it has been given by type: find queries are provided as dicts, and aggregate queries are lists.
```python
# (These imports are needed at the top of the module:
#  from collections.abc import Iterable, Mapping
#  from itertools import chain)

def __get__(self, ob, cls):
    if self.superset_query is None:
        # Use an empty sequence if there are no extra items.
        # It's still iterable, like a cursor, but immediately exits.
        superset = []
    else:
        # Call the superset_query callable to obtain the generated query:
        query = self.superset_query(ob)

        # If the query is a mapping, it's a find query; otherwise, it's an
        # aggregation pipeline.
        if isinstance(query, Mapping):
            superset = ob._db.get_collection(self.superset_collection).find(query)
        elif isinstance(query, Iterable):
            superset = ob._db.get_collection(self.superset_collection).aggregate(
                query
            )
        else:
            raise Exception("Returned query was not a mapping or iterable.")

    try:
        # Return an iterable that first yields all the embedded items, and
        # then the items from the superset query:
        return chain(
            [self._type(item, ob._db) for item in ob._doc[self.field_name]],
            (self._type(item, ob._db) for item in superset),
        )
    except KeyError as ke:
        raise ValueError(
            f"Attribute {self.name!r} is mapped to missing document property {self.field_name!r}."
        ) from ke
```
I've added quite a few comments to the code above, so hopefully you can see the relationship between the simplified code above it and the real code here.
Implementing `Profile` and `Follower` is now a matter of providing the query (as a callable) and the collection that should be queried:

```python
# This is the same as it was originally
class Follower(Document):
    _id = Field(transform=str)
    user_name = Field()


def extra_followers_query(profile):
    return [
        {
            "$match": {"user_id": profile.user_id},
        },
        {"$unwind": "$followers"},
        {"$replaceRoot": {"newRoot": "$followers"}},
    ]


class Profile(Document):
    _id = Field(transform=str)
    followers = SequenceField(
        type=Follower,
        superset_collection="followers",
        superset_query=extra_followers_query,
    )
```
An application that used the above `Profile` definition could look up the `Profile` with a "user_id" of "4" and then print out the user names of all their followers with some code like this:

```python
for follower in profile.followers:
    print(follower.user_name)
```
See how the extra query is now part of the type's mapping definition and not the code dealing with the data? That's the kind of abstraction I wanted to provide when I started building this experimental library. I have more plans, so stick with me! But before I implement more data abstractions, I first need to implement updates — that's something I'll describe in my next tutorial.
This is now the third tutorial in my Python data abstraction series, and I'll admit that this was the code I envisioned when I first came up with the idea of the docbridge library. It's been super satisfying to get to this point, and because I've been developing the whole thing with test-driven development practices, there's already good code coverage.
If you're looking for more information on aggregation pipelines, you should have a look at Practical MongoDB Aggregations — or now, you can buy an expanded version of the book in paperback.
If you're interested in the abstraction topics and Python code architecture in general, you can buy the Architecture Patterns with Python book, or read it online at CosmicPython.com.
I livestream most weeks, usually at 2 p.m. UTC on Wednesdays. If that sounds interesting, check out the MongoDB YouTube channel. I look forward to seeing you there!