Multi-Field Join
On this page
Introduction
In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app.
This aggregation performs a multi-field join. A multi-field join occurs when there are multiple corresponding fields in the documents of two collections that you use to match documents together. The aggregation matches these documents on the field values and combines information from both into one document.
Tip
One-to-many Joins
A one-to-many join is a variety of a multi-field join. When you perform a one-to-many join, you select one field from a document that matches a field value in multiple documents on the other side of the join. To learn more about these data relationships, see the Wikipedia entries about One-to-many (data model) and Many-to-many (data model).
Aggregation Task Summary
This tutorial demonstrates how to combine data from a collection that describes product information with another collection that describes customer orders. The results show a list of products ordered in 2020 that also contains details about each order.
This example uses two collections:
products
, which contains documents describing the products that a shop sellsorders
, which contains documents describing individual orders for products in a shop
An order can only contain one product, so the aggregation uses a
multi-field join to match a product document to documents representing orders of
that product. The collections are joined by the name
and
variation
fields in documents in the products
collection, corresponding
to the product_name
and product_variation
fields in documents in
the orders
collection.
Before You Get Started
Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.
After you set up the app, access the products
and orders
collections by adding the following code to the application:
products_coll = agg_db["products"] orders_coll = agg_db["orders"]
Delete any existing data and insert sample data into
the products
collection as shown in the following code:
products_coll.delete_many({}) products_data = [ { "name": "Asus Laptop", "variation": "Ultra HD", "category": "ELECTRONICS", "description": "Great for watching movies" }, { "name": "Asus Laptop", "variation": "Standard Display", "category": "ELECTRONICS", "description": "Good value laptop for students" }, { "name": "The Day Of The Triffids", "variation": "1st Edition", "category": "BOOKS", "description": "Classic post-apocalyptic novel" }, { "name": "The Day Of The Triffids", "variation": "2nd Edition", "category": "BOOKS", "description": "Classic post-apocalyptic novel" }, { "name": "Morphy Richards Food Mixer", "variation": "Deluxe", "category": "KITCHENWARE", "description": "Luxury mixer turning good cakes into great" } ] products_coll.insert_many(products_data)
Delete any existing data and insert sample data into
the orders
collection as shown in the following code:
orders_coll.delete_many({}) order_data = [ { "customer_id": "elise_smith@myemail.com", "orderdate": datetime(2020, 5, 30, 8, 35, 52), "product_name": "Asus Laptop", "product_variation": "Standard Display", "value": 431.43 }, { "customer_id": "tj@wheresmyemail.com", "orderdate": datetime(2019, 5, 28, 19, 13, 32), "product_name": "The Day Of The Triffids", "product_variation": "2nd Edition", "value": 5.01 }, { "customer_id": "oranieri@warmmail.com", "orderdate": datetime(2020, 1, 1, 8, 25, 37), "product_name": "Morphy Richards Food Mixer", "product_variation": "Deluxe", "value": 63.13 }, { "customer_id": "jjones@tepidmail.com", "orderdate": datetime(2020, 12, 26, 8, 55, 46), "product_name": "Asus Laptop", "product_variation": "Standard Display", "value": 429.65 } ] orders_coll.insert_many(order_data)
Tutorial
Add a lookup stage to link the collections and import fields
The first stage of the pipeline is a $lookup stage to join the
orders
collection to the products
collection by two
fields in each collection. The lookup stage contains an
embedded pipeline to configure the join.
Within the embedded pipeline, add a $match stage to match the
values of two fields on each side of the join. Note that the following
code uses aliases for the name
and variation
fields
set when creating the $lookup
stage:
embedded_pl = [ { "$match": { "$expr": { "$and": [ {"$eq": ["$product_name", "$$prdname"]}, {"$eq": ["$product_variation", "$$prdvartn"]} ] } } } ]
Within the embedded pipeline, add another $match stage to match orders placed in 2020:
embedded_pl.append({ "$match": { "orderdate": { "$gte": datetime(2020, 1, 1, 0, 0, 0), "$lt": datetime(2021, 1, 1, 0, 0, 0) } } })
Within the embedded pipeline, add an $unset stage to remove
unneeded fields from the orders
collection side of the join:
embedded_pl.append({ "$unset": ["_id", "product_name", "product_variation"] })
After the embedded pipeline is completed, add the
$lookup
stage to the main aggregation pipeline.
Configure this stage to store the processed lookup fields in
an array field called orders
:
pipeline.append({ "$lookup": { "from": "orders", "let": { "prdname": "$name", "prdvartn": "$variation" }, "pipeline": embedded_pl, "as": "orders" } })
Add a match stage for products ordered in 2020
Next, add a $match stage to only show
products for which there is at least one order in 2020,
based on the orders
array calculated in the previous step:
pipeline.append({ "$match": { "orders": {"$ne": []} } })
Add an unset stage to remove unneeded fields
Finally, add an $unset stage. The
$unset
stage removes the _id
and description
fields from the result documents:
pipeline.append({ "$unset": ["_id", "description"] })
Interpret results
The aggregated result contains two documents. The documents
represent products for which there were orders placed in 2020.
Each document contains an orders
array field that lists details
about each order for that product:
{ 'name': 'Asus Laptop', 'variation': 'Standard Display', 'category': 'ELECTRONICS', 'orders': [ { 'customer_id': 'elise_smith@myemail.com', 'orderdate': datetime.datetime(2020, 5, 30, 8, 35, 52), 'value': 431.43 }, { 'customer_id': 'jjones@tepidmail.com', 'orderdate': datetime.datetime(2020, 12, 26, 8, 55, 46), 'value': 429.65 } ] } { 'name': 'Morphy Richards Food Mixer', 'variation': 'Deluxe', 'category': 'KITCHENWARE', 'orders': [ { 'customer_id': 'oranieri@warmmail.com', 'orderdate': datetime.datetime(2020, 1, 1, 8, 25, 37), 'value': 63.13 } ] }
The result documents contain details from documents in the
orders
collection and the products
collection, joined by
the product names and variations.
To view the complete code for this tutorial, see the Completed Multi-field Join App on GitHub.