Docs Menu
Docs Home
/ / /
PyMongo
/ /

Multi-Field Join

On this page

  • Introduction
  • Aggregation Task Summary
  • Before You Get Started
  • Tutorial
  • Add a lookup stage to link the collections and import fields
  • Add a match stage for products ordered in 2020
  • Add an unset stage to remove unneeded fields
  • Run the aggregation pipeline
  • Interpret results

In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app.

This aggregation performs a multi-field join. A multi-field join occurs when there are multiple corresponding fields in the documents of two collections that you use to match documents together. The aggregation matches these documents on the field values and combines information from both into one document.

Tip

One-to-many Joins

A one-to-many join is a variety of a multi-field join. When you perform a one-to-many join, you select one field from a document that matches a field value in multiple documents on the other side of the join. To learn more about these data relationships, see the Wikipedia entries about One-to-many (data model) and Many-to-many (data model).

This tutorial demonstrates how to combine data from a collection that describes product information with another collection that describes customer orders. The results show a list of products ordered in 2020 that also contains details about each order.

This example uses two collections:

  • products, which contains documents describing the products that a shop sells

  • orders, which contains documents describing individual orders for products in a shop

An order can only contain one product, so the aggregation uses a multi-field join to match a product document to documents representing orders of that product. The collections are joined by the name and variation fields in documents in the products collection, corresponding to the product_name and product_variation fields in documents in the orders collection.

Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.

After you set up the app, access the products and orders collections by adding the following code to the application:

products_coll = agg_db["products"]
orders_coll = agg_db["orders"]

Delete any existing data and insert sample data into the products collection as shown in the following code:

products_coll.delete_many({})
products_data = [
{
"name": "Asus Laptop",
"variation": "Ultra HD",
"category": "ELECTRONICS",
"description": "Great for watching movies"
},
{
"name": "Asus Laptop",
"variation": "Standard Display",
"category": "ELECTRONICS",
"description": "Good value laptop for students"
},
{
"name": "The Day Of The Triffids",
"variation": "1st Edition",
"category": "BOOKS",
"description": "Classic post-apocalyptic novel"
},
{
"name": "The Day Of The Triffids",
"variation": "2nd Edition",
"category": "BOOKS",
"description": "Classic post-apocalyptic novel"
},
{
"name": "Morphy Richards Food Mixer",
"variation": "Deluxe",
"category": "KITCHENWARE",
"description": "Luxury mixer turning good cakes into great"
}
]
products_coll.insert_many(products_data)

Delete any existing data and insert sample data into the orders collection as shown in the following code:

orders_coll.delete_many({})
order_data = [
{
"customer_id": "elise_smith@myemail.com",
"orderdate": datetime(2020, 5, 30, 8, 35, 52),
"product_name": "Asus Laptop",
"product_variation": "Standard Display",
"value": 431.43
},
{
"customer_id": "tj@wheresmyemail.com",
"orderdate": datetime(2019, 5, 28, 19, 13, 32),
"product_name": "The Day Of The Triffids",
"product_variation": "2nd Edition",
"value": 5.01
},
{
"customer_id": "oranieri@warmmail.com",
"orderdate": datetime(2020, 1, 1, 8, 25, 37),
"product_name": "Morphy Richards Food Mixer",
"product_variation": "Deluxe",
"value": 63.13
},
{
"customer_id": "jjones@tepidmail.com",
"orderdate": datetime(2020, 12, 26, 8, 55, 46),
"product_name": "Asus Laptop",
"product_variation": "Standard Display",
"value": 429.65
}
]
orders_coll.insert_many(order_data)
1

The first stage of the pipeline is a $lookup stage to join the orders collection to the products collection by two fields in each collection. The lookup stage contains an embedded pipeline to configure the join.

Within the embedded pipeline, add a $match stage to match the values of two fields on each side of the join. Note that the following code uses aliases for the name and variation fields set when creating the $lookup stage:

embedded_pl = [
{
"$match": {
"$expr": {
"$and": [
{"$eq": ["$product_name", "$$prdname"]},
{"$eq": ["$product_variation", "$$prdvartn"]}
]
}
}
}
]

Within the embedded pipeline, add another $match stage to match orders placed in 2020:

embedded_pl.append({
"$match": {
"orderdate": {
"$gte": datetime(2020, 1, 1, 0, 0, 0),
"$lt": datetime(2021, 1, 1, 0, 0, 0)
}
}
})

Within the embedded pipeline, add an $unset stage to remove unneeded fields from the orders collection side of the join:

embedded_pl.append({
"$unset": ["_id", "product_name", "product_variation"]
})

After the embedded pipeline is completed, add the $lookup stage to the main aggregation pipeline. Configure this stage to store the processed lookup fields in an array field called orders:

pipeline.append({
"$lookup": {
"from": "orders",
"let": {
"prdname": "$name",
"prdvartn": "$variation"
},
"pipeline": embedded_pl,
"as": "orders"
}
})
2

Next, add a $match stage to only show products for which there is at least one order in 2020, based on the orders array calculated in the previous step:

pipeline.append({
"$match": {
"orders": {"$ne": []}
}
})
3

Finally, add an $unset stage. The $unset stage removes the _id and description fields from the result documents:

pipeline.append({
"$unset": ["_id", "description"]
})
4

Add the following code to the end of your application to perform the aggregation on the products collection:

aggregation_result = products_coll.aggregate(pipeline)

Finally, run the following command in your shell to start your application:

python3 agg_tutorial.py
5

The aggregated result contains two documents. The documents represent products for which there were orders placed in 2020. Each document contains an orders array field that lists details about each order for that product:

{
'name': 'Asus Laptop',
'variation': 'Standard Display',
'category': 'ELECTRONICS',
'orders': [
{
'customer_id': 'elise_smith@myemail.com',
'orderdate': datetime.datetime(2020, 5, 30, 8, 35, 52),
'value': 431.43
},
{
'customer_id': 'jjones@tepidmail.com',
'orderdate': datetime.datetime(2020, 12, 26, 8, 55, 46),
'value': 429.65
}
]
}
{
'name': 'Morphy Richards Food Mixer',
'variation': 'Deluxe',
'category': 'KITCHENWARE',
'orders': [
{
'customer_id': 'oranieri@warmmail.com',
'orderdate': datetime.datetime(2020, 1, 1, 8, 25, 37),
'value': 63.13
}
]
}

The result documents contain details from documents in the orders collection and the products collection, joined by the product names and variations.

To view the complete code for this tutorial, see the Completed Multi-field Join App on GitHub.

Back

One-to-One Join