One-to-One Join

On this page

Introduction

Aggregation Task Summary
Before You Get Started
Tutorial
Add a match stage for orders in 2020
Add a lookup stage to link the collections
Add set stages to create new document fields
Add an unset stage to remove unneeded fields
Run the aggregation pipeline
Interpret results

Introduction

In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app.

This aggregation performs a one-to-one join. A one-to-one join occurs when a document in one collection has a field value that matches a single document in another collection that has the same field value. The aggregation matches these documents on the field value and combines information from both sources into one result.

Tip

A one-to-one join does not require the documents to have a one-to-one relationship. To learn more about this data relationship, see the Wikipedia entry about One-to-one (data model).

Aggregation Task Summary

This tutorial demonstrates how to combine data from a collection that describes product information with another collection that describes customer orders. The results show a list of all orders placed in 2020 that includes the product details associated with each order.

This example uses two collections:

orders: contains documents describing individual orders for products in a shop
products: contains documents describing the products that a shop sells

An order can only contain one product, so the aggregation uses a one-to-one join to match an order document to the document for the product. The collections are joined by a field called product_id that exists in documents in both collections.

Before You Get Started

Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.

After you set up the app, access the orders and products collections by adding the following code to the application:

orders_coll = agg_db["orders"]
products_coll = agg_db["products"]

Delete any existing data and insert sample data into the orders collection as shown in the following code:

orders_coll.delete_many({})
order_data = [
    {
        "customer_id": "elise_smith@myemail.com",
        "orderdate": datetime(2020, 5, 30, 8, 35, 52),
        "product_id": "a1b2c3d4",
        "value": 431.43
    },
    {
        "customer_id": "tj@wheresmyemail.com",
        "orderdate": datetime(2019, 5, 28, 19, 13, 32),
        "product_id": "z9y8x7w6",
        "value": 5.01
    },
    {
        "customer_id": "oranieri@warmmail.com",
        "orderdate": datetime(2020, 1, 1, 8, 25, 37),
        "product_id": "ff11gg22hh33",
        "value": 63.13
    },
    {
        "customer_id": "jjones@tepidmail.com",
        "orderdate": datetime(2020, 12, 26, 8, 55, 46),
        "product_id": "a1b2c3d4",
        "value": 429.65
    }
]
orders_coll.insert_many(order_data)

Delete any existing data and insert sample data into the products collection as shown in the following code:

products_coll.delete_many({})
product_data = [
    {
        "id": "a1b2c3d4",
        "name": "Asus Laptop",
        "category": "ELECTRONICS",
        "description": "Good value laptop for students"
    },
    {
        "id": "z9y8x7w6",
        "name": "The Day Of The Triffids",
        "category": "BOOKS",
        "description": "Classic post-apocalyptic novel"
    },
    {
        "id": "ff11gg22hh33",
        "name": "Morphy Richardds Food Mixer",
        "category": "KITCHENWARE",
        "description": "Luxury mixer turning good cakes into great"
    },
    {
        "id": "pqr678st",
        "name": "Karcher Hose Set",
        "category": "GARDEN",
        "description": "Hose + nosels + winder for tidy storage"
    }
]
products_coll.insert_many(product_data)

Tutorial

Add a match stage for orders in 2020

Add a $match stage that matches orders placed in 2020:

pipeline.append({
    "$match": {
        "orderdate": {
            "$gte": datetime(2020, 1, 1, 0, 0, 0),
            "$lt": datetime(2021, 1, 1, 0, 0, 0)
        }
    }
})

Add a lookup stage to link the collections

Next, add a $lookup stage. The $lookup stage joins the product_id field in the orders collection to the id field in the products collection:

pipeline.append({
    "$lookup": {
        "from": "products",
        "localField": "product_id",
        "foreignField": "id",
        "as": "product_mapping"
    }
})

Add set stages to create new document fields

Next, add two $set stages to the pipeline.

The first $set stage sets the product_mapping field to the first element in the product_mapping object created in the previous $lookup stage.

The second $set stage creates two new fields, product_name and product_category, from the values in the product_mapping object field:

pipeline.extend([
    {
        "$set": {
            "product_mapping": {"$first": "$product_mapping"}
        }
    },
    {
        "$set": {
            "product_name": "$product_mapping.name",
            "product_category": "$product_mapping.category"
        }
    }
])

Tip

Because this is a one-to-one join, the $lookup stage adds only one array element to the input document. The pipeline uses the $first operator to retrieve the data from this element.

Add an unset stage to remove unneeded fields

Finally, add an $unset stage. The $unset stage removes unnecessary fields from the document:

pipeline.append({"$unset": ["_id", "product_id", "product_mapping"]})

Run the aggregation pipeline

Add the following code to the end of your application to perform the aggregation on the orders collection:

aggregation_result = orders_coll.aggregate(pipeline)

Finally, run the following command in your shell to start your application:

python3 agg_tutorial.py

Interpret results

The aggregated result contains three documents. The documents represent customer orders that occurred in 2020, with the product_name and product_category of the ordered product:

{
  'customer_id': 'elise_smith@myemail.com',
  'orderdate': datetime.datetime(2020, 5, 30, 8, 35, 52),
  'value': 431.43,
  'product_name': 'Asus Laptop',
  'product_category': 'ELECTRONICS'
}
{
  'customer_id': 'oranieri@warmmail.com',
  'orderdate': datetime.datetime(2020, 1, 1, 8, 25, 37),
  'value': 63.13,
  'product_name': 'Morphy Richardds Food Mixer',
  'product_category': 'KITCHENWARE'
}
{
  'customer_id': 'jjones@tepidmail.com',
  'orderdate': datetime.datetime(2020, 12, 26, 8, 55, 46),
  'value': 429.65,
  'product_name': 'Asus Laptop',
  'product_category': 'ELECTRONICS'
}

The result consists of documents that contain fields from documents in the orders collection and the products collection, joined by matching the product_id field present in each original document.

To view the complete code for this tutorial, see the Completed One-to-one Join App on GitHub.

Back

Unpack Arrays and Group

Multi-Field Join