One-to-One Join
On this page
Introduction
In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app.
This aggregation performs a one-to-one join. A one-to-one join occurs when a document in one collection has a field value that matches a single document in another collection that has the same field value. The aggregation matches these documents on the field value and combines information from both sources into one result.
Tip
A one-to-one join does not require the documents to have a one-to-one relationship. To learn more about this data relationship, see the Wikipedia entry about One-to-one (data model).
Aggregation Task Summary
This tutorial demonstrates how to combine data from a collection that describes product information with another collection that describes customer orders. The results show a list of all orders placed in 2020 that includes the product details associated with each order.
This example uses two collections:
orders
: contains documents describing individual orders for products in a shopproducts
: contains documents describing the products that a shop sells
An order can only contain one product, so the aggregation uses a
one-to-one join to match an order document to the document for the
product. The collections are joined by a field called product_id
that exists in documents in both collections.
Before You Get Started
Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.
After you set up the app, access the orders
and products
collections by adding the following code to the application:
orders_coll = agg_db["orders"] products_coll = agg_db["products"]
Delete any existing data and insert sample data into
the orders
collection as shown in the following code:
orders_coll.delete_many({}) order_data = [ { "customer_id": "elise_smith@myemail.com", "orderdate": datetime(2020, 5, 30, 8, 35, 52), "product_id": "a1b2c3d4", "value": 431.43 }, { "customer_id": "tj@wheresmyemail.com", "orderdate": datetime(2019, 5, 28, 19, 13, 32), "product_id": "z9y8x7w6", "value": 5.01 }, { "customer_id": "oranieri@warmmail.com", "orderdate": datetime(2020, 1, 1, 8, 25, 37), "product_id": "ff11gg22hh33", "value": 63.13 }, { "customer_id": "jjones@tepidmail.com", "orderdate": datetime(2020, 12, 26, 8, 55, 46), "product_id": "a1b2c3d4", "value": 429.65 } ] orders_coll.insert_many(order_data)
Delete any existing data and insert sample data into
the products
collection as shown in the following code:
products_coll.delete_many({}) product_data = [ { "id": "a1b2c3d4", "name": "Asus Laptop", "category": "ELECTRONICS", "description": "Good value laptop for students" }, { "id": "z9y8x7w6", "name": "The Day Of The Triffids", "category": "BOOKS", "description": "Classic post-apocalyptic novel" }, { "id": "ff11gg22hh33", "name": "Morphy Richardds Food Mixer", "category": "KITCHENWARE", "description": "Luxury mixer turning good cakes into great" }, { "id": "pqr678st", "name": "Karcher Hose Set", "category": "GARDEN", "description": "Hose + nosels + winder for tidy storage" } ] products_coll.insert_many(product_data)
Tutorial
Add a match stage for orders in 2020
Add a $match stage that matches orders placed in 2020:
pipeline.append({ "$match": { "orderdate": { "$gte": datetime(2020, 1, 1, 0, 0, 0), "$lt": datetime(2021, 1, 1, 0, 0, 0) } } })
Add a lookup stage to link the collections
Next, add a $lookup stage. The
$lookup
stage joins the product_id
field in the orders
collection to the id
field in the products
collection:
pipeline.append({ "$lookup": { "from": "products", "localField": "product_id", "foreignField": "id", "as": "product_mapping" } })
Add set stages to create new document fields
Next, add two $set stages to the pipeline.
The first $set
stage sets the product_mapping
field
to the first element in the product_mapping
object
created in the previous $lookup
stage.
The second $set
stage creates two new fields, product_name
and product_category
, from the values in the
product_mapping
object field:
pipeline.extend([ { "$set": { "product_mapping": {"$first": "$product_mapping"} } }, { "$set": { "product_name": "$product_mapping.name", "product_category": "$product_mapping.category" } } ])
Tip
Because this is a one-to-one join, the $lookup
stage
adds only one array element to the input document. The pipeline
uses the $first
operator to retrieve the data from this element.
Add an unset stage to remove unneeded fields
Finally, add an $unset stage. The
$unset
stage removes unnecessary fields from the document:
pipeline.append({"$unset": ["_id", "product_id", "product_mapping"]})
Interpret results
The aggregated result contains three documents. The documents
represent customer orders that occurred in 2020, with the
product_name
and product_category
of the ordered product:
{ 'customer_id': 'elise_smith@myemail.com', 'orderdate': datetime.datetime(2020, 5, 30, 8, 35, 52), 'value': 431.43, 'product_name': 'Asus Laptop', 'product_category': 'ELECTRONICS' } { 'customer_id': 'oranieri@warmmail.com', 'orderdate': datetime.datetime(2020, 1, 1, 8, 25, 37), 'value': 63.13, 'product_name': 'Morphy Richardds Food Mixer', 'product_category': 'KITCHENWARE' } { 'customer_id': 'jjones@tepidmail.com', 'orderdate': datetime.datetime(2020, 12, 26, 8, 55, 46), 'value': 429.65, 'product_name': 'Asus Laptop', 'product_category': 'ELECTRONICS' }
The result consists of documents that contain fields from
documents in the orders
collection and the products
collection, joined by matching the product_id
field present in
each original document.
To view the complete code for this tutorial, see the Completed One-to-one Join App on GitHub.