Docs Menu

Unpack Arrays and Group

In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app. This aggregation performs the following operations:

  • Unwinds an array field into separate documents

  • Matches a subset of documents by a field value

  • Groups documents by common field values

  • Adds computed fields to each result document

This tutorial demonstrates how to create insights from customer order data. The results show the list of products ordered that cost more than $15, and each document contains the number of units sold and the total sale value for each product.

This example uses one collection, orders, which contains documents describing product orders. Since each order contains multiple products, the first step of the aggregation is unpacking the products array into individual product order documents.

Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.

After you set up the app, access the orders collection by adding the following code to the application:

orders_coll = agg_db["orders"]

Delete any existing data and insert sample data into the orders collection as shown in the following code:

orders_coll.delete_many({})
order_data = [
{
"order_id": 6363763262239,
"products": [
{
"prod_id": "abc12345",
"name": "Asus Laptop",
"price": 431,
},
{
"prod_id": "def45678",
"name": "Karcher Hose Set",
"price": 22,
},
]
},
{
"order_id": 1197372932325,
"products": [
{
"prod_id": "abc12345",
"name": "Asus Laptop",
"price": 429,
}
]
},
{
"order_id": 9812343774839,
"products": [
{
"prod_id": "pqr88223",
"name": "Morphy Richards Food Mixer",
"price": 431,
},
{
"prod_id": "def45678",
"name": "Karcher Hose Set",
"price": 21,
}
]
},
{
"order_id": 4433997244387,
"products": [
{
"prod_id": "def45678",
"name": "Karcher Hose Set",
"price": 23,
},
{
"prod_id": "jkl77336",
"name": "Picky Pencil Sharpener",
"price": 1,
},
{
"prod_id": "xyz11228",
"name": "Russell Hobbs Chrome Kettle",
"price": 16,
}
]
}
]
orders_coll.insert_many(order_data)
1

First, add an $unwind stage to separate the entries in the products array into individual documents:

pipeline.append({
"$unwind": {
"path": "$products"
}
})
2

Next, add a $match stage that matches products with a products.price value greater than 15:

pipeline.append({
"$match": {
"products.price": {
"$gt": 15
}
}
})
3

Add a $group stage to group orders by the value of the prod_id field. In this stage, add aggregation operations that create the following fields in the result documents:

  • product: the product name

  • total_value: the total value of all the sales of the product

  • quantity: the number of orders for the product

pipeline.append({
"$group": {
"_id": "$products.prod_id",
"product": {"$first": "$products.name"},
"total_value": {"$sum": "$products.price"},
"quantity": {"$sum": 1}
}
})
4

Add a $set stage to recreate the product_id field from the values in the _id field that were set during the $group stage:

pipeline.append({
"$set": {
"product_id": "$_id"
}
})
5

Finally, add an $unset stage. The $unset stage removes the _id field from the result documents:

pipeline.append({"$unset": ["_id"]})
6

Add the following code to the end of your application to perform the aggregation on the orders collection:

aggregation_result = orders_coll.aggregate(pipeline)

Finally, run the following command in your shell to start your application:

python3 agg_tutorial.py
7

The aggregation returns the following summary of customers' orders from 2020:

{
'product': 'Asus Laptop',
'total_value': 860,
'quantity': 2,
'product_id': 'abc12345'
}
{
'product': 'Morphy Richards Food Mixer',
'total_value': 431,
'quantity': 1,
'product_id': 'pqr88223'
}
{
'product': 'Russell Hobbs Chrome Kettle',
'total_value': 16,
'quantity': 1,
'product_id': 'xyz11228'
}
{
'product': 'Karcher Hose Set',
'total_value': 66,
'quantity': 3,
'product_id': 'def45678'
}

The result documents contain details about the total value and quantity of orders for products that cost more than $15.

To view the complete code for this tutorial, see the Completed Unpack Arrays App on GitHub.