Filtered Subset

On this page

  • Introduction
  • Aggregation Task Summary
  • Before You Get Started
  • Tutorial
  • Add a match stage for people who are engineers
  • Add a sort stage to sort from youngest to oldest
  • Add a limit stage to see only three results
  • Add an unset stage to remove unneeded fields
  • Run the aggregation pipeline
  • Interpret results

In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app. This aggregation performs the following operations:

  • Matches a subset of documents by a field value

  • Formats result documents

This tutorial demonstrates how to query a collection for a specific subset of its documents. The results contain documents that describe the three youngest people who are engineers.

This example uses one collection, persons, which contains documents describing people. Each document includes a person's name, date of birth, vocation, and other details.

Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.

After you set up the app, access the persons collection by adding the following code to the application:

person_coll = agg_db["persons"]

Delete any existing data in the collection and insert sample data into the persons collection as shown in the following code:

# The sample data uses datetime values, so the application needs
# `from datetime import datetime` among its imports.
person_coll.delete_many({})
person_data = [
    {
        "person_id": "6392529400",
        "firstname": "Elise",
        "lastname": "Smith",
        "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
        "vocation": "ENGINEER",
        "address": {
            "number": 5625,
            "street": "Tipa Circle",
            "city": "Wojzinmoj",
        }
    },
    {
        "person_id": "1723338115",
        "firstname": "Olive",
        "lastname": "Ranieri",
        "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
        "gender": "FEMALE",
        "vocation": "ENGINEER",
        "address": {
            "number": 9303,
            "street": "Mele Circle",
            "city": "Tobihbo",
        }
    },
    {
        "person_id": "8732762874",
        "firstname": "Toni",
        "lastname": "Jones",
        "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
        "vocation": "POLITICIAN",
        "address": {
            "number": 1,
            "street": "High Street",
            "city": "Upper Abbeywoodington",
        }
    },
    {
        "person_id": "7363629563",
        "firstname": "Bert",
        "lastname": "Gooding",
        "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
        "vocation": "FLORIST",
        "address": {
            "number": 13,
            "street": "Upper Bold Road",
            "city": "Redringtonville",
        }
    },
    {
        "person_id": "1029648329",
        "firstname": "Sophie",
        "lastname": "Celements",
        "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
        "vocation": "ENGINEER",
        "address": {
            "number": 5,
            "street": "Innings Close",
            "city": "Basilbridge",
        }
    },
    {
        "person_id": "7363626383",
        "firstname": "Carl",
        "lastname": "Simmons",
        "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
        "vocation": "ENGINEER",
        "address": {
            "number": 187,
            "street": "Hillside Road",
            "city": "Kenningford",
        }
    }
]
person_coll.insert_many(person_data)

1. Add a match stage for people who are engineers

First, add a $match stage that finds documents in which the value of the vocation field is "ENGINEER":

pipeline.append({
    "$match": {
        "vocation": "ENGINEER"
    }
})

2. Add a sort stage to sort from youngest to oldest

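Next, add a $sort stage that sorts the documents in descending order by the dateofbirth field to list the youngest people first. This stage sorts on a single field, so a plain dictionary is sufficient. When you sort on several fields, the order of the keys matters: dictionaries preserve insertion order in Python 3.7 and later, and on earlier versions you can use a SON or OrderedDict object instead: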

pipeline.append({
    "$sort": {
        "dateofbirth": -1
    }
})
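
For a sort on more than one field, an ordered mapping makes the key order explicit. The following is a minimal sketch that uses OrderedDict from the standard library; the lastname tie-breaker and the example_pipeline name are hypothetical and are not part of this tutorial's pipeline.

```python
from collections import OrderedDict

# Hypothetical two-key sort: youngest first, then lastname A-Z as a tie-breaker.
# The tie-breaker is an illustration only; the tutorial sorts on dateofbirth alone.
sort_stage = {
    "$sort": OrderedDict([
        ("dateofbirth", -1),
        ("lastname", 1),
    ])
}

# A separate list so the sketch does not alter the tutorial's pipeline.
example_pipeline = []
example_pipeline.append(sort_stage)

# The keys keep the order in which they were listed.
print(list(sort_stage["$sort"].keys()))  # ['dateofbirth', 'lastname']
```

The first key listed is the primary sort key, so swapping the two entries would change the result order.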

3. Add a limit stage to see only three results

Next, add a $limit stage to the pipeline to output only the first three documents in the results:

pipeline.append({
    "$limit": 3
})
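
A $sort stage followed directly by a $limit stage amounts to a "top three" selection, and the server's pipeline optimizer can combine the two so that it tracks only the leading results while sorting. The sketch below is a pure-Python analogy built on the birth dates of the four engineers in the sample data; it illustrates the idea and does not describe how the server implements it.

```python
import heapq
from datetime import datetime

# Birth dates of the four engineers in the sample data.
birth_dates = [
    datetime(1972, 1, 13, 9, 32, 7),
    datetime(1985, 5, 12, 23, 14, 30),
    datetime(1959, 7, 6, 17, 35, 45),
    datetime(1998, 12, 26, 13, 13, 55),
]

# Full sort, then slice: the literal reading of $sort followed by $limit.
sorted_then_limited = sorted(birth_dates, reverse=True)[:3]

# Top-k selection: tracks only three candidates while scanning the input.
top_three = heapq.nlargest(3, birth_dates)

print(sorted_then_limited == top_three)  # True
```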

4. Add an unset stage to remove unneeded fields

Finally, add an $unset stage. The $unset stage removes unnecessary fields from the result documents:

pipeline.append({
    "$unset": [
        "_id",
        "address"
    ]
})

Tip

Use the $unset stage instead of $project so that you don't need to modify the aggregation pipeline if documents with different fields are added to the collection.
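
To illustrate the tip, the following sketch emulates both approaches in plain Python on a shortened copy of Olive's document. The emulate_unset and emulate_inclusive_project helpers and the _id placeholder are hypothetical, and the emulation ignores the fact that $project keeps _id unless it is explicitly excluded.

```python
# Olive's document from the sample data, shortened for the example.
olive = {
    "_id": "placeholder-id",  # hypothetical value; MongoDB generates the real one
    "person_id": "1723338115",
    "firstname": "Olive",
    "gender": "FEMALE",
    "vocation": "ENGINEER",
    "address": {"city": "Tobihbo"},
}


def emulate_unset(doc, fields):
    """Return a copy of doc without the listed top-level fields."""
    return {k: v for k, v in doc.items() if k not in fields}


def emulate_inclusive_project(doc, fields):
    """Return a copy of doc with only the listed top-level fields."""
    return {k: v for k, v in doc.items() if k in fields}


unset_result = emulate_unset(olive, ["_id", "address"])
project_result = emulate_inclusive_project(
    olive, ["person_id", "firstname", "vocation"]
)

# The gender field was never named in either field list.
print("gender" in unset_result)    # True: removal keeps unlisted fields
print("gender" in project_result)  # False: inclusion drops unlisted fields
```

Only Olive's document has a gender field. An inclusion list written before that field existed would silently drop it, while the removal list passes it through unchanged.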


5. Run the aggregation pipeline

Add the following code to the end of your application to perform the aggregation on the persons collection:

aggregation_result = person_coll.aggregate(pipeline)
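
For reference, the four stages appended in the previous steps are equivalent to the single list below. This is a sketch: the build_pipeline helper is hypothetical, and the commented lines assume the person_coll collection from the setup step and a running deployment.

```python
def build_pipeline():
    """Return the four stages from this tutorial as one list."""
    return [
        {"$match": {"vocation": "ENGINEER"}},
        {"$sort": {"dateofbirth": -1}},
        {"$limit": 3},
        {"$unset": ["_id", "address"]},
    ]


pipeline = build_pipeline()

# Each stage is a dictionary with exactly one key: the stage name.
print([next(iter(stage)) for stage in pipeline])
# ['$match', '$sort', '$limit', '$unset']

# With a live collection, aggregate() returns a cursor that yields
# the result documents as you iterate over it:
# for document in person_coll.aggregate(pipeline):
#     print(document)
```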

Finally, run the following command in your shell to start your application:

python3 agg_tutorial.py

6. Interpret results

The aggregated result contains three documents. The documents represent the three youngest people with the vocation of "ENGINEER", ordered from youngest to oldest. The results omit the _id and address fields.

{
    'person_id': '7363626383',
    'firstname': 'Carl',
    'lastname': 'Simmons',
    'dateofbirth': datetime.datetime(1998, 12, 26, 13, 13, 55),
    'vocation': 'ENGINEER'
}
{
    'person_id': '1723338115',
    'firstname': 'Olive',
    'lastname': 'Ranieri',
    'dateofbirth': datetime.datetime(1985, 5, 12, 23, 14, 30),
    'gender': 'FEMALE',
    'vocation': 'ENGINEER'
}
{
    'person_id': '6392529400',
    'firstname': 'Elise',
    'lastname': 'Smith',
    'dateofbirth': datetime.datetime(1972, 1, 13, 9, 32, 7),
    'vocation': 'ENGINEER'
}
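
To check this output without a database, you can emulate the four stages in plain Python. The following sketch uses shortened copies of the sample documents and standard list operations; it mirrors the pipeline's logic only and makes no claim about how the server executes it.

```python
from datetime import datetime

# Shortened copies of the sample documents: only the fields the pipeline reads.
people = [
    {"firstname": "Elise", "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
     "vocation": "ENGINEER", "address": {"city": "Wojzinmoj"}},
    {"firstname": "Olive", "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
     "vocation": "ENGINEER", "address": {"city": "Tobihbo"}},
    {"firstname": "Toni", "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
     "vocation": "POLITICIAN", "address": {"city": "Upper Abbeywoodington"}},
    {"firstname": "Bert", "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
     "vocation": "FLORIST", "address": {"city": "Redringtonville"}},
    {"firstname": "Sophie", "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
     "vocation": "ENGINEER", "address": {"city": "Basilbridge"}},
    {"firstname": "Carl", "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
     "vocation": "ENGINEER", "address": {"city": "Kenningford"}},
]

# $match: keep only the engineers.
engineers = [p for p in people if p["vocation"] == "ENGINEER"]

# $sort: latest date of birth (youngest person) first.
youngest_first = sorted(engineers, key=lambda p: p["dateofbirth"], reverse=True)

# $limit: keep the first three documents.
top_three = youngest_first[:3]

# $unset: drop the _id and address fields from each document.
result = [
    {k: v for k, v in p.items() if k not in ("_id", "address")}
    for p in top_three
]

print([p["firstname"] for p in result])  # ['Carl', 'Olive', 'Elise']
```

Sophie is also an engineer, but she is the oldest of the four, so the limit of three excludes her.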

To view the complete code for this tutorial, see the Completed Filtered Subset App on GitHub.
