Filtered Subset
Introduction
In this tutorial, you can learn how to use PyMongo to construct an aggregation pipeline, perform the aggregation on a collection, and print the results by completing and running a sample app. This aggregation performs the following operations:
Matches a subset of documents by a field value
Formats result documents
Aggregation Task Summary
This tutorial demonstrates how to query a collection for a specific subset of documents. The results contain documents that describe the three youngest people who are engineers.
This example uses one collection, persons, which contains documents describing people. Each document includes a person's name, date of birth, vocation, and other details.
Before You Get Started
Before you start this tutorial, complete the Aggregation Template App instructions to set up a working Python application.
After you set up the app, access the persons collection by adding the following code to the application:
person_coll = agg_db["persons"]
Delete any existing data in the persons collection and insert the sample data, as shown in the following code:
person_coll.delete_many({})

person_data = [
    {
        "person_id": "6392529400",
        "firstname": "Elise",
        "lastname": "Smith",
        "dateofbirth": datetime(1972, 1, 13, 9, 32, 7),
        "vocation": "ENGINEER",
        "address": {
            "number": 5625,
            "street": "Tipa Circle",
            "city": "Wojzinmoj",
        }
    },
    {
        "person_id": "1723338115",
        "firstname": "Olive",
        "lastname": "Ranieri",
        "dateofbirth": datetime(1985, 5, 12, 23, 14, 30),
        "gender": "FEMALE",
        "vocation": "ENGINEER",
        "address": {
            "number": 9303,
            "street": "Mele Circle",
            "city": "Tobihbo",
        }
    },
    {
        "person_id": "8732762874",
        "firstname": "Toni",
        "lastname": "Jones",
        "dateofbirth": datetime(1991, 11, 23, 16, 53, 56),
        "vocation": "POLITICIAN",
        "address": {
            "number": 1,
            "street": "High Street",
            "city": "Upper Abbeywoodington",
        }
    },
    {
        "person_id": "7363629563",
        "firstname": "Bert",
        "lastname": "Gooding",
        "dateofbirth": datetime(1941, 4, 7, 22, 11, 52),
        "vocation": "FLORIST",
        "address": {
            "number": 13,
            "street": "Upper Bold Road",
            "city": "Redringtonville",
        }
    },
    {
        "person_id": "1029648329",
        "firstname": "Sophie",
        "lastname": "Celements",
        "dateofbirth": datetime(1959, 7, 6, 17, 35, 45),
        "vocation": "ENGINEER",
        "address": {
            "number": 5,
            "street": "Innings Close",
            "city": "Basilbridge",
        }
    },
    {
        "person_id": "7363626383",
        "firstname": "Carl",
        "lastname": "Simmons",
        "dateofbirth": datetime(1998, 12, 26, 13, 13, 55),
        "vocation": "ENGINEER",
        "address": {
            "number": 187,
            "street": "Hillside Road",
            "city": "Kenningford",
        }
    }
]

person_coll.insert_many(person_data)
Tutorial
Add a match stage for people who are engineers
First, add a $match stage that finds documents in which the value of the vocation field is "ENGINEER":
pipeline.append({ "$match": { "vocation": "ENGINEER" } })
Add a sort stage to sort from youngest to oldest
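Next, add a $sort stage that sorts the documents in descending order by the dateofbirth field to list the youngest people first. Dictionaries preserve insertion order in Python 3.7 and later, so a plain dictionary works here. In earlier Python versions, where dictionary order is not guaranteed, use a SON or OrderedDict object for sorts on more than one field. The following stage sorts on a single field: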
pipeline.append({ "$sort": { "dateofbirth": -1 } })
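When a sort uses more than one key, the order of the keys decides which field takes precedence. The following is a minimal sketch of an order-preserving sort specification built with OrderedDict from the Python standard library. The lastname tiebreaker is an assumption added for illustration and is not part of this tutorial's pipeline.

```python
from collections import OrderedDict

pipeline = []

# Hypothetical two-key sort: youngest first, then by last name to break ties.
# The "lastname" tiebreaker is an illustrative assumption, not a tutorial step.
sort_spec = OrderedDict([
    ("dateofbirth", -1),
    ("lastname", 1),
])
pipeline.append({"$sort": sort_spec})

# The keys keep the order in which they were inserted.
print(list(sort_spec.keys()))
```

The tutorial's own stage sorts on a single field, so key order does not affect its result.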
Add a limit stage to see only three results
Next, add a $limit stage to the pipeline to output only the first three documents in the results:
pipeline.append({ "$limit": 3 })
Add an unset stage to remove unneeded fields
Finally, add an $unset stage. The $unset stage removes unnecessary fields from the result documents:
pipeline.append({ "$unset": [ "_id", "address" ] })
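The four stages can also be viewed together as one list. The sketch below assembles them in order and shows one way the list might be passed to the collection's aggregate() method. The function names build_pipeline and run_pipeline are hypothetical, and the template app may already contain equivalent code for running the pipeline.

```python
def build_pipeline():
    """Return the four stages from this tutorial as a list, in order."""
    return [
        {"$match": {"vocation": "ENGINEER"}},
        {"$sort": {"dateofbirth": -1}},
        {"$limit": 3},
        {"$unset": ["_id", "address"]},
    ]


def run_pipeline(person_coll):
    """Run the pipeline on a PyMongo collection and print each result.

    Assumes person_coll is the collection object created earlier
    (agg_db["persons"]) and that a MongoDB deployment is reachable.
    This function is defined here but not called.
    """
    for document in person_coll.aggregate(build_pipeline()):
        print(document)


pipeline = build_pipeline()

# Print the stage names in the order the server would apply them.
print([next(iter(stage)) for stage in pipeline])
```

Stage order matters: placing $limit before $sort would return three arbitrary engineers in sorted order rather than the three youngest.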
Tip
Use the $unset stage instead of $project to avoid modifying the aggregation pipeline if documents with different fields are added to the collection.
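To illustrate the difference, the following sketch emulates both approaches locally in plain Python. It does not call MongoDB. The helper functions emulate_unset and emulate_project_include are hypothetical stand-ins that handle top-level fields only, and the inclusion list passed to the second helper is an assumed example.

```python
def emulate_unset(doc, fields):
    """Local stand-in for $unset: drop the named top-level fields."""
    return {key: value for key, value in doc.items() if key not in fields}


def emulate_project_include(doc, fields):
    """Local stand-in for an inclusion-style $project with _id excluded:
    keep only the named top-level fields."""
    return {key: value for key, value in doc.items() if key in fields}


doc = {
    "_id": 1,
    "person_id": "1723338115",
    "firstname": "Olive",
    "gender": "FEMALE",  # a field that only some documents have
    "vocation": "ENGINEER",
    "address": {"city": "Tobihbo"},
}

unset_result = emulate_unset(doc, ["_id", "address"])
project_result = emulate_project_include(
    doc, ["person_id", "firstname", "vocation"]
)

print("gender" in unset_result)    # True: fields not listed are kept
print("gender" in project_result)  # False: fields not listed are dropped
```

With the inclusion list, a new field appears in the results only after someone edits the pipeline. With $unset, it passes through unchanged.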
Interpret results
The aggregated result contains three documents. The documents represent the three youngest people with the vocation of "ENGINEER", ordered from youngest to oldest. The results omit the _id and address fields.
{
    'person_id': '7363626383',
    'firstname': 'Carl',
    'lastname': 'Simmons',
    'dateofbirth': datetime.datetime(1998, 12, 26, 13, 13, 55),
    'vocation': 'ENGINEER'
}
{
    'person_id': '1723338115',
    'firstname': 'Olive',
    'lastname': 'Ranieri',
    'dateofbirth': datetime.datetime(1985, 5, 12, 23, 14, 30),
    'gender': 'FEMALE',
    'vocation': 'ENGINEER'
}
{
    'person_id': '6392529400',
    'firstname': 'Elise',
    'lastname': 'Smith',
    'dateofbirth': datetime.datetime(1972, 1, 13, 9, 32, 7),
    'vocation': 'ENGINEER'
}
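As a cross-check on the ordering, the match, sort, and limit steps can be approximated locally in plain Python. This sketch does not call MongoDB. It uses a trimmed copy of the sample data with only the fields the check needs, and standard list operations stand in for the pipeline stages.

```python
from datetime import datetime

# Trimmed copy of the sample data: only the fields this check needs.
people = [
    {"firstname": "Elise", "vocation": "ENGINEER",
     "dateofbirth": datetime(1972, 1, 13, 9, 32, 7)},
    {"firstname": "Olive", "vocation": "ENGINEER",
     "dateofbirth": datetime(1985, 5, 12, 23, 14, 30)},
    {"firstname": "Toni", "vocation": "POLITICIAN",
     "dateofbirth": datetime(1991, 11, 23, 16, 53, 56)},
    {"firstname": "Bert", "vocation": "FLORIST",
     "dateofbirth": datetime(1941, 4, 7, 22, 11, 52)},
    {"firstname": "Sophie", "vocation": "ENGINEER",
     "dateofbirth": datetime(1959, 7, 6, 17, 35, 45)},
    {"firstname": "Carl", "vocation": "ENGINEER",
     "dateofbirth": datetime(1998, 12, 26, 13, 13, 55)},
]

# Stand-in for $match: keep only engineers.
engineers = [p for p in people if p["vocation"] == "ENGINEER"]

# Stand-in for $sort with -1: latest date of birth (youngest) first.
youngest_first = sorted(
    engineers, key=lambda p: p["dateofbirth"], reverse=True
)

# Stand-in for $limit: keep the first three.
top_three = youngest_first[:3]

names = [p["firstname"] for p in top_three]
print(names)  # ['Carl', 'Olive', 'Elise']
```

Sophie is the fourth engineer by date of birth, so the limit of three excludes her document.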
To view the complete code for this tutorial, see the Completed Filtered Subset App on GitHub.