$lookup (aggregation)
Definition
$lookup
New in version 3.2.
Performs a left outer join to an unsharded collection in the same database to filter in documents from the "joined" collection for processing. To each input document, the
$lookup
stage adds a new array field whose elements are the matching documents from the "joined" collection. The$lookup
stage passes these reshaped documents to the next stage.
Compatibility
You can use $lookup
for deployments hosted in the following
environments:
MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB
Syntax
The $lookup
stage has the following syntaxes:
Equality Match with a Single Join Condition
To perform an equality match between a field from the input documents
with a field from the documents of the "joined" collection, the
$lookup
stage has the following syntax:
{ $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } }
The $lookup
takes a document with the following fields:
Field | Description |
---|---|
Specifies the collection in the same database to perform
the join with. The | |
Specifies the field from the documents input to the
| |
Specifies the name of the new array field to add to the input
documents. The new array field contains the matching
documents from the |
The operation corresponds to this pseudo-SQL statement:
SELECT *, ( SELECT ARRAY_AGG(*) FROM <collection to join> WHERE <foreignField> = <collection.localField> ) AS <output array field> FROM collection;
Note
The SQL statements on this page are included for comparison to the MongoDB aggregation pipeline syntax. The SQL statements aren't runnable.
For MongoDB examples, see these pages:
Join Conditions and Uncorrelated Sub-queries
MongoDB supports:
To perform uncorrelated subqueries between two collections as well as
allow other join conditions besides a single equality match, the
$lookup
stage has the following syntax:
{ $lookup: { from: <collection to join>, let: { <var_1>: <expression>, …, <var_n>: <expression> }, pipeline: [ <pipeline to execute on the collection to join> ], as: <output array field> } }
The $lookup
takes a document with the following fields:
Field | Description |
---|---|
Specifies the collection in the same database to perform the
join with. The | |
Optional. Specifies variables to use in the
pipeline field stages. Use the
variable expressions to access the fields from the documents
input to the The pipeline cannot directly access
the input document fields. Instead, first define the variables
for the input document fields, and then reference the variables
in the stages in the NoteTo reference variables in pipeline
stages, use the The let variables can be accessed by the
stages in the pipeline, including
additional
| |
Specifies the pipeline to run on the joined collection. The
The The NoteTo reference variables in pipeline
stages, use the The let variables can be accessed by the
stages in the pipeline, including
additional
| |
Specifies the name of the new array field to add to the input
documents. The new array field contains the matching
documents from the |
The operation would correspond to the following pseudo-SQL statement:
SELECT *, <output array field> FROM collection WHERE <output array field> IN (SELECT <documents as determined from the pipeline> FROM <collection to join> WHERE <pipeline> );
See the following examples:
Consideration
Views and Collation
If performing an aggregation that involves multiple views, such as
with $lookup
or $graphLookup
, the views must
have the same collation.
Restrictions
Changed in version 4.2: You cannot include the
$out
or the$merge
stage in the$lookup
stage. That is, when specifying a pipeline for the joined collection, you cannot include either stage in thepipeline
field.{ $lookup: { from: <collection to join>, let: { <var_1>: <expression>, …, <var_n>: <expression> }, pipeline: [ <pipeline to execute on the joined collection> ], // Cannot include $out or $merge as: <output array field> } }
Sharded Collection Restrictions
In the $lookup
stage, the from
collection cannot be
sharded. However, the collection on which you run the
aggregate()
method can be sharded. That is, in
the following:
db.collection.aggregate([ { $lookup: { from: "fromCollection", ... } } ])
The
collection
can be sharded.The
fromCollection
cannot be sharded.
As such, to join a sharded collection with an unsharded collection, you can run the aggregation on the sharded collection and lookup the unsharded collection; e.g.:
db.shardedCollection.aggregate([ { $lookup: { from: "unshardedCollection", ... } } ])
Alternatively, or to join multiple sharded collections, consider:
Modifying client applications to perform manual lookups instead of using the
$lookup
aggregation stage.If possible, using an embedded data model that removes the need to join collections.
If possible, using an Atlas Data Lake $lookup pipeline stage to lookup a sharded collection.
Performance Considerations
$lookup
performance depends on the type of operation performed.
Refer to the following table for performance considerations for
different $lookup
operations.
$lookup Operation | Performance Considerations |
---|---|
| |
|
For general performance strategies, see Indexing Strategies and Query Optimization.
Important
Excessive use of $lookup
within a query may slow down
performance. To avoid multiple $lookup
stages, consider an
embedded data model to optimize query
performance.
Examples
Perform a Single Equality Join with $lookup
Create a collection orders
with the following documents:
db.orders.insert([ { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 }, { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }, { "_id" : 3 } ])
Create another collection inventory
with the following documents:
db.inventory.insert([ { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 }, { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 }, { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 }, { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 }, { "_id" : 5, "sku": null, "description": "Incomplete" }, { "_id" : 6 } ])
The following aggregation operation on the orders
collection
joins the documents from orders
with the documents from the
inventory
collection using the fields item
from the
orders
collection and the sku
field from the inventory
collection:
db.orders.aggregate([ { $lookup: { from: "inventory", localField: "item", foreignField: "sku", as: "inventory_docs" } } ])
The operation returns the following documents:
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2, "inventory_docs" : [ { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 } ] } { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1, "inventory_docs" : [ { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 } ] } { "_id" : 3, "inventory_docs" : [ { "_id" : 5, "sku" : null, "description" : "Incomplete" }, { "_id" : 6 } ] }
The operation would correspond to the following pseudo-SQL statement:
SELECT *, inventory_docs FROM orders WHERE inventory_docs IN (SELECT * FROM inventory WHERE sku= orders.item);
For more information, see Equality Match Performance Considerations.
For more information, see Equality Match Performance Considerations.
Use $lookup
with an Array
Starting MongoDB 3.4, if the localField
is an array, you can match
the array elements against a scalar foreignField
without needing an
$unwind
stage.
For example, create an example collection classes
with the
following document:
db.classes.insert( [ { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] }, { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] } ])
Create another collection members
with the following documents:
db.members.insert( [ { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" }, { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" }, { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" }, { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" }, { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" }, { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" } ])
The following aggregation operation joins documents in the classes
collection with the members
collection, matching on the
enrollmentlist
field to the name
field:
db.classes.aggregate([ { $lookup: { from: "members", localField: "enrollmentlist", foreignField: "name", as: "enrollee_info" } } ])
The operation returns the following:
{ "_id" : 1, "title" : "Reading is ...", "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ], "days" : [ "M", "W", "F" ], "enrollee_info" : [ { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" }, { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" }, { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" } ] } { "_id" : 2, "title" : "But Writing ...", "enrollmentlist" : [ "giraffe1", "artie" ], "days" : [ "T", "F" ], "enrollee_info" : [ { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" }, { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" } ] }
Use $lookup
with $mergeObjects
Changed in version 3.6: MongoDB 3.6 adds the $mergeObjects
operator to combine
multiple documents into a single document
Create a collection orders
with the following documents:
db.orders.insert([ { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 }, { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 } ])
Create another collection items
with the following documents:
db.items.insert([ { "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 }, { "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 }, { "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 } ])
The following operation first uses the $lookup
stage to
join the two collections by the item
fields and then uses
$mergeObjects
in the $replaceRoot
to merge
the joined documents from items
and orders
:
db.orders.aggregate([ { $lookup: { from: "items", localField: "item", // field in the orders collection foreignField: "item", // field in the items collection as: "fromItems" } }, { $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } } }, { $project: { fromItems: 0 } } ])
The operation returns the following documents:
{ "_id" : 1, "item" : "almonds", "description" : "almond clusters", "instock" : 120, "price" : 12, "quantity" : 2 } { "_id" : 2, "item" : "pecans", "description" : "candied pecans", "instock" : 60, "price" : 20, "quantity" : 1 }
Specify Multiple Join Conditions with $lookup
Create a collection orders
with the following documents:
db.orders.insert([ { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 }, { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 }, { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 } ])
Create another collection warehouses
with the following documents:
db.warehouses.insert([ { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 }, { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 }, { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 }, { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 }, { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 } ])
The following operation joins the orders
collection with the
warehouse
collection by the item and whether the quantity in stock
is sufficient to cover the ordered quantity:
db.orders.aggregate([ { $lookup: { from: "warehouses", let: { order_item: "$item", order_qty: "$ordered" }, pipeline: [ { $match: { $expr: { $and: [ { $eq: [ "$stock_item", "$$order_item" ] }, { $gte: [ "$instock", "$$order_qty" ] } ] } } }, { $project: { stock_item: 0, _id: 0 } } ], as: "stockdata" } } ])
The operation returns the following documents:
{ "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2, "stockdata" : [ { "warehouse" : "A", "instock" : 120 }, { "warehouse" : "B", "instock" : 60 } ] } { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1, "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] } { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60, "stockdata" : [ { "warehouse" : "A", "instock" : 80 } ] }
The operation corresponds to the following pseudo-SQL statement:
SELECT *, stockdata FROM orders WHERE stockdata IN ( SELECT warehouse, instock FROM warehouses WHERE stock_item = orders.item AND instock >= orders.ordered );
The $expr
operator only uses indexes on the from
collection for equality matches. For example, if the index
{ stock_item: 1, instock: 1 }
exists on the warehouses
collection:
The equality match on the
warehouses.stock_item
field uses the index.The range part of the query on the
warehouses.instock
field does not use the indexed field in the compound index.
Uncorrelated Subquery
Changed in version 3.6: MongoDB 3.6 adds support for executing a pipeline on the joined collection, which allows for specifying multiple join conditions as well as uncorrelated sub-queries.
Create a collection absences
with the following documents:
db.absences.insert([ { "_id" : 1, "student" : "Ann Aardvark", sickdays: [ new Date ("2018-05-01"),new Date ("2018-08-23") ] }, { "_id" : 2, "student" : "Zoe Zebra", sickdays: [ new Date ("2018-02-01"),new Date ("2018-05-23") ] }, ])
Create another collection holidays
with the following documents:
db.holidays.insert([ { "_id" : 1, year: 2018, name: "New Years", date: new Date("2018-01-01") }, { "_id" : 2, year: 2018, name: "Pi Day", date: new Date("2018-03-14") }, { "_id" : 3, year: 2018, name: "Ice Cream Day", date: new Date("2018-07-15") }, { "_id" : 4, year: 2017, name: "New Years", date: new Date("2017-01-01") }, { "_id" : 5, year: 2017, name: "Ice Cream Day", date: new Date("2017-07-16") } ])
The following operation joins the absences
collection with 2018
holiday information from the holidays
collection:
db.absences.aggregate([ { $lookup: { from: "holidays", pipeline: [ { $match: { year: 2018 } }, { $project: { _id: 0, date: { name: "$name", date: "$date" } } }, { $replaceRoot: { newRoot: "$date" } } ], as: "holidays" } } ])
The operation returns the following:
{ "_id" : 1, "student" : "Ann Aardvark", "sickdays" : [ ISODate("2018-05-01T00:00:00Z"), ISODate("2018-08-23T00:00:00Z") ], "holidays" : [ { "name" : "New Years", "date" : ISODate("2018-01-01T00:00:00Z") }, { "name" : "Pi Day", "date" : ISODate("2018-03-14T00:00:00Z") }, { "name" : "Ice Cream Day", "date" : ISODate("2018-07-15T00:00:00Z") } ] } { "_id" : 2, "student" : "Zoe Zebra", "sickdays" : [ ISODate("2018-02-01T00:00:00Z"), ISODate("2018-05-23T00:00:00Z") ], "holidays" : [ { "name" : "New Years", "date" : ISODate("2018-01-01T00:00:00Z") }, { "name" : "Pi Day", "date" : ISODate("2018-03-14T00:00:00Z") }, { "name" : "Ice Cream Day", "date" : ISODate("2018-07-15T00:00:00Z") } ] }
The operation would correspond to the following pseudo-SQL statement:
SELECT *, holidays FROM absences WHERE holidays IN (SELECT name, date FROM holidays WHERE year = 2018);