$lookup (aggregation)
Definition
$lookup
Changed in version 5.0.
Performs a left outer join to an unsharded collection in the same database to filter in documents from the "joined" collection for processing. To each input document, the
$lookup
stage adds a new array field whose elements are the matching documents from the "joined" collection. The$lookup
stage passes these reshaped documents to the next stage.
Compatibility
You can use $lookup
for deployments hosted in the following
environments:
MongoDB Atlas: The fully managed service for MongoDB deployments in the cloud
MongoDB Enterprise: The subscription-based, self-managed version of MongoDB
MongoDB Community: The source-available, free-to-use, and self-managed version of MongoDB
Syntax
The $lookup
stage has the following syntaxes:
Equality Match with a Single Join Condition
To perform an equality match between a field from the input documents
with a field from the documents of the "joined" collection, the
$lookup
stage has this syntax:
{ $lookup: { from: <collection to join>, localField: <field from the input documents>, foreignField: <field from the documents of the "from" collection>, as: <output array field> } }
The $lookup
takes a document with these fields:
Field | Description |
---|---|
Specifies the collection in the same database to perform
the join with. The | |
Specifies the field from the documents input to the
| |
Specifies the name of the new array field to add to the input
documents. The new array field contains the matching
documents from the |
The operation corresponds to this pseudo-SQL statement:
SELECT *, ( SELECT ARRAY_AGG(*) FROM <collection to join> WHERE <foreignField> = <collection.localField> ) AS <output array field> FROM collection;
Note
The SQL statements on this page are included for comparison to the MongoDB aggregation pipeline syntax. The SQL statements aren't runnable.
For MongoDB examples, see these pages:
Join Conditions and Subqueries on a Joined Collection
MongoDB supports:
Executing a pipeline on a joined collection.
Multiple join conditions.
Correlated and uncorrelated subqueries.
In MongoDB, a correlated subquery is a pipeline in a $lookup
stage that references
document fields from a joined collection. An uncorrelated subquery does
not reference joined fields.
Note
Starting in MongoDB 5.0, for an uncorrelated subquery in a
$lookup
pipeline stage containing a $sample
stage, the $sampleRate
operator, or the
$rand
operator, the subquery is always run again if
repeated. Previously, depending on the subquery output size, either the
subquery output was cached or the subquery was run again.
MongoDB correlated subqueries are comparable to SQL correlated subqueries, where the inner query references outer query values. An SQL uncorrelated subquery does not reference outer query values.
MongoDB 5.0 also supports concise correlated subqueries.
To perform correlated and uncorrelated subqueries with two collections,
and perform other join conditions besides a single equality match, use
this $lookup
syntax:
{ $lookup: { from: <joined collection>, let: { <var_1>: <expression>, …, <var_n>: <expression> }, pipeline: [ <pipeline to run on joined collection> ], as: <output array field> } }
The $lookup
stage accepts a document with these fields:
Field | Description |
---|---|
Specifies the collection in the same database to perform the join operation. The joined collection cannot be sharded (see Sharded Collection Restrictions). | |
Optional. Specifies variables to use in the pipeline stages. Use the variable expressions to
access the fields from the joined collection's documents that are
input to the To reference variables in pipeline
stages, use the The let variables can be accessed by the
stages in the pipeline, including
additional
| |
Specifies the The The To reference variables in pipeline
stages, use the The let variables can be accessed by the
stages in the pipeline, including
additional
| |
Specifies the name of the new array field to add to the joined documents. The new array field contains the matching documents from the joined collection. If the specified name already exists in the joined document, the existing field is overwritten. |
The operation corresponds to this pseudo-SQL statement:
SELECT *, <output array field> FROM collection WHERE <output array field> IN ( SELECT <documents as determined from the pipeline> FROM <collection to join> WHERE <pipeline> );
See the following examples:
Correlated Subqueries Using Concise Syntax
New in version 5.0.
Starting in MongoDB 5.0, you can use a concise syntax for a correlated
subquery. Correlated subqueries reference document fields from a joined
"foreign" collection and the "local" collection on which the
aggregate()
method was run.
The following new concise syntax removes the requirement for an equality
match on the foreign and local fields inside of an $expr
operator:
{ $lookup: { from: <foreign collection>, localField: <field from local collection's documents>, foreignField: <field from foreign collection's documents>, let: { <var_1>: <expression>, …, <var_n>: <expression> }, pipeline: [ <pipeline to run> ], as: <output array field> } }
The $lookup
accepts a document with these fields:
Field | Description |
---|---|
Specifies the foreign collection in the same database to join to the local collection. The foreign collection cannot be sharded (see Sharded Collection Restrictions). | |
Specifies the local documents' If a local document does not contain a | |
Specifies the foreign documents' If a foreign document does not contain a | |
Optional. Specifies the variables to use in the pipeline stages. Use the
variable expressions to access the document fields that are input
to the To reference variables in pipeline
stages, use the The let variables can be accessed by the
stages in the pipeline, including
additional
| |
Specifies the The The To reference variables in pipeline
stages, use the The let variables can be accessed by the
stages in the pipeline, including
additional
| |
Specifies the name of the new array field to add to the foreign documents. The new array field contains the matching documents from the foreign collection. If the specified name already exists in the foreign document, the existing field is overwritten. |
The operation corresponds to this pseudo-SQL statement:
SELECT *, <output array field> FROM localCollection WHERE <output array field> IN ( SELECT <documents as determined from the pipeline> FROM <foreignCollection> WHERE <foreignCollection.foreignField> = <localCollection.localField> AND <pipeline match condition> );
See this example:
Considerations
Views and Collation
If performing an aggregation that involves multiple views, such as
with $lookup
or $graphLookup
, the views must
have the same collation.
Restrictions
You cannot include the $out
or the $merge
stage in the $lookup
stage. That is, when specifying a
pipeline for the joined collection, you
cannot include either stage in the pipeline
field.
{ $lookup: { from: <collection to join>, let: { <var_1>: <expression>, …, <var_n>: <expression> }, pipeline: [ <pipeline to execute on the joined collection> ], // Cannot include $out or $merge as: <output array field> } }
Sharded Collection Restrictions
In the $lookup
stage, the from
collection cannot be
sharded. However, the collection on which you run the
aggregate()
method can be sharded.
Specifically, in this example:
db.collection.aggregate([ { $lookup: { from: "fromCollection", ... } } ])
The
collection
can be sharded.The
fromCollection
cannot be sharded.
To join a sharded collection with an unsharded collection, run the aggregation on the sharded collection and lookup the unsharded collection. For example:
db.shardedCollection.aggregate([ { $lookup: { from: "unshardedCollection", ... } } ])
Alternatively, or to join multiple sharded collections, consider:
Modifying client applications to perform manual lookups instead of using the
$lookup
aggregation stage.If possible, using an embedded data model that removes the need to join collections.
If possible, using an Atlas Data Lake $lookup pipeline stage to lookup a sharded collection.
Performance Considerations
$lookup
performance depends on the type of operation performed.
Refer to the following table for performance considerations for
different $lookup
operations.
$lookup Operation | Performance Considerations |
---|---|
| |
| |
|
For general performance strategies, see Indexing Strategies and Query Optimization.
Important
Excessive use of $lookup
within a query may slow down
performance. To avoid multiple $lookup
stages, consider an
embedded data model to optimize query
performance.
Examples
Perform a Single Equality Join with $lookup
Create a collection orders
with these documents:
db.orders.insertMany( [ { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 }, { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 }, { "_id" : 3 } ] )
Create another collection inventory
with these documents:
db.inventory.insertMany( [ { "_id" : 1, "sku" : "almonds", "description": "product 1", "instock" : 120 }, { "_id" : 2, "sku" : "bread", "description": "product 2", "instock" : 80 }, { "_id" : 3, "sku" : "cashews", "description": "product 3", "instock" : 60 }, { "_id" : 4, "sku" : "pecans", "description": "product 4", "instock" : 70 }, { "_id" : 5, "sku": null, "description": "Incomplete" }, { "_id" : 6 } ] )
The following aggregation operation on the orders
collection
joins the documents from orders
with the documents from the
inventory
collection using the fields item
from the
orders
collection and the sku
field from the inventory
collection:
db.orders.aggregate( [ { $lookup: { from: "inventory", localField: "item", foreignField: "sku", as: "inventory_docs" } } ] )
The operation returns these documents:
{ "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2, "inventory_docs" : [ { "_id" : 1, "sku" : "almonds", "description" : "product 1", "instock" : 120 } ] } { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1, "inventory_docs" : [ { "_id" : 4, "sku" : "pecans", "description" : "product 4", "instock" : 70 } ] } { "_id" : 3, "inventory_docs" : [ { "_id" : 5, "sku" : null, "description" : "Incomplete" }, { "_id" : 6 } ] }
The operation corresponds to this pseudo-SQL statement:
SELECT *, inventory_docs FROM orders WHERE inventory_docs IN ( SELECT * FROM inventory WHERE sku = orders.item );
For more information, see Equality Match Performance Considerations.
Use $lookup
with an Array
If the localField
is an array, you can match the array elements
against a scalar foreignField
without an $unwind
stage.
For example, create an example collection classes
with these
documents:
db.classes.insertMany( [ { _id: 1, title: "Reading is ...", enrollmentlist: [ "giraffe2", "pandabear", "artie" ], days: ["M", "W", "F"] }, { _id: 2, title: "But Writing ...", enrollmentlist: [ "giraffe1", "artie" ], days: ["T", "F"] } ] )
Create another collection members
with these documents:
db.members.insertMany( [ { _id: 1, name: "artie", joined: new Date("2016-05-01"), status: "A" }, { _id: 2, name: "giraffe", joined: new Date("2017-05-01"), status: "D" }, { _id: 3, name: "giraffe1", joined: new Date("2017-10-01"), status: "A" }, { _id: 4, name: "panda", joined: new Date("2018-10-11"), status: "A" }, { _id: 5, name: "pandabear", joined: new Date("2018-12-01"), status: "A" }, { _id: 6, name: "giraffe2", joined: new Date("2018-12-01"), status: "D" } ] )
The following aggregation operation joins documents in the classes
collection with the members
collection, matching on the
enrollmentlist
field to the name
field:
db.classes.aggregate( [ { $lookup: { from: "members", localField: "enrollmentlist", foreignField: "name", as: "enrollee_info" } } ] )
The operation returns the following:
{ "_id" : 1, "title" : "Reading is ...", "enrollmentlist" : [ "giraffe2", "pandabear", "artie" ], "days" : [ "M", "W", "F" ], "enrollee_info" : [ { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" }, { "_id" : 5, "name" : "pandabear", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "A" }, { "_id" : 6, "name" : "giraffe2", "joined" : ISODate("2018-12-01T00:00:00Z"), "status" : "D" } ] } { "_id" : 2, "title" : "But Writing ...", "enrollmentlist" : [ "giraffe1", "artie" ], "days" : [ "T", "F" ], "enrollee_info" : [ { "_id" : 1, "name" : "artie", "joined" : ISODate("2016-05-01T00:00:00Z"), "status" : "A" }, { "_id" : 3, "name" : "giraffe1", "joined" : ISODate("2017-10-01T00:00:00Z"), "status" : "A" } ] }
Use $lookup
with $mergeObjects
The $mergeObjects
operator combines multiple documents
into a single document.
Create a collection orders
with these documents:
db.orders.insertMany( [ { "_id" : 1, "item" : "almonds", "price" : 12, "quantity" : 2 }, { "_id" : 2, "item" : "pecans", "price" : 20, "quantity" : 1 } ] )
Create another collection items
with these documents:
db.items.insertMany( [ { "_id" : 1, "item" : "almonds", description: "almond clusters", "instock" : 120 }, { "_id" : 2, "item" : "bread", description: "raisin and nut bread", "instock" : 80 }, { "_id" : 3, "item" : "pecans", description: "candied pecans", "instock" : 60 } ] )
The following operation first uses the $lookup
stage to
join the two collections by the item
fields and then uses
$mergeObjects
in the $replaceRoot
to merge
the joined documents from items
and orders
:
db.orders.aggregate( [ { $lookup: { from: "items", localField: "item", // field in the orders collection foreignField: "item", // field in the items collection as: "fromItems" } }, { $replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$fromItems", 0 ] }, "$$ROOT" ] } } }, { $project: { fromItems: 0 } } ] )
The operation returns these documents:
{ _id: 1, item: 'almonds', description: 'almond clusters', instock: 120, price: 12, quantity: 2 }, { _id: 2, item: 'pecans', description: 'candied pecans', instock: 60, price: 20, quantity: 1 }
Perform Multiple Joins and a Correlated Subquery with $lookup
Pipelines can execute on a joined collection and include multiple join conditions.
A join condition can reference a field in the local collection on which
the aggregate()
method was run and reference a
field in the joined collection. This allows a correlated subquery
between the two collections.
MongoDB 5.0 supports concise correlated subqueries.
Create a collection orders
with these documents:
db.orders.insertMany( [ { "_id" : 1, "item" : "almonds", "price" : 12, "ordered" : 2 }, { "_id" : 2, "item" : "pecans", "price" : 20, "ordered" : 1 }, { "_id" : 3, "item" : "cookies", "price" : 10, "ordered" : 60 } ] )
Create another collection warehouses
with these documents:
db.warehouses.insertMany( [ { "_id" : 1, "stock_item" : "almonds", warehouse: "A", "instock" : 120 }, { "_id" : 2, "stock_item" : "pecans", warehouse: "A", "instock" : 80 }, { "_id" : 3, "stock_item" : "almonds", warehouse: "B", "instock" : 60 }, { "_id" : 4, "stock_item" : "cookies", warehouse: "B", "instock" : 40 }, { "_id" : 5, "stock_item" : "cookies", warehouse: "A", "instock" : 80 } ] )
The following example:
Uses a correlated subquery with a join on the
orders.item
andwarehouse.stock_item
fields.Ensures the quantity of the item in stock can fulfill the ordered quantity.
db.orders.aggregate( [ { $lookup: { from: "warehouses", let: { order_item: "$item", order_qty: "$ordered" }, pipeline: [ { $match: { $expr: { $and: [ { $eq: [ "$stock_item", "$$order_item" ] }, { $gte: [ "$instock", "$$order_qty" ] } ] } } }, { $project: { stock_item: 0, _id: 0 } } ], as: "stockdata" } } ] )
The operation returns these documents:
{ _id: 1, item: 'almonds', price: 12, ordered: 2, stockdata: [ { warehouse: 'A', instock: 120 }, { warehouse: 'B', instock: 60 } ] }, { _id: 2, item: 'pecans', price: 20, ordered: 1, stockdata: [ { warehouse: 'A', instock: 80 } ] }, { _id: 3, item: 'cookies', price: 10, ordered: 60, stockdata: [ { warehouse: 'A', instock: 80 } ] }
The operation corresponds to this pseudo-SQL statement:
SELECT *, stockdata FROM orders WHERE stockdata IN ( SELECT warehouse, instock FROM warehouses WHERE stock_item = orders.item AND instock >= orders.ordered );
The $eq
, $lt
, $lte
,
$gt
, and $gte
comparison operators placed in
an $expr
operator can use an index on the from
collection
referenced in a $lookup
stage. Limitations:
Indexes can only be used for comparisons between fields and constants, so the
let
operand must resolve to a constant.For example, a comparison between
$a
and a constant value can use an index, but a comparison between$a
and$b
cannot.Indexes are not used for comparisons where the
let
operand resolves to an empty or missing value.Multikey indexes are not used.
For example, if the index { stock_item: 1, instock: 1 }
exists on
the warehouses
collection:
The equality match on the
warehouses.stock_item
field uses the index.The range part of the query on the
warehouses.instock
field also uses the indexed field in the compound index.
Perform an Uncorrelated Subquery with $lookup
An aggregation pipeline $lookup
stage can execute a pipeline
on the joined collection, which allows uncorrelated subqueries. An
uncorrelated subquery does not reference the joined document fields.
Note
Starting in MongoDB 5.0, for an uncorrelated subquery in a
$lookup
pipeline stage containing a $sample
stage, the $sampleRate
operator, or the
$rand
operator, the subquery is always run again if
repeated. Previously, depending on the subquery output size, either the
subquery output was cached or the subquery was run again.
Create a collection absences
with these documents:
db.absences.insertMany( [ { "_id" : 1, "student" : "Ann Aardvark", sickdays: [ new Date ("2018-05-01"),new Date ("2018-08-23") ] }, { "_id" : 2, "student" : "Zoe Zebra", sickdays: [ new Date ("2018-02-01"),new Date ("2018-05-23") ] }, ] )
Create another collection holidays
with these documents:
db.holidays.insertMany( [ { "_id" : 1, year: 2018, name: "New Years", date: new Date("2018-01-01") }, { "_id" : 2, year: 2018, name: "Pi Day", date: new Date("2018-03-14") }, { "_id" : 3, year: 2018, name: "Ice Cream Day", date: new Date("2018-07-15") }, { "_id" : 4, year: 2017, name: "New Years", date: new Date("2017-01-01") }, { "_id" : 5, year: 2017, name: "Ice Cream Day", date: new Date("2017-07-16") } ] )
The following operation joins the absences
collection with 2018
holiday information from the holidays
collection:
db.absences.aggregate( [ { $lookup: { from: "holidays", pipeline: [ { $match: { year: 2018 } }, { $project: { _id: 0, date: { name: "$name", date: "$date" } } }, { $replaceRoot: { newRoot: "$date" } } ], as: "holidays" } } ] )
The operation returns the following:
{ _id: 1, student: 'Ann Aardvark', sickdays: [ ISODate("2018-05-01T00:00:00.000Z"), ISODate("2018-08-23T00:00:00.000Z") ], holidays: [ { name: 'New Years', date: ISODate("2018-01-01T00:00:00.000Z") }, { name: 'Pi Day', date: ISODate("2018-03-14T00:00:00.000Z") }, { name: 'Ice Cream Day', date: ISODate("2018-07-15T00:00:00.000Z") } ] }, { _id: 2, student: 'Zoe Zebra', sickdays: [ ISODate("2018-02-01T00:00:00.000Z"), ISODate("2018-05-23T00:00:00.000Z") ], holidays: [ { name: 'New Years', date: ISODate("2018-01-01T00:00:00.000Z") }, { name: 'Pi Day', date: ISODate("2018-03-14T00:00:00.000Z") }, { name: 'Ice Cream Day', date: ISODate("2018-07-15T00:00:00.000Z") } ] }
The operation corresponds to this pseudo-SQL statement:
SELECT *, holidays FROM absences WHERE holidays IN ( SELECT name, date FROM holidays WHERE year = 2018 );
For more information, see Uncorrelated Subquery Performance Considerations.
Perform a Concise Correlated Subquery with $lookup
New in version 5.0.
Starting in MongoDB 5.0, an aggregation pipeline $lookup
stage supports a concise correlated subquery syntax that improves joins between
collections. The new concise syntax removes the requirement for an
equality match on the foreign and local fields inside of an
$expr
operator in a $match
stage.
Create a collection restaurants
:
db.restaurants.insertMany( [ { _id: 1, name: "American Steak House", food: [ "filet", "sirloin" ], beverages: [ "beer", "wine" ] }, { _id: 2, name: "Honest John Pizza", food: [ "cheese pizza", "pepperoni pizza" ], beverages: [ "soda" ] } ] )
Create another collection orders
with food and optional drink
orders:
db.orders.insertMany( [ { _id: 1, item: "filet", restaurant_name: "American Steak House" }, { _id: 2, item: "cheese pizza", restaurant_name: "Honest John Pizza", drink: "lemonade" }, { _id: 3, item: "cheese pizza", restaurant_name: "Honest John Pizza", drink: "soda" } ] )
The following example:
Joins the
orders
andrestaurants
collections by matching theorders.restaurant_name
localField with therestaurants.name
foreignField. The match is performed before thepipeline
is run.Performs an
$in
array match between theorders.drink
andrestaurants.beverages
fields that are accessed using$$orders_drink
and$beverages
respectively.
db.orders.aggregate( [ { $lookup: { from: "restaurants", localField: "restaurant_name", foreignField: "name", let: { orders_drink: "$drink" }, pipeline: [ { $match: { $expr: { $in: [ "$$orders_drink", "$beverages" ] } } } ], as: "matches" } } ] )
There is a match for the soda
value in the orders.drink
and
restaurants.beverages
fields. This output shows the matches
array and contains all joined fields from the restaurants
collection
for the match:
{ "_id" : 1, "item" : "filet", "restaurant_name" : "American Steak House", "matches" : [ ] } { "_id" : 2, "item" : "cheese pizza", "restaurant_name" : "Honest John Pizza", "drink" : "lemonade", "matches" : [ ] } { "_id" : 3, "item" : "cheese pizza", "restaurant_name" : "Honest John Pizza", "drink" : "soda", "matches" : [ { "_id" : 2, "name" : "Honest John Pizza", "food" : [ "cheese pizza", "pepperoni pizza" ], "beverages" : [ "soda" ] } ] }
Before the introduction of concise correlated subqueries, you had to use
an $eq
equality match between the local field and the
joined field in the $expr
operator in the pipeline
$lookup
stage as shown in Perform Multiple Joins and a Correlated Subquery with $lookup
.
This example uses the older verbose syntax from MongoDB versions before 5.0 and returns the same results as the previous concise example:
db.orders.aggregate( [ { $lookup: { from: "restaurants", let: { orders_restaurant_name: "$restaurant_name", orders_drink: "$drink" }, pipeline: [ { $match: { $expr: { $and: [ { $eq: [ "$$orders_restaurant_name", "$name" ] }, { $in: [ "$$orders_drink", "$beverages" ] } ] } } } ], as: "matches" } } ] )
The previous examples correspond to this pseudo-SQL statement:
SELECT *, matches FROM orders WHERE matches IN ( SELECT * FROM restaurants WHERE restaurants.name = orders.restaurant_name AND restaurants.beverages = orders.drink );
For more information, see Correlated Subquery Performance Considerations.