$bucket (aggregation)
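Definition
$bucket categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket.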
Considerations
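$bucket and Memory Restrictions
The $bucket stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $bucket returns an error. To allow more space for stage processing, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.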
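As a minimal sketch of the option described above, the snippet below builds a $bucket pipeline and passes allowDiskUse: true as the second argument to aggregate(). The amount field and the runBucket helper are hypothetical names chosen for illustration; in mongosh you would pass a real collection such as db.orders (also an assumed name).

```javascript
// Hypothetical pipeline: bucket documents by an assumed "amount" field.
const pipeline = [
  {
    $bucket: {
      groupBy: "$amount",
      boundaries: [0, 100, 1000],
      default: "Other",
      output: { count: { $sum: 1 } }
    }
  }
];

// Aggregation options: allow stages to write to temporary files on disk.
const options = { allowDiskUse: true };

// Hypothetical helper; `collection` is any object exposing aggregate(),
// for example db.orders in mongosh.
function runBucket(collection) {
  return collection.aggregate(pipeline, options);
}
```

The option applies to the whole aggregate() call, not to the $bucket stage document itself.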
Syntax
{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}
The $bucket document contains the following fields:
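Field | Type | Description
---|---|---
groupBy | expression | An expression to group documents by. To specify a field path, prefix the field name with a dollar sign ($) and enclose it in quotes. Unless $bucket includes a default specification, each input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by boundaries.
boundaries | array | An array of values, based on the groupBy expression, that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary of a bucket. You must specify at least two boundaries. The specified values must be in ascending order and all of the same type. The exception is values of mixed numeric types, such as [ 10, NumberLong(20), NumberInt(30) ]. For example, an array of [ 0, 5, 10 ] creates two buckets: [0, 5), with inclusive lower bound 0 and exclusive upper bound 5, and [5, 10), with inclusive lower bound 5 and exclusive upper bound 10.
default | literal | Optional. A literal that specifies the _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries. If unspecified, each input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries, or the operation throws an error.
output | document | Optional. A document that specifies the fields to include in the output documents in addition to the _id field. If you do not specify an output document, the operation returns a count field containing the number of documents in each bucket. If you specify an output document, only the fields specified in the document are returned; the count field is not returned unless it is explicitly included in the output document.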
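To make the boundaries rules concrete, here is a small plain JavaScript sketch that turns a boundaries array into half-open ranges. It is an illustration under stated assumptions, not MongoDB's implementation: bucketRanges is a hypothetical helper, it handles numeric values only, and rejecting duplicate boundary values is a choice made by this sketch.

```javascript
// Illustrative only: models how adjacent boundary values pair up into
// half-open ranges [lower, upper). Handles numeric boundaries only.
function bucketRanges(boundaries) {
  if (!Array.isArray(boundaries) || boundaries.length < 2) {
    throw new Error("boundaries must contain at least two values");
  }
  const ranges = [];
  for (let i = 0; i < boundaries.length - 1; i++) {
    const lower = boundaries[i];
    const upper = boundaries[i + 1];
    // Assumption of this sketch: equal neighbors are rejected as well.
    if (!(lower < upper)) {
      throw new Error("boundaries must be in ascending order");
    }
    // The bucket _id is its inclusive lower bound.
    ranges.push({ _id: lower, lower, upper });
  }
  return ranges;
}

// [ 0, 5, 10 ] yields two ranges: [0, 5) and [5, 10).
console.log(bucketRanges([0, 5, 10]));
```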
Behavior
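$bucket requires at least one of the following conditions to be met, or the operation throws an error:
- Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
- A default value is specified to bucket documents whose groupBy values fall outside the boundaries or are of a different BSON type than the values in boundaries.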
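The two conditions can be sketched in plain JavaScript as follows. This is a simplified model for numeric values only, not MongoDB's implementation; assignBucket is a hypothetical helper, and the thrown error message is invented for the sketch.

```javascript
// Illustrative only: a plain JavaScript model of bucket assignment for
// numeric values. It is not MongoDB's implementation.
function assignBucket(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Lower bound is inclusive, upper bound is exclusive.
      if (value >= boundaries[i] && value < boundaries[i + 1]) {
        return boundaries[i];
      }
    }
  }
  // Missing, non-numeric, or out-of-range value.
  if (defaultId === undefined) {
    throw new Error("value does not fall into any bucket and no default is set");
  }
  return defaultId;
}

const exampleBoundaries = [0, 200, 400];
console.log(assignBucket(199.99, exampleBoundaries, "Other")); // prints 0
console.log(assignBucket(200, exampleBoundaries, "Other"));    // prints 200
console.log(assignBucket(400, exampleBoundaries, "Other"));    // prints Other
```

Calling assignBucket(400, exampleBoundaries) without a default throws, which mirrors the error case described above.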
If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.
Examples
Bucket by Year and Filter by Bucket Results
In mongosh, create a sample collection named artists with the following documents:
db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])
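The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets: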
db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket ID for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" : {
          $push: {
            "name": { $concat: [ "$first_name", " ", "$last_name"] },
            "year_born": "$year_born"
          }
        }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )
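- First Stage
The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:
  - [1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
  - [1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
  - [1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
  - [1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document does not contain the year_born field, or its year_born value falls outside the ranges above, it is placed in the default bucket with the _id value "Other".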
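The stage includes the output document to determine the fields to return:
Field | Description
---|---
_id | Inclusive lower bound of the bucket.
count | Count of documents in the bucket.
artists | Array of documents containing information about each artist in the bucket. Each document contains the artist's name, which is a concatenation (i.e. $concat) of the artist's first_name and last_name, and the artist's year_born.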
This stage passes the following documents to the next stage:
{ "_id" : 1840, "count" : 1, "artists" : [
    { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [
    { "name" : "Vincent Van Gogh", "year_born" : 1853 },
    { "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [
    { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }
- Second Stage
The $match stage filters the output from the previous stage to return only buckets that contain more than 3 documents. The operation returns the following document:
{ "_id" : 1860, "count" : 4, "artists" : [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 } ] }
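The counts in this example can be checked by hand with a short plain JavaScript sketch. It uses the birth years from the artists documents above and mimics the two stages with ordinary loops; countByBucket is a hypothetical helper and this is a simplified model, not how the server evaluates the pipeline.

```javascript
// Birth years copied from the artists collection above.
const yearsBorn = [1868, 1861, 1871, 1853, 1868, 1863, 1840, 1855];
const yearBoundaries = [1840, 1850, 1860, 1870, 1880];

// Count values per bucket; each bucket is identified by its lower bound.
function countByBucket(values, bounds, defaultId) {
  const counts = new Map();
  for (const v of values) {
    let id = defaultId;
    for (let i = 0; i < bounds.length - 1; i++) {
      // Inclusive lower bound, exclusive upper bound.
      if (v >= bounds[i] && v < bounds[i + 1]) {
        id = bounds[i];
        break;
      }
    }
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

// Model of the $match stage: keep buckets with count > 3.
const bucketCounts = countByBucket(yearsBorn, yearBoundaries, "Other");
const largeBuckets = [...bucketCounts].filter(([, count]) => count > 3);
console.log(largeBuckets); // [ [ 1860, 4 ] ]
```

Because every birth year falls inside one of the four ranges, the "Other" bucket never appears in the result.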
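Use $bucket with $facet to Bucket by Multiple Fields
You can use the $facet stage to perform multiple $bucket aggregations in a single stage.
In mongosh, create a sample collection named artwork with the following documents: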
db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 /* No price */ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") }
])
The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:
db.artwork.aggregate( [
  {
    $facet: {                               // Top-level $facet stage
      "price": [                            // Output field 1
        {
          $bucket: {
            groupBy: "$price",              // Field to group by
            boundaries: [ 0, 200, 400 ],    // Boundaries for the buckets
            default: "Other",               // Bucket ID for documents which do not fall into a bucket
            output: {                       // Output for each bucket
              "count": { $sum: 1 },
              "artwork" : { $push: { "title": "$title", "price": "$price" } },
              "averagePrice": { $avg: "$price" }
            }
          }
        }
      ],
      "year": [                                    // Output field 2
        {
          $bucket: {
            groupBy: "$year",                      // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ], // Boundaries for the buckets
            default: "Unknown",                    // Bucket ID for documents which do not fall into a bucket
            output: {                              // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )
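- First Facet
The first facet groups the input documents by price. The buckets have the following boundaries:
  - [0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
  - [200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
  - "Other", the default bucket, which contains documents without a price or with a price outside the ranges above.
Field | Description
---|---
_id | Inclusive lower bound of the bucket.
count | Count of documents in the bucket.
artwork | Array of documents containing information about each artwork in the bucket.
averagePrice | Uses the $avg operator to display the average price of all artwork in the bucket.
- Second Facet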
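The second facet groups the input documents by year. The buckets have the following boundaries:
  - [1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
  - [1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
  - [1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
  - "Unknown", the default bucket, which contains documents without a year or with a year outside the ranges above.
Field | Description
---|---
count | Count of documents in the bucket.
artwork | Array of documents containing information about each artwork in the bucket.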
- Output
The operation returns the following document:
{
  "price" : [ // Output of first facet
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
        { "title" : "Dancer", "price" : NumberDecimal("76.04") },
        { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
        { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
      ],
      "averagePrice" : NumberDecimal("140.4375")
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
        { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
      ],
      "averagePrice" : NumberDecimal("332.50")
    },
    {
      // Includes documents without prices and prices greater than 400
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
        { "title" : "The Scream" }
      ],
      "averagePrice" : NumberDecimal("483.00")
    }
  ],
  "year" : [ // Output of second facet
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "year" : 1902 },
        { "title" : "The Scream", "year" : 1893 }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        { "title" : "Composition VII", "year" : 1913 },
        { "title" : "Blue Flower", "year" : 1918 }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        { "title" : "The Pillars of Society", "year" : 1926 },
        { "title" : "Dancer", "year" : 1925 },
        { "title" : "The Persistence of Memory", "year" : 1931 }
      ]
    },
    {
      // Includes documents without a year
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        { "title" : "The Great Wave off Kanagawa" }
      ]
    }
  ]
}
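In the output above, the "Other" price bucket has a count of 2 but an averagePrice of 483.00. That is consistent with $avg skipping documents where the field is missing or non-numeric, so "The Scream" contributes to count but not to the average. The plain JavaScript sketch below models this under stated assumptions: averageCents is a hypothetical helper, and prices are stored as integer cents (priceCents, an invented field) to sidestep floating-point rounding.

```javascript
// Prices in cents to avoid floating-point rounding; null marks a missing price.
const otherBucket = [
  { title: "The Persistence of Memory", priceCents: 48300 },
  { title: "The Scream", priceCents: null }
];

// Simplified model of $avg: documents without a numeric value are skipped.
function averageCents(docs) {
  const priced = docs.filter((d) => typeof d.priceCents === "number");
  if (priced.length === 0) return null;
  const total = priced.reduce((sum, d) => sum + d.priceCents, 0);
  return total / priced.length;
}

console.log(otherBucket.length);        // 2
console.log(averageCents(otherBucket)); // 48300
```

The same helper applied to the four prices in the first bucket (19999, 7604, 16730, and 11842 cents) gives 14043.75 cents, matching the 140.4375 shown in the output.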