$bucket (aggregation)

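Definition

$bucket

Categorizes incoming documents into groups, called buckets, based on a specified expression and bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value is the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.

$bucket only produces output documents for buckets that contain at least one input document.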

Considerations

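`$bucket` and Memory Restrictions

The $bucket stage has a limit of 100 MB of RAM. By default, if the stage exceeds this limit, $bucket returns an error. To allow more space for stage processing, use the allowDiskUse option to let aggregation pipeline stages write data to temporary files.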
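As a minimal sketch of the option described above, the snippet below builds a pipeline and an options document with allowDiskUse set to true. The artists collection and year_born field are borrowed from the examples later on this page. The guard around db is an assumption of this sketch, added so the snippet also loads outside mongosh.

```javascript
// Pipeline with a single $bucket stage (collection and field names are illustrative).
const pipeline = [
  {
    $bucket: {
      groupBy: "$year_born",
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ],
      default: "Other"
    }
  }
];

// allowDiskUse lets stages that exceed the memory limit write to temporary files.
const options = { allowDiskUse: true };

// Only run the aggregation when a mongosh-style `db` object is available.
if (typeof db !== "undefined") {
  printjson(db.artists.aggregate(pipeline, options).toArray());
}
```

The options document is passed as the second argument to aggregate(), so the pipeline itself stays unchanged.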

Tip

See also:

Aggregation Pipeline Limits

Syntax

{
  $bucket: {
      groupBy: <expression>,
      boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
      default: <literal>,
      output: {
         <output1>: { <$accumulator expression> },
         ...
         <outputN>: { <$accumulator expression> }
      }
   }
}

The $bucket document contains the following fields:

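Field

Type

Description

groupBy

expression

An expression to group documents by. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes.

Unless $bucket includes a default specification, each input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by boundaries.

boundaries

array

An array of values, based on the groupBy expression, that specify the boundaries for each bucket. Each adjacent pair of values acts as the inclusive lower boundary and the exclusive upper boundary of a bucket. You must specify at least two boundaries.

The specified values must be in ascending order and all of the same type. The exception is if the values are of mixed numeric types, such as: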

[ 10, NumberLong(20), NumberInt(30) ]

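For example, an array of [ 0, 5, 10 ] creates two buckets:

[0, 5) with inclusive lower bound 0 and exclusive upper bound 5.
[5, 10) with inclusive lower bound 5 and exclusive upper bound 10.

default

literal

Optional. A literal that specifies the _id of an additional bucket that contains all documents whose groupBy expression result does not fall into a bucket specified by boundaries.

If unspecified, each input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries, or the operation throws an error.

The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value.

The default value can be of a different type than the entries in boundaries.

output

document

Optional. A document that specifies the fields to include in the output documents in addition to the _id field. To specify a field to include, you must use accumulator expressions.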

<outputfield1>: { <accumulator>: <expression1> },
...
<outputfieldN>: { <accumulator>: <expressionN> }

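If you do not specify an output document, the operation returns a count field containing the number of documents in each bucket.

If you specify an output document, only the fields specified in the document are returned; that is, the count field is not returned unless it is explicitly included in the output document.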
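The rules for boundaries and default listed above can be sketched as a small check in plain JavaScript. validateBucketSpec is a hypothetical helper written for illustration, not part of MongoDB. It handles numeric boundaries only, whereas the server compares arbitrary BSON values, and treating equal neighbouring boundaries as invalid is an assumption of this sketch.

```javascript
// Hypothetical helper: checks a $bucket specification against the documented rules.
// Numeric boundaries only; the server compares arbitrary BSON values.
function validateBucketSpec(spec) {
  const errors = [];
  const b = spec.boundaries;
  if (!Array.isArray(b) || b.length < 2) {
    errors.push("boundaries must be an array with at least two values");
    return errors;
  }
  for (let i = 1; i < b.length; i++) {
    // Assumption of this sketch: equal neighbours count as "not ascending".
    if (!(b[i - 1] < b[i])) {
      errors.push(`boundaries[${i - 1}] and boundaries[${i}] are not in ascending order`);
    }
  }
  // A default of another type (for example a string) needs no range check here.
  if (typeof spec.default === "number") {
    const lowest = b[0];
    const highest = b[b.length - 1];
    if (spec.default >= lowest && spec.default < highest) {
      errors.push("a numeric default must be below the lowest or at/above the highest boundary");
    }
  }
  return errors;
}

// A string default is accepted; a numeric default inside the range is rejected.
console.log(validateBucketSpec({ groupBy: "$price", boundaries: [ 0, 200, 400 ], default: "Other" }));
console.log(validateBucketSpec({ groupBy: "$price", boundaries: [ 0, 200, 400 ], default: 250 }));
console.log(validateBucketSpec({ groupBy: "$price", boundaries: [ 400, 0 ] }));
```

An empty array means the specification passed every check in the sketch; each string in a non-empty result names one violated rule.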

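Behavior

$bucket requires that at least one of the following conditions is met, or the operation throws an error:

Each input document resolves the groupBy expression to a value within one of the bucket ranges specified by boundaries, or
A default value is specified to bucket documents whose groupBy values are outside of the boundaries or of a different BSON type than the values in boundaries.

If the groupBy expression resolves to an array or a document, $bucket arranges the input documents into buckets using the comparison logic from $sort.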
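The placement rules above can be emulated in plain JavaScript to make the half-open intervals and the role of default concrete. assignBucket and countByBucket are hypothetical helpers for illustration only. They handle numeric values, whereas the server applies BSON comparison order to other types, and they emit buckets in first-seen order rather than in whatever order the server returns them.

```javascript
// Hypothetical emulation of how $bucket places one groupBy value.
// Handles numbers only; the server uses BSON comparison order for other types.
function assignBucket(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Lower bound is inclusive, upper bound is exclusive.
      if (value >= boundaries[i] && value < boundaries[i + 1]) {
        return boundaries[i]; // becomes the bucket's _id
      }
    }
  }
  if (defaultId === undefined) {
    throw new Error("groupBy value falls outside boundaries and no default was specified");
  }
  return defaultId;
}

// Counts values per bucket, emitting only non-empty buckets.
function countByBucket(values, boundaries, defaultId) {
  const counts = new Map();
  for (const v of values) {
    const id = assignBucket(v, boundaries, defaultId);
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts].map(([_id, count]) => ({ _id, count }));
}

const boundaries = [ 0, 5, 10 ];
// 12 and the missing value land in the default bucket.
console.log(countByBucket([ 0, 4, 5, 12, undefined ], boundaries, "Other"));
```

Calling assignBucket(10, boundaries) without a third argument throws, which mirrors the error case: 10 equals the highest boundary, so it lies outside every bucket.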

Examples

Bucket by Year and Filter by Bucket Results

In mongosh, create a sample collection named artists with the following documents:

db.artists.insertMany([
  { "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },
  { "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },
  { "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },
  { "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },
  { "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },
  { "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },
  { "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },
  { "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }
])

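The following operation groups the documents into buckets according to the year_born field and filters based on the count of documents in the buckets: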

db.artists.aggregate( [
  // First Stage
  {
    $bucket: {
      groupBy: "$year_born",                        // Field to group by
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // Boundaries for the buckets
      default: "Other",                             // Bucket ID for documents which do not fall into a bucket
      output: {                                     // Output for each bucket
        "count": { $sum: 1 },
        "artists" :
          {
            $push: {
              "name": { $concat: [ "$first_name", " ", "$last_name"] },
              "year_born": "$year_born"
            }
          }
      }
    }
  },
  // Second Stage
  {
    $match: { count: {$gt: 3} }
  }
] )

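First Stage

The $bucket stage groups the documents into buckets by the year_born field. The buckets have the following boundaries:

[1840, 1850) with inclusive lower bound 1840 and exclusive upper bound 1850.
[1850, 1860) with inclusive lower bound 1850 and exclusive upper bound 1860.
[1860, 1870) with inclusive lower bound 1860 and exclusive upper bound 1870.
[1870, 1880) with inclusive lower bound 1870 and exclusive upper bound 1880.
If a document does not contain the year_born field, or its year_born value is outside the ranges above, it is placed in the default bucket with the _id value "Other".

The stage includes the output document to determine the fields to return:

Field	Description
`_id`	The inclusive lower bound of the bucket.
`count`	The count of documents in the bucket.
`artists`	An array of documents containing information on each artist in the bucket. Each document contains the artist's `name`, which is a concatenation (i.e. `$concat`) of the artist's `first_name` and `last_name`, and the artist's `year_born`.

This stage passes the following documents to the next stage: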

{ "_id" : 1840, "count" : 1, "artists" : [ { "name" : "Odilon Redon", "year_born" : 1840 } ] }
{ "_id" : 1850, "count" : 2, "artists" : [ { "name" : "Vincent Van Gogh", "year_born" : 1853 },
                                           { "name" : "Edvard Diriks", "year_born" : 1855 } ] }
{ "_id" : 1860, "count" : 4, "artists" : [ { "name" : "Emil Bernard", "year_born" : 1868 },
                                           { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
                                           { "name" : "Alfred Maurer", "year_born" : 1868 },
                                           { "name" : "Edvard Munch", "year_born" : 1863 } ] }
{ "_id" : 1870, "count" : 1, "artists" : [ { "name" : "Anna Ostroumova", "year_born" : 1871 } ] }

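Second Stage

The $match stage filters the output from the previous stage to return only buckets that contain more than 3 documents.

The operation returns the following document: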

{ "_id" : 1860, "count" : 4, "artists" :
  [
    { "name" : "Emil Bernard", "year_born" : 1868 },
    { "name" : "Joszef Rippl-Ronai", "year_born" : 1861 },
    { "name" : "Alfred Maurer", "year_born" : 1868 },
    { "name" : "Edvard Munch", "year_born" : 1863 }
  ]
}

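Use $bucket with $facet to Bucket by Multiple Fields

You can use the $facet stage to perform multiple $bucket aggregations in a single stage.

In mongosh, create a sample collection named artwork with the following documents: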

db.artwork.insertMany([
  { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
      "price" : NumberDecimal("199.99") },
  { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
      "price" : NumberDecimal("280.00") },
  { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
      "price" : NumberDecimal("76.04") },
  { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
      "price" : NumberDecimal("167.30") },
  { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
      "price" : NumberDecimal("483.00") },
  { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
      "price" : NumberDecimal("385.00") },
  { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893
      /* No price*/ },
  { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
      "price" : NumberDecimal("118.42") }
])

The following operation uses two $bucket stages within a $facet stage to create two groupings, one by price and the other by year:

db.artwork.aggregate( [
  {
    $facet: {                               // Top-level $facet stage
      "price": [                            // Output field 1
        {
          $bucket: {
              groupBy: "$price",            // Field to group by
              boundaries: [ 0, 200, 400 ],  // Boundaries for the buckets
              default: "Other",             // Bucket ID for documents which do not fall into a bucket
              output: {                     // Output for each bucket
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } },
                "averagePrice": { $avg: "$price" }
              }
          }
        }
      ],
      "year": [                                      // Output field 2
        {
          $bucket: {
            groupBy: "$year",                        // Field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ],  // Boundaries for the buckets
            default: "Unknown",                      // Bucket ID for documents which do not fall into a bucket
            output: {                                // Output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

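First Facet

The first facet groups the input documents by price. The buckets have the following boundaries:

[0, 200) with inclusive lower bound 0 and exclusive upper bound 200.
[200, 400) with inclusive lower bound 200 and exclusive upper bound 400.
"Other", the default bucket containing documents without prices or with prices outside the ranges above.

The $bucket stage includes the output document to determine the fields to return:

Field	Description
`_id`	The inclusive lower bound of the bucket.
`count`	The count of documents in the bucket.
`artwork`	An array of documents containing information on each artwork in the bucket.
`averagePrice`	Uses the `$avg` operator to display the average price of all artwork in the bucket.

Second Facet

The second facet groups the input documents by year. The buckets have the following boundaries:

[1890, 1910) with inclusive lower bound 1890 and exclusive upper bound 1910.
[1910, 1920) with inclusive lower bound 1910 and exclusive upper bound 1920.
[1920, 1940) with inclusive lower bound 1920 and exclusive upper bound 1940.
"Unknown", the default bucket containing documents without years or with years outside the ranges above.

The $bucket stage includes the output document to determine the fields to return:

Field	Description
`count`	The count of documents in the bucket.
`artwork`	An array of documents containing information on each artwork in the bucket.

Output

The operation returns the following document: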

{
  "price" : [ // Output of first facet
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        { "title" : "The Pillars of Society", "price" : NumberDecimal("199.99") },
        { "title" : "Dancer", "price" : NumberDecimal("76.04") },
        { "title" : "The Great Wave off Kanagawa", "price" : NumberDecimal("167.30") },
        { "title" : "Blue Flower", "price" : NumberDecimal("118.42") }
      ],
      "averagePrice" : NumberDecimal("140.4375")
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "price" : NumberDecimal("280.00") },
        { "title" : "Composition VII", "price" : NumberDecimal("385.00") }
      ],
      "averagePrice" : NumberDecimal("332.50")
    },
    {
      // Includes documents without prices and prices greater than 400
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        { "title" : "The Persistence of Memory", "price" : NumberDecimal("483.00") },
        { "title" : "The Scream" }
      ],
      "averagePrice" : NumberDecimal("483.00")
    }
  ],
  "year" : [ // Output of second facet
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        { "title" : "Melancholy III", "year" : 1902 },
        { "title" : "The Scream", "year" : 1893 }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        { "title" : "Composition VII", "year" : 1913 },
        { "title" : "Blue Flower", "year" : 1918 }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        { "title" : "The Pillars of Society", "year" : 1926 },
        { "title" : "Dancer", "year" : 1925 },
        { "title" : "The Persistence of Memory", "year" : 1931 }
      ]
    },
    {
      // Includes documents without a year
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        { "title" : "The Great Wave off Kanagawa" }
      ]
    }
  ]
}
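In the output above, the averagePrice of the "Other" bucket equals the single price it contains even though its count is 2, because $avg ignores non-numeric values, including missing fields. The sketch below reproduces that arithmetic in plain JavaScript. avgIgnoringMissing is a hypothetical helper, and prices are expressed as integer cents (an assumption of this sketch, since the collection stores NumberDecimal values) to avoid floating-point rounding.

```javascript
// Hypothetical stand-in for $avg: averages only the numeric inputs.
function avgIgnoringMissing(values) {
  const nums = values.filter((v) => typeof v === "number");
  if (nums.length === 0) return null; // no numeric input, so no average
  return nums.reduce((sum, v) => sum + v, 0) / nums.length;
}

// Prices in cents for the [0, 200) bucket and for the "Other" bucket.
const under200 = [ 19999, 7604, 16730, 11842 ];
const other = [ 48300, undefined ]; // "The Scream" has no price

console.log(avgIgnoringMissing(under200) / 100); // 140.4375
console.log(avgIgnoringMissing(other) / 100);    // 483
```

The count accumulator ({ $sum: 1 }) still counts every document in the bucket, which is why count and the number of averaged prices can differ.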

Tip

See also:

$bucketAuto
