$substrBytes (aggregation)

This version of the documentation is archived and no longer supported. To upgrade your 5.0 deployment, see the MongoDB 6.0 upgrade procedures.

Definition

$substrBytes

New in version 3.4.

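Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the specified number of bytes.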

$substrBytes has the following operator expression syntax:

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }

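Field	Type	Description
`string expression`	string	The string from which the substring is extracted. `string expression` can be any valid expression as long as it resolves to a string. For more information on expressions, see Expressions. If the argument resolves to a value of `null` or refers to a field that is missing, `$substrBytes` returns an empty string. If the argument does not resolve to a string or `null` and does not refer to a missing field, `$substrBytes` returns an error.
`byte index`	number	Indicates the starting point of the substring. `byte index` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte index` cannot refer to a starting index located in the middle of a multi-byte UTF-8 character.
`byte count`	number	The number of bytes the substring spans. `byte count` can be any valid expression as long as it resolves to a non-negative integer or a number that can be represented as an integer (such as 2.0). `byte count` cannot result in an ending index that is in the middle of a UTF-8 character.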

Behavior

The $substrBytes operator uses the indexes of UTF-8 encoded bytes, where each code point, or character, may use between one and four bytes to encode.

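For example, US-ASCII characters are encoded using one byte. Characters with diacritic marks and other Latin alphabetical characters (Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese, and Korean characters typically require three bytes, and characters in other Unicode planes (emoji, mathematical symbols, and so on) require four bytes.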
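The byte widths described above can be checked outside the database. The following is a minimal sketch, assuming a Node.js environment where `Buffer` is available; the sample characters are illustrative choices, not taken from the examples on this page.

```javascript
// One sample character per UTF-8 width class (illustrative picks):
// "a" (US-ASCII), "é" (Latin with diacritic), "寿" (CJK), "😀" (emoji).
const samples = ["a", "\u00e9", "\u5bff", "\u{1F600}"];

// Buffer.byteLength reports how many bytes the UTF-8 encoding needs.
const byteLengths = samples.map((ch) => Buffer.byteLength(ch, "utf8"));

console.log(byteLengths); // [ 1, 2, 3, 4 ]
```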

Be mindful of the content of the string expression, because a byte index or byte count that falls in the middle of a UTF-8 character results in an error.
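To make the boundary rule concrete, here is a rough client-side approximation of the checks described above. It is only a sketch under stated assumptions: `substrBytesSketch` is a hypothetical helper, not part of MongoDB or any driver, it assumes Node.js `Buffer`, its error messages are placeholders, and it does not reproduce the server's handling of non-string input.

```javascript
// Hypothetical helper: approximates the $substrBytes boundary rules in plain JavaScript.
function substrBytesSketch(str, byteIndex, byteCount) {
  // A null value or a missing field yields an empty string.
  if (str === null || str === undefined) return "";

  const bytes = Buffer.from(str, "utf8");

  // A UTF-8 continuation byte has the bit pattern 10xxxxxx.
  const isContinuation = (i) => i < bytes.length && (bytes[i] & 0xc0) === 0x80;

  if (isContinuation(byteIndex)) {
    throw new Error("starting index is a UTF-8 continuation byte");
  }
  const end = byteIndex + byteCount;
  if (isContinuation(end)) {
    throw new Error("ending index is in the middle of a UTF-8 character");
  }
  // subarray clamps `end` to the buffer length.
  return bytes.subarray(byteIndex, end).toString("utf8");
}

console.log(substrBytesSketch("caf\u00e9t\u00e9ria", 0, 5)); // café
```

Calling the helper with a start index of 7 on the same string throws, because byte 7 is the second byte of the second "é".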

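$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts code points, or characters, regardless of how many bytes a character uses.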
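The difference between the two counting schemes can be seen on a single string. This is a small illustration in plain JavaScript (assuming Node.js `Buffer`), not a call to either operator:

```javascript
const word = "caf\u00e9t\u00e9ria"; // "cafétéria"

// Byte-based length: each "é" contributes two bytes.
const byteLength = Buffer.byteLength(word, "utf8");

// Code-point-based length: spreading a string iterates by code point.
const codePointLength = [...word].length;

console.log(byteLength, codePointLength); // 11 9
```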

Example	Results
{ $substrBytes: [ "abcde", 1, 2 ] }	"bc"
{ $substrBytes: [ "Hello World!", 6, 5 ] }	"World"
{ $substrBytes: [ "cafétéria", 0, 5 ] }	"café"
{ $substrBytes: [ "cafétéria", 5, 4 ] }	"tér"
{ $substrBytes: [ "cafétéria", 7, 3 ] }	Error message: "Error: Invalid range, starting index is a UTF-8 continuation byte."
{ $substrBytes: [ "cafétéria", 3, 1 ] }	Error message: "Error: Invalid range, ending index is in the middle of a UTF-8 character."
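The results above follow from where each character of "cafétéria" starts in the UTF-8 byte sequence. The sketch below lists those offsets; `byteLayout` is a hypothetical helper written for this illustration and assumes Node.js `Buffer`.

```javascript
// Hypothetical helper: records the starting byte offset and byte length of each character.
function byteLayout(str) {
  const layout = [];
  let offset = 0;
  for (const ch of str) { // for...of iterates by code point
    const length = Buffer.byteLength(ch, "utf8");
    layout.push({ ch, start: offset, length });
    offset += length;
  }
  return layout;
}

const layoutText = byteLayout("caf\u00e9t\u00e9ria")
  .map(({ ch, start }) => `${ch}@${start}`)
  .join(" ");

console.log(layoutText); // c@0 a@1 f@2 é@3 t@5 é@6 r@8 i@9 a@10
```

No character starts at byte 4 or byte 7, so a substring that starts or ends there splits an "é".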

Examples

Single-Byte Character Set

Consider an inventory collection with the following documents:

{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }

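The following operation uses the $substrBytes operator to separate the quarter value (which contains only single-byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field holds the rest of the string, starting at the byte index that follows the yearSubstring. Its byte count is calculated by subtracting the byte index from the length of the string, which $strLenBytes returns.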

db.inventory.aggregate(
  [
    {
      $project: {
        item: 1,
        // First two bytes: the year.
        yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
        // Everything from byte 2 onward: total byte length minus the start index.
        quarterSubstring: {
          $substrBytes: [
            "$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
          ]
        }
      }
    }
  ]
)

The operation returns the following results:

{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubstring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubstring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubstring" : "Q2" }
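The "rest of the string" pattern used above can be factored into a small builder when several fields need the same treatment. This is only a sketch: `restFromByte` is a hypothetical helper name, and it merely assembles the expression document from the $substrBytes, $subtract, and $strLenBytes operators already shown.

```javascript
// Hypothetical helper: builds an expression that returns everything
// from `byteIndex` to the end of the string at `fieldPath`.
function restFromByte(fieldPath, byteIndex) {
  return {
    $substrBytes: [
      fieldPath,
      byteIndex,
      // Remaining bytes = total byte length minus the starting index.
      { $subtract: [ { $strLenBytes: fieldPath }, byteIndex ] }
    ]
  };
}

console.log(JSON.stringify(restFromByte("$quarter", 2)));
```

The returned object can be used as the value of a field in a $project stage, in place of the hand-written expression.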

Single-Byte and Multibyte Character Set

Create a food collection with the following documents:

db.food.insertMany(
 [
    { "_id" : 1, "name" : "apple" },
    { "_id" : 2, "name" : "banana" },
    { "_id" : 3, "name" : "éclair" },
    { "_id" : 4, "name" : "hamburger" },
    { "_id" : 5, "name" : "jalapeño" },
    { "_id" : 6, "name" : "pizza" },
    { "_id" : 7, "name" : "tacos" },
    { "_id" : 8, "name" : "寿司sushi" }
 ]
)

The following operation uses the $substrBytes operator to create a three-byte menuCode from the name value:

db.food.aggregate(
  [
    {
      $project: {
        "name": 1,
        "menuCode": { $substrBytes: [ "$name", 0, 3 ] }
      }
    }
  ]
)

The operation returns the following results:

{ "_id" : 1, "name" : "apple", "menuCode" : "app" }
{ "_id" : 2, "name" : "banana", "menuCode" : "ban" }
{ "_id" : 3, "name" : "éclair", "menuCode" : "éc" }
{ "_id" : 4, "name" : "hamburger", "menuCode" : "ham" }
{ "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" }
{ "_id" : 6, "name" : "pizza", "menuCode" : "piz" }
{ "_id" : 7, "name" : "tacos", "menuCode" : "tac" }
{ "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿" }
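A three-byte prefix happens to end on a character boundary for every name in this collection. Other byte counts would not: a seven-byte prefix of "jalapeño", for instance, would end inside the "ñ", which the Behavior section identifies as an error. As a hedged, client-side illustration of how to pick a safe boundary, the hypothetical helper below (plain JavaScript, assuming Node.js `Buffer`; not a MongoDB feature) steps back to the nearest character boundary:

```javascript
// Hypothetical helper: longest prefix of at most `maxBytes` bytes
// that ends on a UTF-8 character boundary.
function prefixWithinBytes(str, maxBytes) {
  const bytes = Buffer.from(str, "utf8");
  let end = Math.min(maxBytes, bytes.length);
  // Step back while `end` points at a continuation byte (10xxxxxx).
  while (end > 0 && end < bytes.length && (bytes[end] & 0xc0) === 0x80) {
    end -= 1;
  }
  return bytes.subarray(0, end).toString("utf8");
}

console.log(prefixWithinBytes("jalape\u00f1o", 7));      // jalape
console.log(prefixWithinBytes("\u5bff\u53f8sushi", 3));  // 寿
```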

See also:

$substrCP
