Docs 菜单

$substrBytes (aggregation)

在此页面上

$substrBytes

版本 3.4 中的新增功能

Returns the substring of a string. The substring starts with the character at the specified UTF-8 byte index (zero-based) in the string and continues for the number of bytes specified.

$substrBytes 具有以下操作符表达式语法

{ $substrBytes: [ <string expression>, <byte index>, <byte count> ] }
字段
类型
说明

string expression

字符串

要从中提取子字符串的字符串。 string expression可以是任何有效的表达式,只要它解析为字符串即可。 有关表达式的更多信息,请参阅表达式。

如果参数解析为null 的值或引用了缺失的字段,则$substrBytes 会返回空字符串。

如果参数未解析为字符串或null ,也未引用缺失字段,则$substrBytes 将返回错误。

byte index

数字

Indicates the starting point of the substring. byte index can be any valid 表达式(expression) as long as it resolves to a non-negative integer or number that can be represented as an integer (such as 2.0).

byte index cannot refer to a starting index located in the middle of a multi-byte UTF-8 character.

byte count

数字

可以是任何有效的表达式,只要它解析为非负整数或可以表示为整数的数字(例如 2.0)。

byte count can not result in an ending index that is in the middle of a UTF-8 character.

The $substrBytes operator uses the indexes of UTF-8 encoded bytes where each code point, or character, may use between one and four bytes to encode.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters (Latin characters outside of the English alphabet) are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of unicode (emoji, mathematical symbols, etc.) require four bytes.

It is important to be mindful of the content in the string expression because providing a byte index or byte count located in the middle of a UTF-8 character will result in an error.

$substrBytes differs from $substrCP in that $substrBytes counts the bytes of each character, whereas $substrCP counts the code points, or characters, regardless of how many bytes a character uses.

例子
结果
{ $substrBytes: [ "abcde", 1, 2 ] }
"bc"
{ $substrBytes: [ "Hello World!", 6, 5 ] }
"World"
{ $substrBytes: [ "cafétéria", 0, 5 ] }
"café"
{ $substrBytes: [ "cafétéria", 5, 4 ] }
"tér"
{ $substrBytes: [ "cafétéria", 7, 3 ] }

错误消息:

"Error: Invalid range, starting index is a UTF-8 continuation byte."

{ $substrBytes: [ "cafétéria", 3, 1 ] }

错误消息:

"Error: Invalid range, ending index is in the middle of a UTF-8 character."

考虑包含以下文档的 inventory 集合:

{ "_id" : 1, "item" : "ABC1", quarter: "13Q1", "description" : "product 1" }
{ "_id" : 2, "item" : "ABC2", quarter: "13Q4", "description" : "product 2" }
{ "_id" : 3, "item" : "XYZ1", quarter: "14Q2", "description" : null }

The following operation uses the $substrBytes operator separate the quarter value (containing only single byte US-ASCII characters) into a yearSubstring and a quarterSubstring. The quarterSubstring field represents the rest of the string from the specified byte index following the yearSubstring. It is calculated by subtracting the byte index from the length of the string using $strLenBytes.

db.inventory.aggregate(
[
{
$project: {
item: 1,
yearSubstring: { $substrBytes: [ "$quarter", 0, 2 ] },
quarterSubtring: {
$substrBytes: [
"$quarter", 2, { $subtract: [ { $strLenBytes: "$quarter" }, 2 ] }
]
}
}
}
]
)

操作返回以下结果:

{ "_id" : 1, "item" : "ABC1", "yearSubstring" : "13", "quarterSubtring" : "Q1" }
{ "_id" : 2, "item" : "ABC2", "yearSubstring" : "13", "quarterSubtring" : "Q4" }
{ "_id" : 3, "item" : "XYZ1", "yearSubstring" : "14", "quarterSubtring" : "Q2" }

使用以下文档创建 food 集合:

db.food.insertMany(
[
{ "_id" : 1, "name" : "apple" },
{ "_id" : 2, "name" : "banana" },
{ "_id" : 3, "name" : "éclair" },
{ "_id" : 4, "name" : "hamburger" },
{ "_id" : 5, "name" : "jalapeño" },
{ "_id" : 6, "name" : "pizza" },
{ "_id" : 7, "name" : "tacos" },
{ "_id" : 8, "name" : "寿司sushi" }
]
)

The following operation uses the $substrBytes operator to create a three byte menuCode from the name value:

db.food.aggregate(
[
{
$project: {
"name": 1,
"menuCode": { $substrBytes: [ "$name", 0, 3 ] }
}
}
]
)

操作返回以下结果:

{ "_id" : 1, "name" : "apple", "menuCode" : "app" }
{ "_id" : 2, "name" : "banana", "menuCode" : "ban" }
{ "_id" : 3, "name" : "éclair", "menuCode" : "éc" }
{ "_id" : 4, "name" : "hamburger", "menuCode" : "ham" }
{ "_id" : 5, "name" : "jalapeño", "menuCode" : "jal" }
{ "_id" : 6, "name" : "pizza", "menuCode" : "piz" }
{ "_id" : 7, "name" : "tacos", "menuCode" : "tac" }
{ "_id" : 8, "name" : "寿司sushi", "menuCode" : "寿" }

提示

另请参阅:

在此页面上