
$regexFindAll (aggregation)

$regexFindAll

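Provides regular expression (regex) pattern matching capability in aggregation expressions. The operator returns an array of documents that contains information on each match. If no match is found, it returns an empty array.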

MongoDB uses Perl compatible regular expressions (PCRE) version 8.41 with UTF-8 support.

The $regexFindAll operator has the following syntax:

{ $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }
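The operator accepts the following fields:

input

The string on which to apply the regex pattern. Can be a string or any valid expression that resolves to a string.

regex

The regex pattern to apply. Can be any valid expression that resolves to either a string or a regex pattern /<pattern>/. When using the regex /<pattern>/, you can also specify the regex options i and m (but not the s or x options):

  • "pattern"

  • /<pattern>/

  • /<pattern>/<options>

Alternatively, you can specify the regex options with the options field. To specify the s or x options, you must use the options field.

You cannot specify options in both the regex and the options field.

options

Optional. The following <options> are available for use with regular expressions.

Note

You cannot specify options in both the regex and the options field.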

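Each option is described below:

i

Case insensitivity, to match both upper and lower cases. You can specify this option in either the options field or as part of the regex field.

m

For patterns that include anchors (i.e. ^ for the start, $ for the end), match at the beginning or end of each line for strings with multiline values. Without this option, these anchors match only at the beginning or end of the string.

If the pattern contains no anchors, or if the string value has no newline characters (e.g. \n), the m option has no effect.

x

"Extended" capability to ignore all white space characters in the pattern unless escaped or included in a character class.

Additionally, it ignores characters between an unescaped hash/pound (#) character and the next newline (inclusive), so that you can include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern.

The x option does not affect the handling of the VT character (i.e. code 11).

You can specify this option only in the options field.

s

Allows the dot character (i.e. .) to match all characters, including newline characters.

You can specify this option only in the options field.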

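The operator returns an array:

  • If the operator does not find a match, it returns an empty array.

  • If the operator finds a match, it returns an array of documents that contains, for each match, the matching string (match), its code point index in the input (idx), and the strings captured by the pattern's groups (captures):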

    [ { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }, ... ]
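
The result shape above can be approximated outside the database. The following is a minimal sketch in plain JavaScript, not MongoDB code: regexFindAllSketch is a hypothetical helper name, and it relies on JavaScript's regex engine rather than PCRE, so it only mirrors the operator for simple patterns like the ones on this page.

```javascript
// Hypothetical helper (not a MongoDB API): mimics the shape of the
// $regexFindAll result using JavaScript's own regex engine.
function regexFindAllSketch(input, regex) {
  // matchAll requires the global flag; add it without mutating the caller's regex.
  const flags = regex.flags.includes("g") ? regex.flags : regex.flags + "g";
  const globalRegex = new RegExp(regex.source, flags);
  return Array.from(input.matchAll(globalRegex), (m) => ({
    match: m[0],
    // JavaScript reports a UTF-16 offset; convert it to a code point index.
    idx: Array.from(input.slice(0, m.index)).length,
    // Groups that did not participate are undefined in JavaScript; report null.
    captures: m.slice(1).map((c) => (c === undefined ? null : c)),
  }));
}

const found = regexFindAllSketch("First lines\nsecond line", /lin(e|k)/);
console.log(found);
// [ { match: 'line', idx: 6, captures: [ 'e' ] },
//   { match: 'line', idx: 19, captures: [ 'e' ] } ]

const none = regexFindAllSketch("Single LINE description.", /lin(e|k)/);
console.log(none); // []
```

An input with no match yields an empty array rather than null, which is why later pipeline stages can treat the result as an array without a null check.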

$regexFindAll ignores the collation specified for the collection, for db.collection.aggregate(), and for the index (if used).

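For example, create a sample collection with collation strength 1 (i.e. compare base characters only and ignore other differences such as case and diacritics):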

db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )

Insert the following documents:

db.myColl.insertMany([
{ _id: 1, category: "café" },
{ _id: 2, category: "cafe" },
{ _id: 3, category: "cafE" }
])

Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:

db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )

The operation returns the following 3 documents:

{ "_id" : 1, "category" : "café" }
{ "_id" : 2, "category" : "cafe" }
{ "_id" : 3, "category" : "cafE" }

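However, the aggregation expression $regexFindAll ignores collation; that is, the following regex pattern matching examples are case-sensitive and diacritic-sensitive: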

db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ] )
db.myColl.aggregate(
[ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ],
{ collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFindAll
)

Both operations return the following:

{ "_id" : 1, "category" : "café", "results" : [ ] }
{ "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] }
{ "_id" : 3, "category" : "cafE", "results" : [ ] }

To perform case-insensitive regex pattern matching, use the i option instead. For an example, see the i option.

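If your regex pattern contains capture groups and the pattern finds a match in the input, the captures array in the results corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses () in the regex pattern. The length of the captures array equals the number of capture groups in the pattern, and the order of the array matches the order in which the capture groups appear.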

Create a sample collection named contacts with the following documents:

db.contacts.insertMany([
{ "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" },
{ "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" },
{ "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" },
{ "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" },
{ "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" }
])

The following pipeline applies the regex pattern /(C(ar)*)ol/ to the fname field:

db.contacts.aggregate([
{
$project: {
returnObject: {
$regexFindAll: { input: "$fname", regex: /(C(ar)*)ol/ }
}
}
}
])

The regex pattern finds a match with the fname values Carol and Colleen:

{ "_id" : 1, "returnObject" : [ { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } ] }
{ "_id" : 2, "returnObject" : [ ] }
{ "_id" : 3, "returnObject" : [ ] }
{ "_id" : 4, "returnObject" : [ { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } ] }
{ "_id" : 5, "returnObject" : [ ] }

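The pattern contains the capture group (C(ar)*), which contains the nested group (ar). The elements in the captures array correspond to the two capture groups. If a matching document is not captured by a group (e.g. Colleen and the group (ar)), $regexFindAll replaces the group with a null placeholder.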
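
The numbering of nested groups can be checked with a small plain JavaScript sketch. It is an illustration only: firstMatchCaptures is a hypothetical helper, and JavaScript's engine stands in for PCRE, which behaves the same way for this pattern.

```javascript
// Hypothetical helper: returns the captures of the first match, using null
// for groups that did not participate, or null when nothing matches at all.
function firstMatchCaptures(input, regex) {
  const m = input.match(regex);
  if (m === null) return null;
  // m[0] is the whole match; m[1], m[2], ... follow the order of the opening parentheses.
  return m.slice(1).map((c) => (c === undefined ? null : c));
}

const pattern = /(C(ar)*)ol/;
console.log(firstMatchCaptures("Carol", pattern));   // [ 'Car', 'ar' ]
console.log(firstMatchCaptures("Colleen", pattern)); // [ 'C', null ]
console.log(firstMatchCaptures("Daryl", pattern));   // null
```

The outer group is listed first because its opening parenthesis comes first, and the inner group (ar) yields null for Colleen because the * quantifier matched it zero times.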

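As shown in the previous example, the captures array contains an element for each capture group (using null for groups that capture nothing). The following example searches for phone numbers with New York City area codes by applying a logical or of capture groups to the phone field. Each group represents a New York City area code: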

db.contacts.aggregate([
{
$project: {
nycContacts: {
$regexFindAll: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ }
}
}
}
])

For documents matched by the regex pattern, the captures array includes the matching capture group and uses null for any group that captured nothing:

{ "_id" : 1, "nycContacts" : [ { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } ] }
{ "_id" : 2, "nycContacts" : [ { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } ] }
{ "_id" : 3, "nycContacts" : [ ] }
{ "_id" : 4, "nycContacts" : [ ] }
{ "_id" : 5, "nycContacts" : [ { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } ] }
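
Because only one alternative can match a given phone number, exactly one element of captures is non-null. The sketch below, in plain JavaScript with a hypothetical areaCodeOf helper, shows one way to pick that element; it illustrates the idea rather than reproducing the pipeline.

```javascript
// Same alternation as in the pipeline above.
const nycAreaCode = /^(718).*|^(212).*|^(917).*/;

// Hypothetical helper: returns the captured area code, or null when the
// phone number does not start with one of the listed area codes.
function areaCodeOf(phone) {
  const m = phone.match(nycAreaCode);
  if (m === null) return null;
  const captures = m.slice(1).map((c) => (c === undefined ? null : c));
  // Only the group belonging to the alternative that matched is non-null.
  return captures.find((c) => c !== null);
}

console.log(areaCodeOf("718-555-0113")); // 718
console.log(areaCodeOf("212-555-8832")); // 212
console.log(areaCodeOf("208-555-1932")); // null
```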

To illustrate the behavior of the $regexFindAll operator in the examples that follow, create a sample collection products with the following documents:

db.products.insertMany([
{ _id: 1, description: "Single LINE description." },
{ _id: 2, description: "First lines\nsecond line" },
{ _id: 3, description: "Many spaces before line" },
{ _id: 4, description: "Multiple\nline descriptions" },
{ _id: 5, description: "anchors, links and hyperlinks" },
{ _id: 6, description: "métier work vocation" }
])

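By default, $regexFindAll performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping: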

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } }
])

The operation returns the following:

{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ]}, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] }
] }
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{
"_id" : 6,
"description" : "métier work vocation",
"returnObject" : [ ]
}

The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } }
])

The operation returns the following:

{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject": [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{
"_id" : 6,
"description" : "métier work vocation",
"returnObject" : [ ]
}

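In the returned output, the idx field is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/: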

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } }
])

The operation returns the following, where only the last record matches the pattern and the returned idx is 2 (instead of 3 if using a byte index):

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] }
{ "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ ] }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] }
{ "_id" : 6, "description" : "métier work vocation",
"returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }

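The difference between the two indexes comes from é, which is one code point but two bytes in UTF-8. A minimal plain JavaScript sketch of the arithmetic follows; it assumes é is stored as the single precomposed code point U+00E9 and uses the standard TextEncoder to count UTF-8 bytes.

```javascript
// "métier work vocation" with é written as the precomposed code point U+00E9.
const description = "m\u00e9tier work vocation";

// Text that precedes the match "tier".
const prefix = description.slice(0, description.indexOf("tier"));

// Iterating a string with Array.from yields one element per code point.
const codePointIdx = Array.from(prefix).length;

// TextEncoder always encodes to UTF-8, so this counts bytes.
const byteIdx = new TextEncoder().encode(prefix).length;

console.log(codePointIdx); // 2
console.log(byteIdx);      // 3
```
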
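Note

You cannot specify options in both the regex and the options field.

To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field: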

// Specify i as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/i } }
// Specify i in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "i" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "i" } }

For example, the following aggregation performs a case-insensitive $regexFindAll on the description field. The regex pattern /line/ does not specify any grouping:

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } }
])

The operation returns the following documents:

{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }

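Note

You cannot specify options in both the regex and the options field.

To match the specified anchors (e.g. ^, $) for each line of a multiline string, include the m option as part of the regex field or in the options field: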

// Specify m as part of the regex field
{ $regexFindAll: { input: "$description", regex: /line/m } }
// Specify m in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "m" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "m" } }

The following example includes both the i and the m options to match lines starting with either the letter s or S for multiline strings:

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } }
])

The operation returns the following:

{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
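
The effect of the m option can be isolated by running the same pattern with and without it. This is a plain JavaScript sketch rather than an aggregation; JavaScript's m flag treats the ^ anchor the same way for this simple pattern, and the reported offsets equal code point indexes because the text is ASCII.

```javascript
const description = "First lines\nsecond line";

// Without m, ^ anchors only at the start of the whole string ("F"), so nothing matches.
const withoutM = Array.from(description.matchAll(/^s/gi), (m) => m.index);

// With m, ^ also anchors right after each newline, so the "s" of "second" matches.
const withM = Array.from(description.matchAll(/^s/gim), (m) => m.index);

console.log(withoutM); // []
console.log(withM);    // [ 12 ]
```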

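Note

You cannot specify options in both the regex and the options field.

To ignore all unescaped white space characters and comments (denoted by the unescaped hash # character and the next newline character) in the pattern, include the x option in the options field: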

// Specify x in the options field
{ $regexFindAll: { input: "$description", regex: /line/, options: "x" } }
{ $regexFindAll: { input: "$description", regex: "line", options: "x" } }

The following example includes the x option to skip unescaped white space and comments:

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } }
])

The operation returns the following:

{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }

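Note

You cannot specify options in both the regex and the options field.

To allow the dot character (i.e. .) in the pattern to match all characters including the newline character, include the s option in the options field: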

// Specify s in the options field
{ $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }

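The following example includes the s option to allow the dot character (i.e. .) to match all characters including newlines, as well as the i option to perform a case-insensitive match: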

db.products.aggregate([
{ $addFields: { returnObject: { $regexFindAll: { input: "$description", regex:/m.*line/, options: "si" } } } }
])

The operation returns the following:

{
"_id" : 1,
"description" : "Single LINE description.",
"returnObject" : [ ]
}
{
"_id" : 2,
"description" : "First lines\nsecond line",
"returnObject" : [ ]
}
{
"_id" : 3,
"description" : "Many spaces before line",
"returnObject" : [ { "match" : "Many spaces before line", "idx" : 0, "captures" : [ ] } ]
}
{
"_id" : 4,
"description" : "Multiple\nline descriptions",
"returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ]
}
{
"_id" : 5,
"description" : "anchors, links and hyperlinks",
"returnObject" : [ ]
}
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
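
To see what the s option contributes, compare the same pattern with and without it. The sketch below uses plain JavaScript, whose s flag has the same meaning for the dot character; it is an illustration, not part of the pipeline.

```javascript
const description = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so "line" on the next line is never reached.
const withoutS = description.match(/m.*line/i);

// With s, the dot also matches the newline, so the match spans both lines.
const withS = description.match(/m.*line/si);

console.log(withoutS);                 // null
console.log(JSON.stringify(withS[0])); // "Multiple\nline"
```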

Create a sample collection feedback with the following documents:

db.feedback.insertMany([
{ "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
{ "_id" : 2, comment: "I wanted to concatenate a string" },
{ "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
{ "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
])

The following aggregation uses $regexFindAll to extract all emails from the comment field (case insensitive).

db.feedback.aggregate( [
{ $addFields: {
"email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
} },
{ $set: { email: "$email.match"} }
] )
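First Stage

The stage uses the $addFields stage to add a new field email to the document. The new field is an array that contains the result of performing $regexFindAll on the comment field: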

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] }
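Second Stage

The stage uses the $set stage to reset the email array elements to their "email.match" values. If $regexFindAll found no match, email is an empty array and stays an empty array.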

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] }
{ "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] }
{ "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }
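
Applying the path "$email.match" to an array of documents yields an array with the match value of each element. A rough plain JavaScript equivalent of the two stages, for a single comment, might look like the following. It is a sketch only: the variable names are illustrative, and JavaScript's regex engine stands in for PCRE.

```javascript
const emailPattern = /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/gi;
const comment =
  "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com";

// First stage: one document per match, shaped like the $regexFindAll output.
// m.index is a UTF-16 offset, which equals the code point index for ASCII-only text.
const email = Array.from(comment.matchAll(emailPattern), (m) => ({
  match: m[0],
  idx: m.index,
  captures: m.slice(1),
}));

// Second stage: "$email.match" keeps only the match field of each element.
const emailMatches = email.map((e) => e.match);

console.log(emailMatches); // [ 'cam@mongodb.com', 'c.dia@mongodb.com' ]
```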

Create a sample collection feedback with the following documents (the same documents as in the previous example):

db.feedback.insertMany([
{ "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
{ "_id" : 2, comment: "I wanted to concatenate a string" },
{ "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" },
{ "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" }
])

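To reply to the feedback, assume you want to parse the local part of the email address to use as the name in the greetings. Using the captures field returned in the $regexFindAll results, you can parse out the local part of each email address: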

db.feedback.aggregate( [
{ $addFields: {
"names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } },
} },
{ $set: { names: { $reduce: { input: "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } }
] )
First Stage

The stage uses the $addFields stage to add a new field names to the document. The new field contains the result of performing $regexFindAll on the comment field:

{
"_id" : 1,
"comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
"names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
"_id" : 3,
"comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
"names" : [
{ "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] },
{ "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] }
]
}
{
"_id" : 4,
"comment" : "It's just me. I'm testing. fred@MongoDB.com",
"names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ]
}
Second Stage

The stage uses the $set stage with the $reduce operator to reset names to an array that contains the "$names.captures" elements.

{
"_id" : 1,
"comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com",
"names" : [ "aunt.arc.tica" ]
}
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] }
{
"_id" : 3,
"comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com",
"names" : [ "cam", "c.dia" ]
}
{
"_id" : 4,
"comment" : "It's just me. I'm testing. fred@MongoDB.com",
"names" : [ "fred" ]
}
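
The $reduce step is needed because "$names.captures" is an array of arrays, one inner array per match. The plain JavaScript sketch below mirrors that flattening for the document with _id 3; the variable names are illustrative and the input is copied from the first-stage output above.

```javascript
// First-stage value of names for the document with _id 3.
const names = [
  { match: "cam@mongodb.com", idx: 56, captures: ["cam"] },
  { match: "c.dia@mongodb.com", idx: 75, captures: ["c.dia"] },
];

// "$names.captures" resolves to one captures array per match.
const capturesPerMatch = names.map((n) => n.captures);

// $reduce with $concatArrays: start from [ ] and append each inner array in turn.
const flattened = capturesPerMatch.reduce(
  (value, current) => value.concat(current),
  []
);

console.log(capturesPerMatch); // [ [ 'cam' ], [ 'c.dia' ] ]
console.log(flattened);        // [ 'cam', 'c.dia' ]
```

Starting from an empty array also covers documents without matches: reducing an empty list leaves names as [ ].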

Tip

See also:

For more information on the behavior of the captures array and additional examples, see captures Output Behavior.
