$regexFind (aggregation)
Definition
Syntax
The $regexFind operator has the following syntax:
{ $regexFind: { input: <expression> , regex: <expression>, options: <expression> } }
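Field | Description
---|---
input | The string to which the regex pattern is applied. Can be a string or an expression that resolves to a string, such as a field path.
regex | The regex pattern to apply. Can be a regex literal (/<pattern>/) or a string, as shown in the examples on this page.
options | Optional. The i, m, x, and s options are described in the examples on this page. You cannot specify options in both the regex and the options fields.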
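Returns:
If the operator does not find a match, the result of the operator is null.
If the operator finds a match, the result of the operator is a document that contains the first matching string (match), the code point index of that match in the input (idx), and an array of the captured groups (captures):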
{ "match" : <string>, "idx" : <num>, "captures" : <array of strings> }
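The shape of this result can be mimicked in plain JavaScript, which may help when reasoning about it outside the database. The sketch below is only an approximation: regexFindSketch is a hypothetical helper, not part of any MongoDB API, and it assumes the JavaScript regex engine as a stand-in for the one the server uses, so pattern syntax and edge cases may differ.

```javascript
// Hypothetical helper that mimics the shape of a $regexFind result.
// Assumption: JavaScript's RegExp stands in for the server's regex engine.
function regexFindSketch(input, regex) {
  const m = regex.exec(input);
  if (m === null) {
    return null; // no match: the result is null
  }
  return {
    match: m[0],
    // Code point index: count the code points that precede the match.
    idx: Array.from(input.slice(0, m.index)).length,
    // Groups that did not take part in the match become null.
    captures: m.slice(1).map((g) => (g === undefined ? null : g)),
  };
}

console.log(regexFindSketch("hello world", /o (w)/));
console.log(regexFindSketch("hello", /x/));
```

The first call prints a document with match, idx, and captures; the second prints null because the pattern does not match.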
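Behavior
$regexFind and Collation
$regexFind ignores the collation specified for the collection, for db.collection.aggregate(), and for the index (if used).
For example, create a sample collection with collation strength 1 (that is, compare base characters only and ignore other differences such as case and diacritics):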
db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )
Insert the following documents:
db.myColl.insertMany([ { _id: 1, category: "café" }, { _id: 2, category: "cafe" }, { _id: 3, category: "cafE" } ])
Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:
db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )
The operation returns the following 3 documents:
{ "_id" : 1, "category" : "café" }
{ "_id" : 2, "category" : "cafe" }
{ "_id" : 3, "category" : "cafE" }
However, the aggregation expression $regexFind ignores collation; that is, the following regex pattern matching examples are case-sensitive and diacritic-sensitive:
db.myColl.aggregate( [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ } } } } ] )

db.myColl.aggregate(
  [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ } } } } ],
  { collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFind
)
Both operations return the following:
{ "_id" : 1, "category" : "café", "resultObject" : null }
{ "_id" : 2, "category" : "cafe", "resultObject" : { "match" : "cafe", "idx" : 0, "captures" : [ ] } }
{ "_id" : 3, "category" : "cafE", "resultObject" : null }
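The difference between collation-based equality and regex matching can be reproduced outside the database. The following plain JavaScript sketch assumes that Intl.Collator with sensitivity "base" is a reasonable approximation of a strength 1 collation; it is an illustration of the contrast, not a description of how the server compares strings.

```javascript
// Assumption: Intl.Collator with sensitivity "base" approximates a
// strength-1 collation (base characters only, ignoring case and accents).
const baseOnly = new Intl.Collator("fr", { sensitivity: "base" });
const categories = ["caf\u00e9", "cafe", "cafE"];

// Collation-style equality: every value compares equal to "cafe".
const equalUnderCollation = categories.filter(
  (c) => baseOnly.compare(c, "cafe") === 0
);

// Regex matching: only the exact, case-sensitive, unaccented value matches.
const matchedByRegex = categories.filter((c) => /cafe/.test(c));

console.log(equalUnderCollation.length); // all three values
console.log(matchedByRegex);             // only "cafe"
```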
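captures Output Behavior
If your regex pattern contains capture groups and the pattern finds a match in the input, the captures array in the results corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses () in the regex pattern. The length of the captures array equals the number of capture groups in the pattern, and the order of the array matches the order in which the capture groups appear.
Create a sample collection named contacts with the following documents: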
db.contacts.insertMany([ { "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" }, { "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" }, { "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" }, { "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" }, { "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" } ])
The following pipeline applies the regex pattern /(C(ar)*)ol/ to the fname field:
db.contacts.aggregate([ { $project: { returnObject: { $regexFind: { input: "$fname", regex: /(C(ar)*)ol/ } } } } ])
The regex pattern finds a match with the fname values Carol and Colleen:
{ "_id" : 1, "returnObject" : { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } }
{ "_id" : 2, "returnObject" : null }
{ "_id" : 3, "returnObject" : null }
{ "_id" : 4, "returnObject" : { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } }
{ "_id" : 5, "returnObject" : null }
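The pattern contains the capture group (C(ar)*), which in turn contains the nested group (ar). The elements in the captures array correspond to the two capture groups. If a group does not capture anything in a matching document (as with Colleen and the group (ar)), $regexFind replaces the group with a null placeholder.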
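As shown in the previous example, the captures array contains an element for each capture group (using null for groups that capture nothing). The following example searches for phone numbers with New York City area codes by applying a logical or of capture groups to the phone field. Each group represents a New York City area code: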
db.contacts.aggregate([ { $project: { nycContacts: { $regexFind: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ } } } } ])
For documents that are matched by the regex pattern, the captures array contains the capture group that matched and replaces every group that captured nothing with null:
{ "_id" : 1, "nycContacts" : { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } }
{ "_id" : 2, "nycContacts" : { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } }
{ "_id" : 3, "nycContacts" : null }
{ "_id" : 4, "nycContacts" : null }
{ "_id" : 5, "nycContacts" : { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } }
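Because each alternative has its own capture group, the position of the non-null element in captures tells you which alternative matched. The plain JavaScript sketch below illustrates that idea; matchedAreaCode and the areaCodes list are hypothetical names introduced for this example, and the JavaScript regex engine is assumed as a stand-in for the server's.

```javascript
// Hypothetical labels, listed in the same order as the capture groups.
const areaCodes = ["718", "212", "917"];
const nycPattern = /^(718).*|^(212).*|^(917).*/;

function matchedAreaCode(phone) {
  const m = nycPattern.exec(phone);
  if (m === null) {
    return null; // not a New York City number
  }
  // Same convention as the captures array: null for groups that did not capture.
  const captures = m.slice(1).map((g) => (g === undefined ? null : g));
  // The index of the only non-null capture identifies the alternative.
  const position = captures.findIndex((g) => g !== null);
  return areaCodes[position];
}

console.log(matchedAreaCode("212-555-8832"));
console.log(matchedAreaCode("208-555-1932"));
```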
Examples
$regexFind and Its Options
To illustrate the behavior of the $regexFind operator as discussed in this example, create a sample collection products with the following documents:
db.products.insertMany([
  { _id: 1, description: "Single LINE description." },
  { _id: 2, description: "First lines\nsecond line" },
  { _id: 3, description: "Many spaces before     line" },
  { _id: 4, description: "Multiple\nline descriptions" },
  { _id: 5, description: "anchors, links and hyperlinks" },
  { _id: 6, description: "métier work vocation" }
])
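By default, $regexFind performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFind on the description field. The regex pattern /line/ does not specify any grouping: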
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /line/ } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /lin(e|k)/ } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ "e" ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ "e" ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ "e" ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : { "match" : "link", "idx" : 9, "captures" : [ "k" ] } }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
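In the returned document, the idx field is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/: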
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /tier/ } } } } ])
The operation returns the following, where only the last record matches the pattern and the returned idx is 2 (it would be 3 if a byte index were used):
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : null }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : null }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : null }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : { "match" : "tier", "idx" : 2, "captures" : [ ] } }
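The two kinds of index can be compared directly in plain JavaScript. This sketch assumes the string is stored with a precomposed é (U+00E9), which takes one code point but two bytes in UTF-8; it uses only the standard TextEncoder and RegExp APIs and is an illustration rather than server code.

```javascript
// Assumption: "é" is the precomposed character U+00E9.
const description = "m\u00e9tier work vocation";
const found = /tier/.exec(description);

// Code point index: number of code points before the match.
const codePointIdx = Array.from(description.slice(0, found.index)).length;

// Byte index: number of UTF-8 bytes needed to encode the same prefix.
const byteIdx = new TextEncoder().encode(
  description.slice(0, found.index)
).length;

console.log(codePointIdx); // 2
console.log(byteIdx);      // 3
```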
i Option
Note
You cannot specify options in both the regex and the options fields.
To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:
// Specify i as part of the regex field
{ $regexFind: { input: "$description", regex: /line/i } }

// Specify i in the options field
{ $regexFind: { input: "$description", regex: /line/, options: "i" } }
{ $regexFind: { input: "$description", regex: "line", options: "i" } }
For example, the following aggregation performs a case-insensitive $regexFind on the description field. The regex pattern /line/ does not specify any grouping:
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /line/i } } } } ])
The operation returns the following documents:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : { "match" : "LINE", "idx" : 7, "captures" : [ ] } }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
m Option
Note
You cannot specify options in both the regex and the options fields.
To match the specified anchors (such as ^ and $) for each line of a multiline string, include the m option as part of the regex field or in the options field:
// Specify m as part of the regex field
{ $regexFind: { input: "$description", regex: /line/m } }

// Specify m in the options field
{ $regexFind: { input: "$description", regex: /line/, options: "m" } }
{ $regexFind: { input: "$description", regex: "line", options: "m" } }
The following example includes both the i and the m options to match lines that start with either the letter s or S in multiline strings:
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /^s/im } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : { "match" : "S", "idx" : 0, "captures" : [ ] } }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "s", "idx" : 12, "captures" : [ ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : null }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : null }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
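The effect of the m option on the ^ anchor can be checked in isolation. The sketch below uses JavaScript's own m flag, on the assumption that it behaves the same way for this simple pattern; it is not a statement about every difference between the two regex engines.

```javascript
const multiline = "First lines\nsecond line";

// Without m, ^ anchors only at the start of the whole string.
const withoutM = /^s/i.exec(multiline);

// With m, ^ also anchors right after each newline.
const withM = /^s/im.exec(multiline);

console.log(withoutM);    // null
console.log(withM.index); // 12
```

The index 12 is the position of the s in second, which corresponds to the idx returned for the document with _id 2.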
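x Option
Note
You cannot specify options in both the regex and the options fields.
To ignore all unescaped white space characters and comments (denoted by the unescaped hash # character and the next newline character) in the pattern, include the x option in the options field:
// Specify x in the options field
{ $regexFind: { input: "$description", regex: /line/, options: "x" } }
{ $regexFind: { input: "$description", regex: "line", options: "x" } }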
The following example includes the x option to skip unescaped white space and comments:
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ "e" ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ "e" ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ "e" ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : { "match" : "link", "idx" : 9, "captures" : [ "k" ] } }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
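JavaScript regular expressions have no x flag, so the effect has to be emulated to see it outside the database. The sketch below is a rough approximation under stated assumptions: stripExtended is a hypothetical helper that drops unescaped white space and # comments from a pattern string, and it deliberately ignores character classes, where a real extended mode would treat white space literally.

```javascript
// Hypothetical helper: approximate the x option by removing unescaped
// white space and "#" comments from a pattern string.
// Assumption: the pattern has no white space or "#" inside character classes.
function stripExtended(pattern) {
  let out = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      out += ch + pattern[i + 1]; // keep escaped characters untouched
      i++;
    } else if (ch === "#") {
      // Skip the comment up to the next newline (or the end of the pattern).
      while (i < pattern.length && pattern[i] !== "\n") {
        i++;
      }
    } else if (!/\s/.test(ch)) {
      out += ch;
    }
  }
  return out;
}

const cleaned = stripExtended("lin(e|k) # matches line or link");
const linkMatch = new RegExp(cleaned).exec("anchors, links and hyperlinks");
console.log(cleaned);
console.log(linkMatch.index, linkMatch[1]);
```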
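s Option
Note
You cannot specify options in both the regex and the options fields.
To allow the dot character (that is, .) in the pattern to match all characters, including the newline character, include the s option in the options field:
// Specify s in the options field
{ $regexFind: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFind: { input: "$description", regex: "m.*line", options: "s" } }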
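The following example includes the s option to allow the dot character (that is, .) to match all characters, including the newline character, as well as the i option to perform a case-insensitive match: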
db.products.aggregate([ { $addFields: { returnObject: { $regexFind: { input: "$description", regex:/m.*line/, options: "si" } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : null }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "Many spaces before     line", "idx" : 0, "captures" : [ ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
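To see why the document with _id 4 matches only when the s option is present, the same pattern can be tried with and without the flag. This sketch assumes JavaScript's s (dotAll) flag is a fair stand-in for the s option in this simple case.

```javascript
const twoLines = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so "m.*line" cannot span both lines.
const withoutS = /m.*line/i.exec(twoLines);

// With s, the dot also matches the newline character.
const withS = /m.*line/si.exec(twoLines);

console.log(withoutS);                // null
console.log(JSON.stringify(withS[0])); // "Multiple\nline"
```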
Use $regexFind to Parse Email from String
Create a sample collection feedback with the following documents:
db.feedback.insertMany([
  { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
  { "_id" : 2, comment: "I wanted to concatenate a string" },
  { "_id" : 3, comment: "I can't find how to convert a date to string. cam@mongodb.com" },
  { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])
The following aggregation uses $regexFind to extract the email from the comment field (case insensitive):
db.feedback.aggregate( [ { $addFields: { "email": { $regexFind: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } } } }, { $set: { email: "$email.match"} } ] )
- First Stage
  The stage uses the $addFields stage to add a new field email to the document. The new field contains the result of performing $regexFind on the comment field:
  { "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } }
  { "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : null }
  { "_id" : 3, "comment" : "I can't find how to convert a date to string. cam@mongodb.com", "email" : { "match" : "cam@mongodb.com", "idx" : 46, "captures" : [ ] } }
  { "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } }
- Second Stage
  The stage uses the $set stage to reset email to the current "$email.match" value. If the current value of email is null, "$email.match" has no value to resolve to, and the document is returned without an email field, as for _id 2 below:
  { "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : "aunt.arc.tica@example.com" }
  { "_id" : 2, "comment" : "I wanted to concatenate a string" }
  { "_id" : 3, "comment" : "I can't find how to convert a date to string. cam@mongodb.com", "email" : "cam@mongodb.com" }
  { "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : "fred@MongoDB.com" }
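The two stages above can be mirrored in application code when the documents are already in memory. The following plain JavaScript sketch is an illustration only: extractEmail is a hypothetical helper, the sample array holds just two of the documents, and the JavaScript regex engine is assumed to treat this pattern the same way.

```javascript
const emailPattern = /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i;

// Hypothetical helper that combines both stages for one document.
function extractEmail(doc) {
  const m = emailPattern.exec(doc.comment);
  // Mirror the pipeline output: no email field when nothing matched.
  return m === null ? { ...doc } : { ...doc, email: m[0] };
}

const feedbackSample = [
  { _id: 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" },
  { _id: 2, comment: "I wanted to concatenate a string" },
];

const withEmails = feedbackSample.map(extractEmail);
console.log(withEmails);
```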
Apply $regexFind to String Elements of an Array
Create a sample collection contacts with the following documents:
db.contacts.insertMany([ { "_id" : 1, name: "Aunt Arc Tikka", details: [ "+672-19-9999", "aunt.arc.tica@example.com" ] }, { "_id" : 2, name: "Belle Gium", details: [ "+32-2-111-11-11", "belle.gium@example.com" ] }, { "_id" : 3, name: "Cam Bo Dia", details: [ "+855-012-000-0000", "cam.bo.dia@example.com" ] }, { "_id" : 4, name: "Fred", details: [ "+1-111-222-3333" ] } ])
The following aggregation uses $regexFind to convert the details array into an embedded document with email and phone fields:
db.contacts.aggregate( [ { $unwind: "$details" }, { $addFields: { "regexemail": { $regexFind: { input: "$details", regex: /^[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/, options: "i" } }, "regexphone": { $regexFind: { input: "$details", regex: /^[+]{0,1}[0-9]*\-?[0-9_\-]+$/ } } } }, { $project: { _id: 1, name: 1, details: { email: "$regexemail.match", phone: "$regexphone.match" } } }, { $group: { _id: "$_id", name: { $first: "$name" }, details: { $mergeObjects: "$details"} } }, { $sort: { _id: 1 } } ])
- First Stage
  The stage uses the $unwind stage to unwind the details array into separate documents:
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "+672-19-9999" }
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "aunt.arc.tica@example.com" }
  { "_id" : 2, "name" : "Belle Gium", "details" : "+32-2-111-11-11" }
  { "_id" : 2, "name" : "Belle Gium", "details" : "belle.gium@example.com" }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : "+855-012-000-0000" }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : "cam.bo.dia@example.com" }
  { "_id" : 4, "name" : "Fred", "details" : "+1-111-222-3333" }
- Second Stage
  The stage uses the $addFields stage to add new fields to the document that contain the $regexFind results for the phone number and the email:
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "+672-19-9999", "regexemail" : null, "regexphone" : { "match" : "+672-19-9999", "idx" : 0, "captures" : [ ] } }
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "aunt.arc.tica@example.com", "regexemail" : { "match" : "aunt.arc.tica@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
  { "_id" : 2, "name" : "Belle Gium", "details" : "+32-2-111-11-11", "regexemail" : null, "regexphone" : { "match" : "+32-2-111-11-11", "idx" : 0, "captures" : [ ] } }
  { "_id" : 2, "name" : "Belle Gium", "details" : "belle.gium@example.com", "regexemail" : { "match" : "belle.gium@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : "+855-012-000-0000", "regexemail" : null, "regexphone" : { "match" : "+855-012-000-0000", "idx" : 0, "captures" : [ ] } }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : "cam.bo.dia@example.com", "regexemail" : { "match" : "cam.bo.dia@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
  { "_id" : 4, "name" : "Fred", "details" : "+1-111-222-3333", "regexemail" : null, "regexphone" : { "match" : "+1-111-222-3333", "idx" : 0, "captures" : [ ] } }
- Third Stage
  The stage uses the $project stage to output documents with the _id field, the name field, and the details field. The details field is set to a document with email and phone fields, whose values are determined from the regexemail and regexphone fields, respectively:
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999" } }
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "email" : "aunt.arc.tica@example.com" } }
  { "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11" } }
  { "_id" : 2, "name" : "Belle Gium", "details" : { "email" : "belle.gium@example.com" } }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000" } }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "email" : "cam.bo.dia@example.com" } }
  { "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
- Fourth Stage
  The stage uses the $group stage to group the input documents by their _id value. The stage uses the $mergeObjects expression to merge the details documents:
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000", "email" : "cam.bo.dia@example.com" } }
  { "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999", "email" : "aunt.arc.tica@example.com" } }
  { "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11", "email" : "belle.gium@example.com" } }
- Fifth Stage
  The stage uses the $sort stage to sort the documents by the _id field:
  { "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999", "email" : "aunt.arc.tica@example.com" } }
  { "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11", "email" : "belle.gium@example.com" } }
  { "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000", "email" : "cam.bo.dia@example.com" } }
  { "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
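The net effect of the five stages is to classify each array element as an email or a phone number and collect the results per contact. A compact plain JavaScript sketch of that classification follows; detailsToDocument is a hypothetical helper, and it assumes the same two patterns behave identically in the JavaScript regex engine.

```javascript
const emailRe = /^[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/i;
const phoneRe = /^[+]{0,1}[0-9]*\-?[0-9_\-]+$/;

// Hypothetical helper: turn the details array into an embedded document.
function detailsToDocument(contact) {
  const details = {};
  for (const value of contact.details) {
    if (emailRe.test(value)) {
      details.email = value;
    }
    if (phoneRe.test(value)) {
      details.phone = value;
    }
  }
  return { _id: contact._id, name: contact.name, details };
}

const converted = detailsToDocument({
  _id: 1,
  name: "Aunt Arc Tikka",
  details: ["+672-19-9999", "aunt.arc.tica@example.com"],
});
console.log(converted);
```

A contact with only a phone number, such as Fred, ends up with a details document that has a phone field and no email field, which matches the pipeline output.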
Use Captured Groupings to Parse User Name
Create a sample collection employees with the following documents:
db.employees.insertMany([ { "_id" : 1, name: "Aunt Arc Tikka", "email" : "aunt.tica@example.com" }, { "_id" : 2, name: "Belle Gium", "email" : "belle.gium@example.com" }, { "_id" : 3, name: "Cam Bo Dia", "email" : "cam.dia@example.com" }, { "_id" : 4, name: "Fred" } ])
Employee emails have the format <firstname>.<lastname>@example.com. Using the captures field returned in the $regexFind results, you can parse out the user names of employees:
db.employees.aggregate( [ { $addFields: { "username": { $regexFind: { input: "$email", regex: /^([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/, options: "i" } }, } }, { $set: { username: { $arrayElemAt: [ "$username.captures", 0 ] } } } ] )
- First Stage
  The stage uses the $addFields stage to add a new field username to the document. The new field contains the result of performing $regexFind on the email field:
  { "_id" : 1, "name" : "Aunt Arc Tikka", "email" : "aunt.tica@example.com", "username" : { "match" : "aunt.tica@example.com", "idx" : 0, "captures" : [ "aunt.tica" ] } }
  { "_id" : 2, "name" : "Belle Gium", "email" : "belle.gium@example.com", "username" : { "match" : "belle.gium@example.com", "idx" : 0, "captures" : [ "belle.gium" ] } }
  { "_id" : 3, "name" : "Cam Bo Dia", "email" : "cam.dia@example.com", "username" : { "match" : "cam.dia@example.com", "idx" : 0, "captures" : [ "cam.dia" ] } }
  { "_id" : 4, "name" : "Fred", "username" : null }
- Second Stage
  The stage uses the $set stage to reset username to the zeroth element of the "$username.captures" array. If the current value of username is null, the new value of username is set to null:
  { "_id" : 1, "name" : "Aunt Arc Tikka", "email" : "aunt.tica@example.com", "username" : "aunt.tica" }
  { "_id" : 2, "name" : "Belle Gium", "email" : "belle.gium@example.com", "username" : "belle.gium" }
  { "_id" : 3, "name" : "Cam Bo Dia", "email" : "cam.dia@example.com", "username" : "cam.dia" }
  { "_id" : 4, "name" : "Fred", "username" : null }
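The null handling in the second stage is the part that is easiest to get wrong when porting this logic elsewhere. The sketch below shows one way to express it in plain JavaScript; usernameOf is a hypothetical helper, and treating a missing email like a failed match is an assumption chosen to reproduce the output above.

```javascript
const userPattern = /^([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/i;

// Hypothetical helper: first capture group, or null when there is no match.
function usernameOf(employee) {
  // Assumption: a missing email is handled like a failed match.
  const m =
    typeof employee.email === "string" ? userPattern.exec(employee.email) : null;
  return m === null ? null : m[1];
}

console.log(usernameOf({ _id: 1, name: "Aunt Arc Tikka", email: "aunt.tica@example.com" }));
console.log(usernameOf({ _id: 4, name: "Fred" }));
```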