$regexFindAll (aggregation)
On this page
Definition
$regexFindAll
New in version 4.2.
Provides regular expression (regex) pattern matching capability in aggregation expressions. The operator returns an array of documents that contains information on each match. If a match is not found, returns an empty array.
MongoDB uses Perl compatible regular expressions (i.e. "PCRE" ) version 8.41 with UTF-8 support.
Prior to MongoDB 4.2, aggregation pipeline can only use the query operator
$regex
in the$match
stage. For more information on using regex in a query, see$regex
.
Syntax
The $regexFindAll
operator has the following syntax:
{ $regexFindAll: { input: <expression> , regex: <expression>, options: <expression> } }
Field | Description | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
The string on which you wish to apply the regex pattern. Can be a string or any valid expression that resolves to a string. | |||||||||||
The regex pattern to apply. Can be any valid expression that resolves to either a string or regex
pattern
Alternatively, you can also specify the regex options with the
options field. To specify the You cannot specify options in both the | |||||||||||
Optional. The following NoteYou cannot specify options in both the
|
Returns
The operator returns an array:
If the operator does not find a match, the operator returns an empty array.
If the operator finds a match, the operator returns an array of documents that contains the following information for each match:
the matching string in the input,
the code point index (not byte index) of the matching string in the input, and
An array of the strings that corresponds to the groups captured by the matching string. Capturing groups are specified with unescaped parenthesis
()
in the regex pattern.
[ { "match" : <string>, "idx" : <num>, "captures" : <array of strings> }, ... ]
Behavior
$regexFindAll
and Collation
$regexFindAll
ignores the collation specified for the
collection, db.collection.aggregate()
, and the index, if used.
For example, the create a sample collection with collation strength
1
(i.e. compare base character only and ignore other differences
such as case and diacritics):
db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )
Insert the following documents:
db.myColl.insertMany([ { _id: 1, category: "café" }, { _id: 2, category: "cafe" }, { _id: 3, category: "cafE" } ])
Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:
db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )
The operation returns the following 3 documents:
{ "_id" : 1, "category" : "café" } { "_id" : 2, "category" : "cafe" } { "_id" : 3, "category" : "cafE" }
However, the aggregation expression $regexFind
ignores
collation; that is, the following regular expression pattern matching examples
are case-sensitive and diacritic sensitive:
db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ] ) db.myColl.aggregate( [ { $addFields: { results: { $regexFindAll: { input: "$category", regex: /cafe/ } } } } ], { collation: { locale: "fr", strength: 1 } } // Ignored in the $regexFindAll )
Both operations return the following:
{ "_id" : 1, "category" : "café", "results" : [ ] } { "_id" : 2, "category" : "cafe", "results" : [ { "match" : "cafe", "idx" : 0, "captures" : [ ] } ] } { "_id" : 3, "category" : "cafE", "results" : [ ] }
To perform a case-insensitive regex pattern matching, use the
i
Option instead. See
i
Option for an example.
captures
Output Behavior
If your regex pattern contains capture
groups and the pattern finds a match in the
input, the captures
array in the results
corresponds to the groups captured by the matching string. Capture
groups are specified with unescaped parentheses ()
in the
regex pattern. The length of the
captures
array equals the number of capture groups in the pattern
and the order of the array matches the order in which the capture groups
appear.
Create a sample collection named contacts
with the following
documents:
db.contacts.insertMany([ { "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" }, { "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" }, { "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" }, { "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" }, { "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" } ])
The following pipeline applies the regex
pattern /(C(ar)*)ol/
to the fname
field:
db.contacts.aggregate([ { $project: { returnObject: { $regexFindAll: { input: "$fname", regex: /(C(ar)*)ol/ } } } } ])
The regex pattern finds a match with fname
values Carol
and Colleen
:
{ "_id" : 1, "returnObject" : [ { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } ] } { "_id" : 2, "returnObject" : [ ] } { "_id" : 3, "returnObject" : [ ] } { "_id" : 4, "returnObject" : [ { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } ] } { "_id" : 5, "returnObject" : [ ] }
The pattern contains the capture group (C(ar)*)
which contains the
nested group (ar)
. The elements in the captures
array correspond
to the two capture groups. If a matching document is not captured by a
group (e.g. Colleen
and the group (ar)
),
$regexFindAll
replaces the group with a null placeholder.
As shown in the previous example, the captures
array contains an
element for each capture group (using null
for non-captures).
Consider the following example which searches for phone numbers with
New York City area codes by applying a logical or
of capture
groups to the phone
field. Each group represents a New York City
area code:
db.contacts.aggregate([ { $project: { nycContacts: { $regexFindAll: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ } } } } ])
For documents which are matched by the regex
pattern, the captures
array includes the matching capture group
and replaces any non-capturing groups with null
:
{ "_id" : 1, "nycContacts" : [ { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } ] } { "_id" : 2, "nycContacts" : [ { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } ] } { "_id" : 3, "nycContacts" : [ ] } { "_id" : 4, "nycContacts" : [ ] } { "_id" : 5, "nycContacts" : [ { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } ] }
Examples
$regexFindAll
and Its Options
To illustrate the behavior of the $regexFindAll
operator as
discussed in this example, create a sample collection products
with
the following documents:
db.products.insertMany([ { _id: 1, description: "Single LINE description." }, { _id: 2, description: "First lines\nsecond line" }, { _id: 3, description: "Many spaces before line" }, { _id: 4, description: "Multiple\nline descriptions" }, { _id: 5, description: "anchors, links and hyperlinks" }, { _id: 6, description: "métier work vocation" } ])
By default, $regexFindAll
performs a case-sensitive match.
For example, the following aggregation performs a case-sensitive
$regexFindAll
on the description
field. The regex
pattern /line/
does not specify any grouping:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/ } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ]}, { "match" : "line", "idx" : 19, "captures" : [ ] } ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
The following regex pattern /lin(e|k)/
specifies a grouping
(e|k)
in the pattern:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k)/ } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject": [ ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
In the return option, the idx
field is the code point index and not the byte
index. To illustrate, consider the following example that uses the
regex pattern /tier/
:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /tier/ } } } } ])
The operation returns the following where only the last record
matches the pattern and the returned idx
is 2
(instead of 3
if using a byte index)
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ { "match" : "tier", "idx" : 2, "captures" : [ ] } ] }
i
Option
Note
You cannot specify options in both the regex
and the
options
field.
To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:
// Specify i as part of the regex field { $regexFindAll: { input: "$description", regex: /line/i } } // Specify i in the options field { $regexFindAll: { input: "$description", regex: /line/, options: "i" } } { $regexFindAll: { input: "$description", regex: "line", options: "i" } }
For example, the following aggregation performs a case-insensitive
$regexFindAll
on the description
field. The regex
pattern /line/
does not specify any grouping:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /line/i } } } } ])
The operation returns the following documents:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ { "match" : "LINE", "idx" : 7, "captures" : [ ] } ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ ] }, { "match" : "line", "idx" : 19, "captures" : [ ] } ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ ] } ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ ] } ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
m
Option
Note
You cannot specify options in both the regex
and the
options
field.
To match the specified anchors (e.g. ^
, $
) for each line of a
multiline string, include the m option
as part of the regex field or in the
options field:
// Specify m as part of the regex field { $regexFindAll: { input: "$description", regex: /line/m } } // Specify m in the options field { $regexFindAll: { input: "$description", regex: /line/, options: "m" } } { $regexFindAll: { input: "$description", regex: "line", options: "m" } }
The following example includes both the i
and the m
options to
match lines starting with either the letter s
or S
for
multiline strings:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /^s/im } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ { "match" : "S", "idx" : 0, "captures" : [ ] } ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "s", "idx" : 12, "captures" : [ ] } ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
x
Option
Note
You cannot specify options in both the regex
and the
options
field.
To ignore all unescaped white space characters and comments (denoted by
the un-escaped hash #
character and the next new-line character) in
the pattern, include the s option in the
options field:
// Specify x in the options field { $regexFindAll: { input: "$description", regex: /line/, options: "x" } } { $regexFindAll: { input: "$description", regex: "line", options: "x" } }
The following example includes the x
option to skip unescaped white
spaces and comments:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ { "match" : "line", "idx" : 6, "captures" : [ "e" ] }, { "match" : "line", "idx" : 19, "captures" : [ "e" ] } ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ { "match" : "line", "idx" : 23, "captures" : [ "e" ] } ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "line", "idx" : 9, "captures" : [ "e" ] } ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ { "match" : "link", "idx" : 9, "captures" : [ "k" ] }, { "match" : "link", "idx" : 24, "captures" : [ "k" ] } ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
s
Option
Note
You cannot specify options in both the regex
and the
options
field.
To allow the dot character (i.e. .
) in the pattern to match all
characters including the new line character, include the s option in the options field:
// Specify s in the options field { $regexFindAll: { input: "$description", regex: /m.*line/, options: "s" } } { $regexFindAll: { input: "$description", regex: "m.*line", options: "s" } }
The following example includes the s
option to allow the dot
character (i.e. .) to match all characters including new line as well
as the i
option to perform a case-insensitive match:
db.products.aggregate([ { $addFields: { returnObject: { $regexFindAll: { input: "$description", regex:/m.*line/, options: "si" } } } } ])
The operation returns the following:
{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : [ ] } { "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : [ ] } { "_id" : 3, "description" : "Many spaces before line", "returnObject" : [ { "match" : "Many spaces before line", "idx" : 0, "captures" : [ ] } ] } { "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : [ { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } ] } { "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : [ ] } { "_id" : 6, "description" : "métier work vocation", "returnObject" : [ ] }
Use $regexFindAll
to Parse Email from String
Create a sample collection feedback
with the following documents:
db.feedback.insertMany([ { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" }, { "_id" : 2, comment: "I wanted to concatenate a string" }, { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" }, { "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" } ])
The following aggregation uses the $regexFindAll
to extract
all emails from the comment
field (case insensitive).
db.feedback.aggregate( [ { $addFields: { "email": { $regexFindAll: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } } } }, { $set: { email: "$email.match"} } ] )
- First Stage
The stage uses the
$addFields
stage to add a new fieldemail
to the document. The new field is an array that contains the result of performing the$regexFindAll
on thecomment
field:{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } ] } { "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] } { "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ ] } ] } { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } ] } - Second Stage
The stage use the
$set
stage to reset theemail
array elements to the"email.match"
value(s). If the current value ofemail
is null, the new value ofemail
is set to null.{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : [ "aunt.arc.tica@example.com" ] } { "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : [ ] } { "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "email" : [ "cam@mongodb.com", "c.dia@mongodb.com" ] } { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "email" : [ "fred@MongoDB.com" ] }
Use Captured Groupings to Parse User Name
Create a sample collection feedback
with the following documents:
db.feedback.insertMany([ { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com" }, { "_id" : 2, comment: "I wanted to concatenate a string" }, { "_id" : 3, comment: "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com" }, { "_id" : 4, comment: "It's just me. I'm testing. fred@MongoDB.com" } ])
To reply to the feedback, assume you want to parse the local-part of
the email address to use as the name in the greetings. Using the
captured
field returned in the $regexFindAll
results,
you can parse out the local part of each email address:
db.feedback.aggregate( [ { $addFields: { "names": { $regexFindAll: { input: "$comment", regex: /([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }, } }, { $set: { names: { $reduce: { input: "$names.captures", initialValue: [ ], in: { $concatArrays: [ "$$value", "$$this" ] } } } } } ] )
- First Stage
The stage uses the
$addFields
stage to add a new fieldnames
to the document. The new field contains the result of performing the$regexFindAll
on thecomment
field:{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "names" : [ { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ "aunt.arc.tica" ] } ] } { "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] } { "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "names" : [ { "match" : "cam@mongodb.com", "idx" : 56, "captures" : [ "cam" ] }, { "match" : "c.dia@mongodb.com", "idx" : 75, "captures" : [ "c.dia" ] } ] } { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "names" : [ { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ "fred" ] } ] } - Second Stage
The stage use the
$set
stage with the$reduce
operator to resetnames
to an array that contains the"$names.captures"
elements.{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "names" : [ "aunt.arc.tica" ] } { "_id" : 2, "comment" : "I wanted to concatenate a string", "names" : [ ] } { "_id" : 3, "comment" : "How do I convert a date to string? Contact me at either cam@mongodb.com or c.dia@mongodb.com", "names" : [ "cam", "c.dia" ] } { "_id" : 4, "comment" : "It's just me. I'm testing. fred@MongoDB.com", "names" : [ "fred" ] }
Tip
See also:
For more information on the behavior of the captures
array and
additional examples, see
captures
Output Behavior.