Regex is not working with collation

I have a pool collection and am trying to find a match for the Name field of the pool collection. I also want my query to work with case-insensitive and diacritic case-insensitive data.

I have a pool with Name=‘Maøé’ in my pool collection.
I have used the below queries:

The below query did not return any matching result when I am using regex.
db["Pool"].find({TenantId: BinData(3, "fjTbsQVA40CM5UsUlkTkDQ=="), OwnerId: BinData(3,"cDcPV5whymFeZKMg+8MN7w=="), Name: { $regex: "maoe", $options: "i" }}).collation({locale:"en", strength: 1})

The below query returned the expected pool.
db["Pool"].find({TenantId: BinData(3, "fjTbsQVA40CM5UsUlkTkDQ=="), OwnerId: BinData(3,"cDcPV5whymFeZKMg+8MN7w=="), Name: { $regex: "Maøé", $options: "i" }}).collation({locale:"en", strength: 1})

The below query returned the expected pool when and is working with the diacritic-insensitive search but here I have to use the exact match i.e. full-text search.

db["Pool"].find({TenantId: BinData(3, "fjTbsQVA40CM5UsUlkTkDQ=="), OwnerId: BinData(3,"cDcPV5whymFeZKMg+8MN7w=="), Name: "Maoe"}).collation({locale:"en", strength: 1})

The output of the query is given below

{
  _id: Binary.createFromBase64('FI2HAMcF/E6etfRWWcZ7Fg==', 3),
  AggregateVersion: 2,
  AuditableEvents: [],
  LastModified: 2024-12-19T13:05:05.266Z,
  DatabaseVersion: 638302904038095700,
  TenantId: Binary.createFromBase64('fjTbsQVA40CM5UsUlkTkDQ==', 3),
  CreatedAt: 2024-12-19T13:04:47.661Z,
  EntityReferences: [
    {
      Name: 'OwnerId',
      Key: '570f3770-219c-61ca-5e64-a320fbc30def'
    },
    {
      Name: 'SharedOwnerships.InternalEmployeeId',
      Key: '56bf3fcb-e7f4-8c92-e1e1-370c26f5bcc7'
    },
    {
      Name: 'SharedOwnerships.InternalEmployeeId',
      Key: '0d9c0a0b-f9ba-b92b-411f-a33312530aff'
    }
  ],
  Name: 'Maøé',
  OwnerId: Binary.createFromBase64('cDcPV5whymFeZKMg+8MN7w==', 3),
  SharedOwnerships: [
    {
      InternalEmployeeId: Binary.createFromBase64('yz+/VvTnkozh4TcMJvW8xw==', 3),
      DateAdded: 2024-12-19T13:05:05.249Z
    },
    {
      InternalEmployeeId: Binary.createFromBase64('CwqcDbr5K7lBH6MzElMK/w==', 3),
      DateAdded: 2024-12-19T13:05:05.254Z
    }
  ]
}

It appears that collation is not functioning as expected when using regex in my query. Specifically:

  • Diacritic-Insensitive and Case-Insensitive Search: These features do not seem to work properly when a regex-based query is used, even though collation is applied.

  • Why does collation fail to enable diacritic-insensitive and case-insensitive search in combination with regex?
    Could this be a limitation of MongoDB’s current implementation of regex and collation?

  • Is there an alternative approach to implement diacritic-insensitive and case-insensitive search effectively in MongoDB, especially for scenarios where regex is required?

I’m not sure about all your issues, but I do know that ø is not o + diacritic, but a separate letter of the Scandinavian alphabets. E.g., see Ø - Wikipedia