Atlas Search "text": low performance when running a large number of queries at once

Hi Community!

I’m building a CAT (computer-aided translation) tool that does fuzzy matching between an input file and a translation memory (a database of previously translated strings). The input file is stored as a list of String documents, and the translation memory is a list of TranslationUnit documents. My cluster tier is M10 (the basic dedicated tier). My approach is as follows:

  • Run an aggregate on String and use $lookup to cross-check key and source.text between String and TranslationUnit: units with the same key and the same source.text are 101% matches, and units with a different key but the same source.text are 100% matches. Via the “as” option of $lookup, these matches are stored in 101match and 100match fields respectively (a simplified sketch of this stage is included after the code below).

  • Check the array lengths of 101match and 100match on the returned documents: if either array has length > 0, add a matchScore field set to 101 or 100 accordingly; if both are empty, set matchScore to -1.

  • Since $search doesn’t support $let variables, I can’t run the fuzzy search within the same aggregate command. Instead, I take all String documents with matchScore === -1 and pass each of them to a separate aggregate command whose pipeline contains $search. This is where I end up sending a really large number of concurrent queries, and the response always times out with 60K String documents (all fuzzy) and 220K TranslationUnit documents:
```js
async function findFuzzy(string) {
  const pipeline = [
    {
      // find fuzzy matches on the source text
      $search: {
        index: 'tmSourceTextOnly',
        text: {
          query: string.source.text,
          path: 'source.text',
          // fuzzy: { maxEdits: 1 }
        }
      }
    },
    {
      // check that these fuzzy sources have a translation in the target language
      // and belong to one of the project's translation memories
      $match: {
        translations: {
          $elemMatch: {
            lang: taskTargetLang
          }
        },
        parentTranslationMemory: {
          $in: projectTranslationMemories
        }
      }
    },
    {
      // 1 result is enough for creating analyses
      $limit: 1
    }
  ];

  const result = await TranslationUnit.aggregate(pipeline);
  string['fuzzyMatch'] = result;
  console.log('fuzzy single string result', result);
  return string;
}

let fuzzyPromises = [];
for (let string of allMatches.fuzzyMatch) {
  fuzzyPromises.push(findFuzzy(string));
}
console.log('finding fuzzy matches…', fuzzyPromises.length);
allMatches.fuzzyMatch = await Promise.all(fuzzyPromises);
```
In comparison, if the file is mostly made up of 101% and 100% matches, the whole process takes only 20–30 seconds.
Regular indexes and Atlas Search indexes are in place for all related fields, so I don’t think I’m missing anything there.
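
For context, this is roughly what the exact-match stage from the first two bullets looks like. It’s a simplified sketch: the collection name in `from`, the exact field names, and the scoring logic are approximations of my real pipeline, which has a few more conditions:

```js
// Simplified sketch of the exact-match stage (names approximate)
const exactMatchPipeline = [
  {
    // same key AND same source text -> 101% matches
    $lookup: {
      from: 'translationunits', // assumed collection name
      let: { stringKey: '$key', stringText: '$source.text' },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ['$key', '$$stringKey'] },
                { $eq: ['$source.text', '$$stringText'] }
              ]
            }
          }
        }
      ],
      as: '101match'
    }
  },
  {
    // different key, same source text -> 100% matches
    $lookup: {
      from: 'translationunits',
      let: { stringKey: '$key', stringText: '$source.text' },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $ne: ['$key', '$$stringKey'] },
                { $eq: ['$source.text', '$$stringText'] }
              ]
            }
          }
        }
      ],
      as: '100match'
    }
  },
  {
    // matchScore: 101 or 100 if an exact match was found, -1 otherwise
    $addFields: {
      matchScore: {
        $switch: {
          branches: [
            { case: { $gt: [{ $size: '$101match' }, 0] }, then: 101 },
            { case: { $gt: [{ $size: '$100match' }, 0] }, then: 100 }
          ],
          default: -1
        }
      }
    }
  }
];

// the results are then split into buckets; strings with matchScore === -1
// become allMatches.fuzzyMatch and go through findFuzzy above
const scoredStrings = await String.aggregate(exactMatchPipeline);
```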

I think my key question here is: is this the right way to use $search? If not, is there a plausible solution for my use case that I can look into? I really hope I can achieve this with Atlas Search because of how convenient it is; if I have to implement fuzzy matching outside of Atlas Search, the M10 subscription may no longer be worth it.

Thank you!