Hi,
I have a question regarding query performance.
We have multiple complex aggregation pipelines that are very slow.
While analyzing the explain plan, I saw that we have a FETCH stage.
I created a new index following the ESR (Equality, Sort, Range) rule, focusing on the main filter criteria.
These are the main criteria:
"Field_A": "",
"Field_B.B": { "$in": [ "Y", "N" ] },
"Timestamp": { "$gte": , "$lte": }
I created the index: Index_name(Field_A: 1, Field_B.B: 1, Timestamp: -1)
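In mongosh that is roughly the following (the collection name is just a placeholder):

db.my_collection.createIndex(
  { "Field_A": 1, "Field_B.B": 1, "Timestamp": -1 },
  { name: "Index_name" }
)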
That index improved performance significantly.
The reason we still have a FETCH stage is that after our main match criteria we have an $or part (this is only one part; there are many more $or and $and operators):
"$or": [
  { "$expr": { "$and": [ { "$eq": [ "$Field_C.C_1.C_2", "1" ] } ] } },
  { "$expr": { "$and": [ { "$eq": [ "$Field_C.C_1.C_2", "5" ] } ] } },
  ...
]
For this I suggested a query optimization, and when I ran the optimized query, the FETCH stage improved.
This is part of the optimized query (outside of $expr, the plain match form is used, so field paths no longer take the $ prefix):
"$or": [
  { "Field_C.C_1.C_2": { "$in": [ "1", "5", "6", "10" ] } },
  {
    "$and": [
      { "Field_C.C_1.C_3": "6" },
      { "Field_C.C_1.C_4": { "$in": [ "04", "07", "08", "09" ] } }
    ]
  },
  ...
]
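To confirm the improvement, I compared the execution stats before and after, roughly like this (a sketch; "pipeline" stands for the full list of stages, including the $match above):

db.my_collection.explain("executionStats").aggregate(pipeline)
// in the output I compared totalDocsExamined against nReturned,
// and checked how much filtering was still happening in the FETCH stage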
These aggregation pipelines can return between 300,000 and 2 million records, depending on the selected timestamp range.
We have 23 GB of RAM, the index size is a little above 13 GB, and the WiredTiger cache is 9 GB.
Also, when the aggregation pipelines are started from the BI tool, it runs 3 variations of the pipeline at the same time, which obviously impacts performance.
I suggested that they optimize that as well, because we can get the same result by running the pipeline once; see the sketch below.
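One idea I had for running it once is to keep the shared $match and split the three variations with $facet (just a sketch; the variation stages are placeholders, and I know each $facet output must fit in a single 16 MB document, so this only works if every variation aggregates the data down):

db.my_collection.aggregate([
  { "$match": { /* shared Field_A, Field_B.B, Timestamp criteria */ } },
  { "$facet": {
    "variation_1": [ /* stages specific to variation 1 */ ],
    "variation_2": [ /* stages specific to variation 2 */ ],
    "variation_3": [ /* stages specific to variation 3 */ ]
  } }
])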
Now I want to know whether this index and query optimization is a step in the right direction, or whether I have missed something.
Also, I was thinking of increasing the WiredTiger cache to 11 GB (0.5 * (23 - 1) = 11, following the default sizing formula). Would that be OK?
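If so, I would apply it via storage.wiredTiger.engineConfig.cacheSizeGB in mongod.conf, or at runtime like this (sketch):

db.adminCommand({
  setParameter: 1,
  wiredTigerEngineRuntimeConfig: "cache_size=11G"
})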
And just to be sure: is it possible that my index size plus working set exceeds RAM? I don't think so, but I want to be sure.
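This is how I was planning to check (sketch; the collection name is a placeholder):

// total size of all indexes on the collection, in bytes
db.my_collection.stats().totalIndexSize
// current cache usage vs the configured maximum
const c = db.serverStatus().wiredTiger.cache
print(c["bytes currently in the cache"], c["maximum bytes configured"])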
I believe that once they apply the optimization and run the aggregation pipeline only once, we will get better performance.
Thank you in advance for any advice and suggestions.