Hi,
I have a question regarding query performance.
We have multiple complex aggregation pipelines that are very slow.
While analyzing the explain plan, I saw that we have a FETCH stage.
I created a new index following the ESR (Equality, Sort, Range) rule, focusing on the main filter criteria.
These are the main criteria:
"Field_A": "",
"Field_B.B": { "$in": [ "Y", "N" ] },
"Timestamp": { "$gte": , "$lte": }
I created the index: Index_name(Field_A: 1, Field_B.B: 1, Timestamp: -1)
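In mongosh that is roughly the following (the collection name is just a placeholder):

db.my_collection.createIndex(
  { "Field_A": 1, "Field_B.B": 1, "Timestamp": -1 },
  { name: "Index_name" }
)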
That index improved performance significantly.
The reason we still have a FETCH stage is that after our main match criteria we have an $or part (this is only one part; there are many more $or and $and operators):
"$or": [
  { "$expr": { "$and": [ { "$eq": [ "$Field_C.C_1.C_2", "1" ] } ] } },
  { "$expr": { "$and": [ { "$eq": [ "$Field_C.C_1.C_2", "5" ] } ] } },
  ...
]
For this I suggested a query optimization, and when I ran the optimized query, the FETCH stage improved.
This is part of the optimized query (outside of $expr, the plain match form is used, so field paths no longer take the $ prefix):
"$or": [
  { "Field_C.C_1.C_2": { "$in": [ "1", "5", "6", "10" ] } },
  {
    "$and": [
      { "Field_C.C_1.C_3": "6" },
      { "Field_C.C_1.C_4": { "$in": [ "04", "07", "08", "09" ] } }
    ]
  },
  ...
]
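To confirm the improvement, I compared the execution stats before and after, roughly like this (a sketch; "pipeline" stands for the full list of stages, including the $match above):

db.my_collection.explain("executionStats").aggregate(pipeline)
// in the output I compared totalDocsExamined against nReturned,
// and checked how much filtering was still happening in the FETCH stage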
These aggregation pipelines can return between 300,000 and 2 million records, depending on the selected timestamp range.
We have 23 GB of RAM, the index size is a little above 13 GB, and the WiredTiger cache is 9 GB.
Also, when the aggregation pipelines are started from the BI tool, it runs 3 variations of the pipeline at the same time, which obviously impacts performance.
I suggested that they optimize that as well, because we can get the same result by running the pipeline once; see the sketch below.
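One idea I had for running it once is to keep the shared $match and split the three variations with $facet (just a sketch; the variation stages are placeholders, and I know each $facet output must fit in a single 16 MB document, so this only works if every variation aggregates the data down):

db.my_collection.aggregate([
  { "$match": { /* shared Field_A, Field_B.B, Timestamp criteria */ } },
  { "$facet": {
    "variation_1": [ /* stages specific to variation 1 */ ],
    "variation_2": [ /* stages specific to variation 2 */ ],
    "variation_3": [ /* stages specific to variation 3 */ ]
  } }
])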
Now I want to know whether this index and query optimization is a step in the right direction, or whether I have missed something.
Also, I was thinking of increasing the WiredTiger cache to 11 GB (0.5 * (23 - 1) = 11, following the default sizing formula). Would that be OK?
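If so, I would apply it via storage.wiredTiger.engineConfig.cacheSizeGB in mongod.conf, or at runtime like this (sketch):

db.adminCommand({
  setParameter: 1,
  wiredTigerEngineRuntimeConfig: "cache_size=11G"
})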
And just to be sure: is it possible that my index size plus working set exceeds RAM? I don't think so, but I want to be sure.
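This is how I was planning to check (sketch; the collection name is a placeholder):

// total size of all indexes on the collection, in bytes
db.my_collection.stats().totalIndexSize
// current cache usage vs the configured maximum
const c = db.serverStatus().wiredTiger.cache
print(c["bytes currently in the cache"], c["maximum bytes configured"])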
I believe that once they apply the optimization and run the aggregation pipeline only once, we will get better performance.
Thank you in advance for any advice and suggestions.