We keep receiving “Query Targeting: Scanned Objects/Returned has gone above 1000” alerts, but we have struggled to find out which query is actually triggering them.
The alert is set with a threshold of 1000, and it is sent if the condition lasts at least 0 minutes, and it is resent after 60 minutes.
When we look at the profiler and select Examined:Returned Ratio, we find a few queries we already know about that have a ratio greater than 1,000. They all use an index, though they may need improving. However, these queries show up in the profiler many times during normal traffic without any alert being triggered.
The main issue is that when we do receive the alert, which is mainly during the night or weekend when traffic is low for us, usually nothing appears in the profiler, so we don’t know which query is causing the alert. This is likely because the query is not slow enough to appear.
As we don’t have any jobs or specific tasks running during evening hours or weekends, we assume that the alert threshold is based on an average for those specific time windows. Is that correct? Also, what is the best way to reduce false positives? Would changing the time windows be a solution?
This has been happening since mid-February without significant changes on our side (as far as we can see), so we are wondering if there have been any changes internally.
Welcome to the MongoDB community and thank you for your question! The Query Targeting alert today is based off the Query Targeting metric on your monitoring charts. If you go back in history to your monitoring charts when you saw a false positive Query Targeting alert trigger, are you able to see a spike in the Query Targeting metric?
It could definitely be the case that the Query Profiler missed an operation because the query had a high query targeting ratio but did not exceed a certain slowms execution time filter. This is a gap that we are currently working to address. In the near future, we will be updating Atlas Query Profiler to profile operations based on their slowms execution time as well as their query targeting ratio. This should help provide more visibility into those inefficient queries. However, in the meantime, would you mind checking your Monitoring charts to see if there is actually a spike in Query Targeting?
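To illustrate the gap described above, here is a minimal sketch that ranks operations by their examined-to-returned ratio regardless of execution time. It assumes entries shaped like documents from the database profiler's `system.profile` collection (fields `ns`, `millis`, `docsExamined`, `nreturned`); the helper names, the namespaces, and the sample values are hypothetical and only stand in for real data.

```python
def targeting_ratio(entry):
    """Documents examined per document returned for one profiler entry."""
    examined = entry.get("docsExamined", 0)
    returned = entry.get("nreturned", 0)
    # Treat zero returned documents as one to avoid division by zero.
    return examined / max(returned, 1)


def inefficient_ops(entries, min_ratio=1000):
    """Return entries above min_ratio, worst first, ignoring how fast they ran."""
    flagged = [e for e in entries if targeting_ratio(e) > min_ratio]
    return sorted(flagged, key=targeting_ratio, reverse=True)


# Hypothetical sample data; real entries would come from system.profile.
sample_entries = [
    {"ns": "app.orders", "millis": 3, "docsExamined": 5000, "nreturned": 1},
    {"ns": "app.users", "millis": 99, "docsExamined": 250000, "nreturned": 20},
    {"ns": "app.users", "millis": 1, "docsExamined": 10, "nreturned": 10},
]

for entry in inefficient_ops(sample_entries):
    print(entry["ns"], entry["millis"], "ms", round(targeting_ratio(entry)))
```

Note that the 3 ms operation is flagged even though it would never pass a slowms filter, which is exactly the kind of query that can raise the metric without appearing in the profiler.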
It seems that there was a spike in the monitoring chart last night, but nothing showed up in the profiler.
However, when there was a lot more traffic on our site last hour, the profiler showed a lot of queries with the same Examined:Returned Ratio issue (mostly the same query, but one that we know of and can’t do much about at the moment) but we didn’t receive any alert, so my guess is that it’s based on the average for a certain period.
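One possible explanation for the pattern described above, sketched under the assumption that the metric is an aggregate ratio (total documents scanned divided by total documents returned) over all operations in a sampling window. That aggregation model is an assumption for illustration, not confirmed behavior, and all numbers are made up.

```python
def window_ratio(ops):
    """Aggregate scanned/returned ratio over all operations in one window.

    ops is a list of (docs_scanned, docs_returned) pairs.
    """
    scanned = sum(s for s, _ in ops)
    returned = sum(r for _, r in ops)
    return scanned / max(returned, 1)


heavy = (50000, 10)  # one inefficient query: ratio 5000 on its own
light = (10, 10)     # a well-targeted query: ratio 1

# Daytime: the inefficient query runs more often, but is diluted by normal traffic.
day = [heavy] * 5 + [light] * 1000
# Night: the inefficient query runs once, with almost nothing else around it.
night = [heavy] + [light] * 2

print("day:", round(window_ratio(day), 1))
print("night:", round(window_ratio(night), 1))
```

Under this assumption the daytime window stays far below 1000 even though the inefficient query runs five times, while a single run at night pushes the window above the threshold.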
I think this improvement would definitely help. Maybe another improvement could be the same alert, but for non-indexed queries only.
So, to come back to the alert, is it safe to ignore it if the operation is not actually slow? It’s likely to be the same query as above (or maybe we can just increase the threshold a bit?).
The alert should have triggered, but if it had already triggered within the past hour, it won’t trigger again until the next hour. Just curious, could it be that the alert didn’t trigger because it had already triggered in the last hour?
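The resend behavior described above can be modeled roughly as follows. This is a simplified sketch using the settings mentioned in the thread (threshold 1000, resend after 60 minutes); the function and its logic are hypothetical and are not the actual Atlas alerting implementation.

```python
def notifications(samples, threshold=1000, resend_after=60):
    """Return the minutes at which a notification would be sent.

    samples is a list of (minute, ratio) pairs. A notification is sent when
    the ratio is above the threshold and none was sent in the last
    resend_after minutes (an assumed, simplified rule).
    """
    sent = []
    for minute, ratio in sorted(samples):
        if ratio <= threshold:
            continue
        if not sent or minute - sent[-1] >= resend_after:
            sent.append(minute)
    return sent


# Hypothetical samples: four spikes, three of them within the same hour.
spikes = [(0, 1500), (20, 3000), (45, 1200), (70, 2500)]
print(notifications(spikes))
```

In this model the spikes at minutes 20 and 45 produce no notification because one was already sent at minute 0, so a busy period can contain several spikes but only one alert.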
We do also have a separate longer-term project to have the alert trigger based on Query Targeting per query shape. I think this will also help as we’ll be able to include the offending query shape in the alert details and possibly exclude certain query shapes from being alerted on.
A high query targeting value is typically indicative of a poorly optimized query and could mean that there might be another or other indexes that would better serve the query. If you navigate to the Performance Advisor, do you see any index recommendations related to that query shape? I might recommend seeing if there are other indexes that might be beneficial (while weighing the potential write performance costs) to see if we can make that query more efficient.
That doesn’t seem to match the behavior we see. For example, during the day we can see a lot of queries with a high Examined:Returned Ratio in the profiler but we don’t receive a single alert, whereas during the night or weekend we do receive one or two alerts.
There are a couple, so we will try to see if we can improve on that. However, the query and index shown (which is also the query we see most frequently in the profiler) has an average of 99ms on a large users collection, so it’s not critical yet. We are just really curious about the alert and the query that triggers it, which we can’t see.