Hi there,
I have a collection with Online Archive enabled, partitioned by the “timestamp” (ISODate) and “contactId” (UUID) fields. Data is moved to the archive once it is 90 days old.
If I run the query below, it returns almost instantly when I connect directly to the cluster, but it takes about 5 seconds when I connect to the archive.
db.messages.find({timestamp: {$gte: ISODate('2023-02-19T00:00:00Z'), $lt: ISODate('2023-02-21T00:00:00Z')}, contactId: UUID('3ac88cfc-3ac9-46da-9106-087c53058de5')})
For archived data, 5 seconds is expected and fine. But for unarchived data, since the archive is partitioned by timestamp, shouldn’t it recognize that the requested range is not archived and query only the online data?
Regards!
Hi Bruno,
Regarding the time taken to query the archive versus the cluster: that latency is expected when querying the archive. With the existing version of Online Archive, you can expect some latency while the partitions in the archive are listed in object storage, and query time will increase as more TBs of data accumulate in the archive.
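As a possible client-side workaround, the application can choose the connection itself based on the query's time range. The sketch below is only an illustration and not an Online Archive feature: `chooseTarget` and the labels `"cluster"` and `"federated"` are hypothetical names, and it assumes the archiving rule is "`timestamp` older than 90 days", so a document newer than the cutoff cannot have been archived yet.

```javascript
// Hypothetical helper (not part of any MongoDB API): decide which
// connection to use from the lower bound of the queried time range.
const DAY_MS = 24 * 60 * 60 * 1000;

function chooseTarget(rangeStart, now = new Date(), archiveAfterDays = 90) {
  // Assumption: documents are archived only once `timestamp` is older
  // than `archiveAfterDays`, so anything at or after this cutoff is
  // still on the cluster.
  const cutoffMs = now.getTime() - archiveAfterDays * DAY_MS;
  return rangeStart.getTime() >= cutoffMs ? "cluster" : "federated";
}

// Example with a fixed "now" so the result is reproducible.
const now = new Date("2023-03-01T00:00:00Z");
console.log(chooseTarget(new Date("2023-02-19T00:00:00Z"), now)); // cluster
console.log(chooseTarget(new Date("2022-10-01T00:00:00Z"), now)); // federated
```

The returned label would then be mapped to the matching Atlas connection string: the cluster-only string for recent ranges, and the cluster-and-Online-Archive string for ranges that reach past the cutoff. Older ranges go to the combined connection rather than the archive alone because archiving runs as a periodic job, so documents just past the cutoff may still be on the cluster.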
With the new version of Online Archive, we are improving query performance, and we are going to optimize storage and incorporate rebalancing techniques. This will improve query performance and decrease costs when querying the archive. Because the data is sorted and rebalanced, the new version will also be able to determine which partitions to query.
The new version will be much faster for queries like yours that look up data in the archive, but as a general rule of thumb, querying the archive (in object storage) will still be somewhat slower than querying the cluster. If you are interested in testing the new feature in your non-production environment, I can sign you up for the Private Preview program.
Details of the announcement of the new feature and the “Private Preview program” are mentioned here: https://www.mongodb.com/community/forums/t/invitation-to-participate-in-the-early-access-program-of-online-archives-query-performance-improvements/204188
Thanks,
Prem