Mongo Atlas Online Archive to cloud storage

Deepali_Bharankar · April 24, 2024, 7:49am

We have a large collection in MongoDB that grows every day and has now reached 200 million documents. To reduce cost, we would like to use the Online Archive option to archive data from previous years to AWS S3. A few questions about this option:

1. The collection has an _id (ObjectId) field but no date field. The ObjectId has a date part in it. Can this be used for the archiving rule, and how?
2. Documents in this collection are always deleted and recreated, never modified. Can data in the archive be deleted by _id if a user opens an old record and regenerates the data?
3. How long will it take to load a single document from the archive by _id: seconds or milliseconds?
4. Are there any management changes needed to use Online Archive, apart from Atlas Data Federation costs?

Hartek_Sabharwal · April 25, 2024, 2:25pm

Hello Deepali!

1. Yes, you can specify a custom archiving rule that extracts the date part of the ObjectId using the MongoDB query language. See the section that says “when the current date exceeds the date inside an objectId” at https://www.mongodb.com/docs/atlas/online-archive/configure-online-archive/
2. No, you currently cannot delete or modify data in the archive. If we archive a document with _id “123” and that document is then re-created in the collection, we will archive it a second time. There will be two documents in the archive with the _id “123”, because Online Archive has no _id uniqueness constraint.
3. Assuming _id is the first partition field you choose for the archive, I’d expect it to take no more than a few seconds.
4. I’m not sure what you mean by management changes, but there is a small storage cost for Online Archive, which you can see on the MongoDB pricing page.

Prem_PK_Krishna · April 29, 2024, 1:39am

To clarify a little further on point #1:

You cannot use _id as the date field in the Date criteria. Technically, you can work around this with the $expr approach in the custom criteria of Online Archive that @Hartek_Sabharwal mentioned above.
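For illustration, here is a rough sketch of what such a custom criteria document might look like, built as a Python dict and printed as JSON. It assumes a 365-day retention window (the thread only says data from previous years) and uses the standard `$toDate`, `$dateSubtract`, and `$$NOW` aggregation features; the exact expression Online Archive accepts should be checked against the documentation linked above. The helper name `build_custom_criteria` is made up for this example.

```python
import json

# Assumption: archive documents older than one year. Adjust as needed.
ARCHIVE_AFTER_DAYS = 365


def build_custom_criteria(days: int) -> dict:
    """Build a query matching documents whose _id timestamp is older than `days` days."""
    return {
        "$expr": {
            "$lt": [
                # $toDate converts an ObjectId to the creation date embedded in it.
                {"$toDate": "$_id"},
                # $$NOW is the current datetime; subtract the retention window from it.
                {"$dateSubtract": {"startDate": "$$NOW", "unit": "day", "amount": days}},
            ]
        }
    }


criteria = build_custom_criteria(ARCHIVE_AFTER_DAYS)
# The JSON printed here is what would go into the custom criteria field.
print(json.dumps(criteria, indent=2))
```

Because the comparison runs on a value computed from _id, it is evaluated per document rather than through a plain index range on a stored field.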

However, the custom query will likely not use an index, so the archival process itself will likely be slow. Our documentation notes:

For custom criteria that use an expression, Atlas might first convert a value before it evaluates it against the query.

The recommendation is to create a new indexed date field and use the Date criteria to archive. This is the approach that will improve archiving speed.

We wouldn’t recommend custom criteria that use $expr on _id, because of the slowness mentioned above.
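As a minimal sketch of how the new date field could be populated from the existing ObjectIds: the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, so the creation date can be recovered without any extra data. The field name `createdAt` and the constant names below are assumptions for illustration, not something prescribed in this thread.

```python
from datetime import datetime, timezone


def objectid_to_datetime(oid_hex: str) -> datetime:
    """Return the UTC creation time embedded in a 24-character ObjectId hex string."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Server-side backfill, expressed as plain dicts (hypothetical field name "createdAt").
# Only touch documents that do not have the field yet.
BACKFILL_FILTER = {"createdAt": {"$exists": False}}
# An update with an aggregation pipeline (MongoDB 4.2+) can derive the date from _id.
BACKFILL_PIPELINE = [{"$set": {"createdAt": {"$toDate": "$_id"}}}]
# Ascending index on the new field, for use with the Date criteria.
INDEX_KEYS = [("createdAt", 1)]

# With PyMongo these would be used roughly as:
#   collection.update_many(BACKFILL_FILTER, BACKFILL_PIPELINE)
#   collection.create_index(INDEX_KEYS)

print(objectid_to_datetime("507f1f77bcf86cd799439011").isoformat())
```

On a collection of 200 million documents, a backfill like this would probably need to be run in batches, and new documents would need to set the field at insert time.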
