Hi Wernfried,
With zoned sharding you would update the zone ranges on a schedule (e.g. daily) so older data would migrate from recent to archive shards. You would not have to coordinate this change across every member of your sharded cluster (as you would with filesystem symlinks).
This approach does presume that you would want to query your recent & archived data as a single sharded collection, rather than querying across multiple collections.
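As a rough illustration, here is a minimal sketch of what such a scheduled job might look like in mongosh. The shard key ({ createdAt: 1 }), the mydb.events namespace, the shard and zone names, and the 30-day boundary are all hypothetical, and this assumes the shards were already assigned to zones:

```javascript
// Illustrative daily job, run via mongosh against a mongos.
// Assumes zones were assigned earlier, e.g.:
//   sh.addShardToZone("shard-fast", "recent")
//   sh.addShardToZone("shard-slow", "archive")

const DAY = 24 * 60 * 60 * 1000;
const cutoff = new Date(Date.now() - 30 * DAY);          // new recent/archive boundary
// Assumes the job last ran exactly one day ago; in practice you would
// persist the previous boundary rather than recompute it.
const previousCutoff = new Date(cutoff.getTime() - DAY);

// Zone ranges must be removed with the exact min/max they were created with,
// so drop yesterday's definitions before adding today's.
sh.removeRangeFromZone("mydb.events", { createdAt: MinKey }, { createdAt: previousCutoff });
sh.removeRangeFromZone("mydb.events", { createdAt: previousCutoff }, { createdAt: MaxKey });

// Redefine the ranges with the new cutoff; the balancer then migrates any
// chunks that fall out of the "recent" range onto the archive shard(s).
sh.updateZoneKeyRange("mydb.events", { createdAt: MinKey }, { createdAt: cutoff }, "archive");
sh.updateZoneKeyRange("mydb.events", { createdAt: cutoff }, { createdAt: MaxKey }, "recent");
```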
The extra information in your latest response (that the archived data only needs to be retained for 6 months) suggests you are concerned about both the daily and the total volume of data.
If you have already modelled your data so you can archive based on a collection naming convention, your first approach (symlinks) sounds more appropriate for your use case than dumping & restoring data (which includes rebuilding indexes).
However, the choice of approach is up to you. I'm just sharing suggestions based on the information you have provided.
I expect you are already aware, but there are some consequences of using arbiters that will have a performance impact if you routinely take data-bearing members down for maintenance tasks like updating symlinks.
For more background, please see my comment on Replica set with 3 DB Nodes and 1 Arbiter - #8 by Stennie_X.
Definitely not! My mention of initial sync was in the context of a one-off operation if you wanted to change your storage options to use directoryPerDB and/or directoryForIndexes. Grouping related files by database or type can be helpful if you want to tune different mount point options.
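For reference, a minimal sketch of what those options look like in a mongod configuration file (the paths and values here are illustrative, not a recommendation):

```yaml
# Illustrative mongod.conf excerpt; enabling these options on an existing
# node changes the on-disk layout, hence the need for an initial sync/resync.
storage:
  dbPath: /var/lib/mongodb        # example path
  directoryPerDB: true            # one subdirectory per database
  wiredTiger:
    engineConfig:
      directoryForIndexes: true   # collections and indexes in separate subdirectories
```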
If you are fine maintaining symlinks at a file level, you can skip any notions of changing storage options.
Regards,
Stennie