What's the best practice to migrate standalone instance data to an existing replica set

Billy_Bui · October 8, 2024, 4:05pm

This advice is based on my experience, which might not hold across all versions of MongoDB. Please keep that in mind and consult the documentation for your MongoDB version.

To get started, the most important thing is to always back up your data before doing anything. And I mean it.

Regarding your question about joining the standalone node to an existing replica set: I recommend NOT doing it. It will magically sync your data, but not with the “magic” that you want. The point of a replica set is that the data is consistent across the whole cluster, so every member holds the same data as the PRIMARY node. If you join your standalone node to a replica set and the cluster determines that it cannot use the oplog to bring that node in line with the PRIMARY, it will drop all the data on that node, copy a snapshot of the data from a member of the cluster (an initial sync), and then start syncing from there. So do NOT do it. And that’s why backups are important.

My suggestion is to “split” your data into two parts:

The first is unchanged data: historical data, or data that rarely changes given your business. You can move this data into your cluster first. It is a slow process, and you need to take into account that these writes go into the oplog so the cluster can replicate them to the other members. So take it slow and monitor the throughput to ensure it doesn’t affect the current replica set. You can also increase your oplog size if possible so that replication won’t stall in the bad case. (Stalling is similar to joining the standalone node to a replica set: the oplog discards its oldest operations before they have been synced to a SECONDARY, so that member can no longer catch up from the oplog, and the whole “magic” process begins.)
The second is the rest of your data. You can take downtime to move it, or copy the data, note the timestamp, and sync the later changes afterwards. It depends on your implementation. My suggestion is to convert your standalone instance into a new replica set, which enables the oplog. This way, you can run mongodump with the --oplog option, then restore the dump into the current replica set with mongorestore and its --oplogReplay option, which replays the writes that happened during the dump.
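
To make the two parts a bit more concrete, here is a minimal Python sketch, not a definitive implementation. `copy_throttled` is a hypothetical helper that copies documents in small batches with a pause in between, and the two command builders only assemble `mongodump --oplog` and `mongorestore --oplogReplay` argument lists without running them. The URIs, paths, batch size, and pause are placeholder assumptions. As far as I know, `--oplog` only works for a full-instance dump (not together with `--db` or `--collection`), so please check the documentation for your version.

```python
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size documents."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def copy_throttled(
    docs: Iterable[dict],
    insert_batch: Callable[[List[dict]], object],
    batch_size: int = 1000,        # assumed value, tune for your cluster
    pause_seconds: float = 0.5,    # assumed value, tune for your cluster
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Part 1: copy rarely-changing documents slowly.

    Pausing after each batch limits how fast the oplog fills, giving
    the SECONDARY members time to replicate the inserts.
    """
    copied = 0
    for batch in batches(docs, batch_size):
        insert_batch(batch)
        copied += len(batch)
        sleep(pause_seconds)
    return copied


def dump_command(source_uri: str, out_dir: str) -> List[str]:
    """Part 2: dump the converted source, capturing writes made during the dump."""
    return ["mongodump", f"--uri={source_uri}", "--oplog", f"--out={out_dir}"]


def restore_command(target_uri: str, dump_dir: str) -> List[str]:
    """Part 2: restore into the existing replica set and replay the captured oplog."""
    return ["mongorestore", f"--uri={target_uri}", "--oplogReplay", dump_dir]
```

With a driver such as pymongo (an assumption, it is not imported here), `docs` could be a `find()` cursor on the standalone and `insert_batch` the target collection's `insert_many`. The command lists can be passed to `subprocess.run(cmd, check=True)` once you have reviewed them.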

And yes, please do rename your database if possible to avoid name conflicts. It’s better to prevent conflicts before the sync than to clean up the mess after they occur.
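
As a rough illustration of checking for name conflicts up front, the Python sketch below compares the database names on both sides and proposes a new name for each clash. `plan_renames`, `rename_args`, and the `_migrated` suffix are hypothetical names chosen for this example. The generated `--nsFrom`/`--nsTo` pairs are mongorestore's namespace-rename options; I am not certain they also apply to entries replayed with `--oplogReplay`, so verify that on a test restore first.

```python
from typing import Dict, Iterable, List


def plan_renames(
    source_dbs: Iterable[str],
    target_dbs: Iterable[str],
    suffix: str = "_migrated",  # hypothetical suffix, pick your own
) -> Dict[str, str]:
    """Map each source database that clashes with a target database to a free name."""
    source = set(source_dbs)
    target = set(target_dbs)
    # Names that are already used on either side and must not be reused.
    taken = target | source
    plan: Dict[str, str] = {}
    for name in sorted(source):
        if name not in target:
            continue  # no conflict, keep the original name
        candidate = name + suffix
        counter = 2
        while candidate in taken:
            candidate = f"{name}{suffix}{counter}"
            counter += 1
        plan[name] = candidate
        taken.add(candidate)
    return plan


def rename_args(plan: Dict[str, str]) -> List[str]:
    """Turn a rename plan into mongorestore --nsFrom/--nsTo argument pairs."""
    args: List[str] = []
    for old, new in plan.items():
        args += [f"--nsFrom={old}.*", f"--nsTo={new}.*"]
    return args
```

The database name lists could come from `show dbs` on each side, and the resulting arguments would be appended to the mongorestore command line.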

Sorry if this answer is too lengthy and hard to read. English is not my native language.