Greetings.
I have a standalone MongoDB instance and a replica set with three nodes. All of them are on the AWS cloud and self-managed.
I need to migrate the data from the standalone MongoDB to the replica set. I have tested this with mongodump and mongorestore, and it took around twenty hours of processing, which is a bit too long for business downtime; I want to shrink it as much as I can.
I’ve come up with a solution and want to see if it works.
If I join this standalone node into the existing replica set, wait for the members to complete the sync, and then remove the node afterwards, will this work like a charm?
Also, the two deployments have a conflicting database name. If this solution works, I will remember to change the database name before proceeding.
This advice is based on my experience, which might not be consistent across MongoDB versions, so please keep that in mind and consult the documentation for the correct information for your MongoDB version.
To get started, the most important thing is to always back up your data before doing anything. And I mean it.
Regarding your question about joining the standalone node to an existing replica set: I recommend NOT doing it. It will magically sync your data, but not with the “magic” that you want. The point of a replica set is that your data is consistent across the whole cluster: every member has the same data as the PRIMARY node. So if you join your standalone node to a replica set and the cluster determines that it cannot use the oplog to apply operations to your standalone node to make it the same as the PRIMARY, it will drop all of that node's data, copy a snapshot of the data from another member of the cluster, then start syncing from there. So do NOT do it. And that's why backups are important.
My suggestion is to “split” your data into 2 parts:
- The first part is the unchanged data, which is historical or rarely changed depending on your business. You can move this data into your cluster first. It is a slow process, and you need to take into account that your oplog will contain these operations so the cluster can sync them to the other members. So take it slow and monitor the throughput to make sure it doesn't affect the current replica set. You can also extend your oplog size if possible so that the oplog won't stall in the bad case. (Stalling is like the case of joining the standalone node to a replica set, but here the oplog deletes the oldest operations that haven't been synced to your SECONDARY yet, so the SECONDARY can no longer use the oplog to sync, and the whole “magic” resync process begins.)
- The second part is the rest of your data. You can take downtime to move it, or copy the data, take note of the timestamp, and sync the changes later; it depends on your implementation. My suggestion is to convert your standalone node into a new single-member replica set, which will enable the oplog. This way, you can use mongodump to dump the data together with the oplog, then restore it to the current replica set with the oplog replay option, which includes the writes that happen during the dump process. A rough sketch of both the oplog resize and the dump/restore commands follows this list.
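As an illustration of both points above, here is a minimal sketch of the oplog resize and the dump-with-oplog/restore commands. The host names, the dump path, and the ~32 GB oplog size are placeholders I made up, not values from this thread, so adjust them and check each command against the documentation for your MongoDB version:
# (Optional) enlarge the oplog on each member of the target replica set so the slow
# bulk load doesn't age entries out before the secondaries have applied them.
# replSetResizeOplog takes the new size in megabytes (here ~32 GB, a placeholder).
mongosh "mongodb://target-node-1:27017" --eval 'db.adminCommand({ replSetResizeOplog: 1, size: 32000 })'
# After converting the standalone to a single-member replica set (rs.initiate()),
# dump it together with its oplog so writes that happen during the dump are captured.
mongodump --host source-node:27017 --oplog --out /backup/source-dump
# Replay the dump, including the captured oplog entries, into the target replica set.
mongorestore --host "targetRS/target-node-1:27017,target-node-2:27017,target-node-3:27017" --oplogReplay /backup/source-dump
Keep in mind that mongodump --oplog dumps all databases on the instance (it cannot be combined with --db), so plan the rename below around that.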
And yes, please do rename your database if possible to avoid name conflicts. It’s better to control conflicts before the sync than to resolve the mess after conflicts occur.
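If it helps, one common way to do that rename is to restore the dump under a new name with mongorestore's namespace transform flags, since MongoDB has no in-place "rename database" command. Again, a hedged sketch: "olddb", "newdb", the hosts, and the path are placeholders, not names from this thread:
# Dump only the conflicting database from the standalone.
mongodump --host source-node:27017 --db olddb --out /backup/rename-dump
# Restore it into the target replica set under a different database name.
mongorestore --host "targetRS/target-node-1:27017" --nsFrom 'olddb.*' --nsTo 'newdb.*' /backup/rename-dump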
Sorry if this answer is too lengthy and hard to read. English is not my native language.
Hi,
Your proposed solution of temporarily adding the standalone MongoDB instance as a member of the replica set for data migration is theoretically sound but comes with some risks and limitations. Let’s explore this approach and some best practices for migrating MongoDB data with minimal downtime.
Key Considerations for Your Solution:
- Database Name Conflict:
- The first major hurdle is the conflicting database name. You cannot have two databases with the same name in the replica set unless you rename one of them. You’ll need to address this before attempting the migration.
- Standalone Node Integration:
- MongoDB replica sets work by having a primary node and secondary nodes. The secondary nodes replicate the data from the primary. If you try to add a standalone node to an existing replica set, MongoDB will treat it as a secondary and start syncing data from the primary.
- However, this will not bring the data from the standalone instance into the replica set, because the data from the primary will overwrite the standalone's data on the newly added secondary. Therefore, this approach won't help you achieve the desired result unless you somehow merge the data sets, which is complex and not recommended.
Alternative Strategies:
1. Change Standalone Instance into a Primary Node of a New Replica Set (Safe Approach):
- Convert the standalone MongoDB instance into a replica set by initializing it as a single-member replica set.
- Then, add the current replica set nodes as secondaries to this new replica set (which has your standalone as the primary).
- This will replicate the data from the standalone instance to the existing nodes in the replica set. Once the replication is complete, you can perform a stepdown to make one of the other nodes the primary and then remove the standalone node from the replica set.
Steps:
- Initiate a replica set on the standalone node:
rs.initiate()
- Add the existing replica set members to this replica set:
rs.add("<replica_set_member_1>")
rs.add("<replica_set_member_2>")
rs.add("<replica_set_member_3>")
- Wait for replication to complete (a quick way to check this is sketched after these steps).
- Step down the standalone as primary:
rs.stepDown()
- Remove the standalone node from the replica set:
rs.remove("<standalone_node>")
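A small sketch of how that "wait for replication to complete" step can be checked before the stepdown; the host name below is a placeholder, and the commands assume you are connected to the new primary (the former standalone):
# Shows each secondary and how far it lags behind the primary.
mongosh "mongodb://source-node:27017" --eval 'rs.printSecondaryReplicationInfo()'
# Member states should read PRIMARY/SECONDARY, not STARTUP2 (still in initial sync)
# or RECOVERING, before you run rs.stepDown() and rs.remove().
mongosh "mongodb://source-node:27017" --eval 'rs.status().members.forEach(m => print(m.name, m.stateStr, m.optimeDate))'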
2. Live Migration with mongomirror (Minimal Downtime):
MongoDB’s mongomirror is designed to perform live data migration with minimal downtime. It can be used to sync data between a standalone MongoDB and a replica set in real time, which reduces the downtime required.
Steps:
- Install mongomirror.
- Use it to start mirroring data from the standalone instance to the replica set.
- Once the sync is complete, switch your application to point to the replica set.
This approach allows for continuous replication and minimizes downtime since only the final cutover step requires you to stop the application.
Limitations:
mongomirror is only available for migrations into MongoDB Atlas, so since you’re self-managing on AWS, you’d need to explore this option carefully or check whether the tool can be adapted to your needs.
3. Sharding (If Feasible):
If your workload and data model allow it, you can migrate to a sharded cluster instead. MongoDB sharding allows you to distribute your data across multiple servers, and data can be moved between shards with less downtime. However, this is a more complex solution and requires planning if you aren’t already using sharding.
Summary:
- Your solution of adding the standalone instance to the replica set will not work as expected because the existing primary will overwrite the standalone data during replication.
- Instead, consider the following:
- Convert the standalone into a new primary for the replica set and then remove it after replication is done.
- Use mongomirror for live migration to reduce downtime.
- Consider sharding if the workload fits.
Each approach has trade-offs in terms of downtime, complexity, and risk, so choose based on your infrastructure’s flexibility and your business requirements.
@Mayank_Anand2, same as in my other reply, could you please update your post with a mention that you used some kind of Generative AI to generate your response? Just like you did in Performance Impact of 955 indexes in one database - #2 by Mayank_Anand2.
I think it is important for people to know. To me it is obvious it comes from AI, since your 4 replies appeared at the same time and use the same nice formatting.
Hi Billy:
Thank you for your reply and for sharing your experience. I very much appreciate it.
After discussing it with our developer team, we decided to split our data into two parts, as you suggested. I’ll come back with the actual downtime once we see how much we managed to shrink it.
As I mentioned before, here are our migration results.
First, I want to thank Billy for sharing his experience with MongoDB migrations. With this implementation, our data migration downtime went from around 20 hours with a full database dump/restore to around three hours. We migrated around 100M documents, a vast improvement compared to the old method we tried before.
I’m going to get some sleep, cheers.
Off-topic: We expected to finish maintenance three hours earlier thanks to this vast improvement. However, I ended up digging into our legacy system because it stopped serving requests, with only unclear error messages, after we migrated our data to the replica set. (I’m pretty frustrated about this.)
Great news, glad that I could help.
About the off-topic part: well, sometimes unexpected bugs happen “after” migration. But it’s part of the job and I kind of enjoy them (of course, only after fixing them).