Howdy!
I’m copying the contents of an old MongoDB 3.2.11 server running on a Google Compute Engine (GCE) VM to a fresh installation of MongoDB 4.4 on a new GCE VM.
Creating a new VM lets us revisit VM parameters, test the server before switching over, and leave behind unknown state on the old VM.
The MongoDB docs don’t promise that an archive dumped from one release can be restored into a newer release. They do say to use the mongodump release that goes with the source server and the mongorestore release that goes with the destination server.
What I’ve done so far:
- Created the new GCE VM with Debian 10 and installed MongoDB per https://docs.mongodb.com/manual/tutorial/install-mongodb-on-debian/
- Ran mongodump on the source VM:
  `time mongodump --gzip --archive=mongo-$(date +"%Y-%m-%d").archive.gz`
  It didn’t like the `--repair` and `--oplog` options, so I skipped those. This created a 9.0GB file in close to 5 hours.
- Copied the archive to the destination VM.
- Ran mongorestore on the destination VM:
  `time mongorestore --objcheck --drop --maintainInsertionOrder --gzip --archive=mongo-2021-04-13.archive.gz`
  This took only 39 seconds and didn’t restore all the contents, judging by the `show dbs` sizes:
  `admin 0.000GB  config 0.000GB  fireworks 0.000GB  jerry 0.001GB  local 0.000GB  # more...  simulations 0.024GB`
- No doubt the DB sizes could vary with fragmentation and such, but the source VM shows `simulations` at 17.497GB.
- mongorestore’s `--preserveUUID` option caused some warning messages, so I retried without it.
- Leaving out `--maintainInsertionOrder`, `--objcheck`, or `--nsInclude="simulations.*"` didn’t make a noticeable difference. The `show dbs` sizes varied from 24 to 29 MB across runs, but maybe that’s just due to fragmentation.
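To put the `show dbs` numbers side by side, here is a minimal Python sketch. `parse_show_dbs` and `suspicious_dbs` are hypothetical helpers (not part of any MongoDB tool) that parse pasted shell output in the `name  sizeGB` form shown above and flag databases whose restored size is far below the source’s. The 1024-based units, the 0.5 ratio, and the 0.01 GB floor are all assumptions. On-disk size depends on storage engine, compression, and fragmentation, so this only catches gross gaps such as 17.497GB vs 0.024GB, not partial loss.

```python
import re
from typing import Dict, List

# One "name  size+unit" pair from pasted `show dbs` output, e.g. "simulations 17.497GB".
# Tokens not followed by a size (like "# more...") simply don't match.
_PAIR = re.compile(r"(\S+)\s+(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\b")
# Assumes 1024-based units; every size is normalized to GB.
_TO_GB = {"B": 1 / 1024**3, "KB": 1 / 1024**2, "MB": 1 / 1024, "GB": 1.0, "TB": 1024.0}


def parse_show_dbs(text: str) -> Dict[str, float]:
    """Return {db_name: size_in_gb} for every name/size pair found in the text."""
    return {
        m.group(1): float(m.group(2)) * _TO_GB[m.group(3)]
        for m in _PAIR.finditer(text)
    }


def suspicious_dbs(
    src: Dict[str, float],
    dst: Dict[str, float],
    min_ratio: float = 0.5,
    min_gb: float = 0.01,
) -> List[str]:
    """Names of DBs whose destination size is below min_ratio * source size.

    Source DBs smaller than min_gb are skipped, since near-empty databases are
    dominated by fixed overhead. A DB missing on the destination counts as 0 GB.
    """
    flagged = []
    for name, src_gb in sorted(src.items()):
        if src_gb < min_gb:
            continue
        if dst.get(name, 0.0) < src_gb * min_ratio:
            flagged.append(name)
    return flagged
```

For example, feeding it the source’s `simulations 17.497GB jerry 0.001GB` and the destination output above returns `['simulations']`; `jerry` is skipped as too small to judge.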
Notes:
- We’re not using MongoDB users, authentication, or replication, so there’s no need to copy admin data.
- It’s fine to have downtime during this conversion.
- Our `pymongo` driver is up to date.
- The new server works fine for FireWorks workflows. It just doesn’t have all the `simulations` data.
- I’m a software developer with little MongoDB experience who has to do sysadmin duty on this.
- Searching the web, Stack Overflow, and this forum didn’t turn up an answer, only an encouraging “You should not worry about it. You will not encounter any problems” and a bash script that runs the data through a series of major MongoDB releases, each in a Docker image. Using Docker is a great idea here, but I’m trying to avoid trudging through all those intermediate releases.
Q1. How can I make mongorestore restore all of the `simulations` DB?
Q2. How can I verify that it did, at least at the level of document counts and such?
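One way to approach Q2 is to count documents per collection on both servers and compare. Below is a minimal PyMongo sketch under stated assumptions: the connection URIs and `SKIP_DBS` set are hypothetical placeholders, and `collect_counts`, `diff_counts`, and `main` are illustrative names, not an existing tool. It assumes a PyMongo release that can still talk to the 3.2.11 source; as far as I know the 4.x line requires MongoDB 3.6 or newer, so a 3.x release may be needed for that side. `count_documents({})` gives an exact count via a collection scan, which may take a while on a 17GB database.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical connection strings: substitute the old and new VMs' addresses.
SOURCE_URI = "mongodb://old-vm:27017"
DEST_URI = "mongodb://new-vm:27017"
# System databases that aren't being migrated.
SKIP_DBS = {"admin", "config", "local"}


def collect_counts(uri: str) -> Dict[str, int]:
    """Return {"db.collection": document_count} for every user collection."""
    # Imported lazily so diff_counts() below can be used without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    counts: Dict[str, int] = {}
    try:
        for db_name in client.list_database_names():
            if db_name in SKIP_DBS:
                continue
            db = client[db_name]
            for coll_name in db.list_collection_names():
                if coll_name.startswith("system."):
                    continue  # skip system collections such as system.profile
                # Exact count via a collection scan; can be slow on large collections.
                counts[f"{db_name}.{coll_name}"] = db[coll_name].count_documents({})
    finally:
        client.close()
    return counts


def diff_counts(
    src: Dict[str, int], dst: Dict[str, int]
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Return (namespace, source_count, dest_count) for every mismatch.

    A namespace missing on one side is reported with None for that side.
    """
    mismatches = []
    for ns in sorted(set(src) | set(dst)):
        s, d = src.get(ns), dst.get(ns)
        if s != d:
            mismatches.append((ns, s, d))
    return mismatches


def main() -> None:
    """Print every namespace whose document count differs between the servers."""
    mismatches = diff_counts(collect_counts(SOURCE_URI), collect_counts(DEST_URI))
    for ns, s, d in mismatches:
        print(f"{ns}: source={s} dest={d}")
    print(f"{len(mismatches)} mismatched namespace(s)")
```

Calling `main()` from a machine that can reach both VMs, after a restore finishes, lists any collection that is missing or short on the destination. Keeping the comparison in a pure function makes it easy to check the logic separately from the network calls.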
Thanks so much!