Hi Mongo Community,
We use the mongodb utility tools in our everyday workflow to copy production data into a local development environment (e.g. to troubleshoot existing functionality or develop new features). To streamline our workflow, we have written bash scripts that we run from our development machines and that use ssh tunneling to access prod.
Two use cases we’re having issues with:
[A] COPY ENTIRE PROD DB TO DEV machine’s local environment:
- calls mongodump over an ssh tunnel (dumps prod data into a local folder),
- drops the local database,
- runs mongorestore to load the prod data into the local database,
- runs mongosh commands to delete sensitive data that isn’t necessary for development.
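For context, the [A] script is roughly the following (a simplified sketch: the host name, dump path and sanitization command are placeholders, and the tunnel port 6666 is the same one that appears in the error under ISSUE #2):

```bash
#!/usr/bin/env bash
set -euo pipefail

PROD_HOST="prod.example.com"   # placeholder
TUNNEL_PORT=6666               # local port forwarded to the prod mongod
DUMP_DIR="./dump"

# 1. Open an ssh tunnel to prod and dump the database through it
ssh -f -N -L "${TUNNEL_PORT}:localhost:27017" "$PROD_HOST"
mongodump --port "$TUNNEL_PORT" --db=mydbname --out="$DUMP_DIR"

# 2. Drop the local database
mongosh --eval 'db.getSiblingDB("mydbname").dropDatabase()'

# 3. Restore the prod data into the local mongod
mongorestore --nsInclude='mydbname.*' "$DUMP_DIR"

# 4. Delete sensitive data that isn't needed for development (placeholder command)
mongosh mydbname --eval 'db.users.updateMany({}, { $unset: { ssn: "" } })'

# (tunnel teardown omitted)
```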
ISSUE #1:
- Running mongodump over ssh is extremely SLOW (mainly because the data is transferred uncompressed?).
- Even before ISSUE #2 started occurring, we would sometimes copy the database manually, bypassing our own script, because it was faster (i.e. log on to the prod server via ssh, run mongodump there, zip the data, scp it over, unzip locally, mongorestore locally).
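Regarding the compression guess, we are wondering whether streaming a gzipped archive straight through ssh would give us the speed of the manual path in a single step, something like this (untested sketch; the host name is a placeholder):

```bash
# Run mongodump on the prod server itself, stream a gzipped archive over ssh,
# and pipe it straight into a local mongorestore. --drop replaces the separate
# "drop the local database" step.
ssh prod.example.com "mongodump --archive --gzip --db=mydbname --quiet" \
  | mongorestore --archive --gzip --drop
```

The idea being that compression happens on the prod side before anything crosses the network.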
ISSUE #2:
- Mongodump (sometimes?) fails in the middle of dumping our largest collection:

  Failed: error writing data for collection mydbname.mycollectionname to disk: error reading collection: connection pool for 127.0.0.1:6666 was cleared because another operation failed with: connection() error occurred during connection handshake: connection(127.0.0.1:6666[-26]) incomplete read of message header: read tcp 127.0.0.1:57313->127.0.0.1:6666: i/o timeout
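Our (unconfirmed) guess is that idle connections on the forwarded port are being dropped while the large collection is streaming. One thing we are considering is adding ssh keepalive options when the tunnel is opened, roughly:

```bash
# Keepalives on the tunnel, in case idle forwarded connections are being
# torn down mid-dump (an unconfirmed guess on our side).
ssh -o ServerAliveInterval=30 \
    -o ServerAliveCountMax=6 \
    -o ExitOnForwardFailure=yes \
    -f -N -L 6666:localhost:27017 prod.example.com
```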
[B] COPY ONE USER FROM PROD TO DEV:
- Works roughly the same as the above, except that it uses mongoexport & mongoimport, and it runs a few additional mongosh commands to make sure we do not have any id clashes.
ISSUE #3:
- The script is SLOW (takes ~10-20 seconds), and we need to run it 15-20 times on a normal day.
- It is slow even when copying a user that has very little data.
- In this case, the slowness comes from opening and closing multiple ssh tunnels sequentially (we need to run a separate mongoexport command for each collection where user data is stored).
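One idea we are toying with is opening a single tunnel through an ssh control socket and reusing it for every mongoexport/mongoimport call, instead of opening a new tunnel per collection. A rough sketch (the collection names, query field and socket path are placeholders, and the extra mongosh steps that avoid _id clashes are omitted):

```bash
#!/usr/bin/env bash
set -euo pipefail

PROD_HOST="prod.example.com"     # placeholder
SOCK="/tmp/mongo-tunnel.sock"    # ssh control socket
USER_ID="$1"                     # id of the user to copy

mkdir -p ./export

# Open one backgrounded tunnel and close it automatically on exit
ssh -M -S "$SOCK" -f -N -L 6666:localhost:27017 "$PROD_HOST"
trap 'ssh -S "$SOCK" -O exit "$PROD_HOST"' EXIT

# Reuse the same tunnel for every collection that holds user data
for coll in users orders preferences; do   # placeholder collection names
  mongoexport --port 6666 --db=mydbname --collection="$coll" \
    --query="{\"userId\": {\"\$oid\": \"$USER_ID\"}}" \
    --out="./export/${coll}.json"
  mongoimport --db=mydbname --collection="$coll" \
    --file="./export/${coll}.json"
done
```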
Would love to hear how other dev teams are tackling the problem of copying production data to dev, and to brainstorm different solutions with teams facing similar challenges.
Thanks!
Xavier