Hello,
We’re managing a small MongoDB 4.2 replica set (PSA) with about 54 GB of on-disk WiredTiger data.
Today I was upgrading the replica from a pre-4.0 version running MMAPv1 (meaning I had to delete its entire data directory and let it resync from the primary from scratch).
Everything worked fine: after I re-added it to the replica set, it transitioned into the STARTUP2 state, meaning it was performing its initial sync from the primary.
What I didn’t expect was for the primary to crash after exceeding the default limit on open files (64k).
The reason it reached such a high number is most likely that our client keeps a large number of small collections in the database, resulting in about 74k inodes used by the data directory.
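For reference, here is the kind of minimal Python sketch we used to confirm the file count; the dbPath below is an assumption, substitute your actual storage.dbPath:

```python
import os

# Assumed dbPath -- replace with your actual storage.dbPath.
DBPATH = "/var/lib/mongodb"

# Count the regular files under the data directory. WiredTiger keeps a
# separate .wt file per collection and per index, so tens of thousands of
# small collections translate directly into tens of thousands of files.
total = sum(len(files) for _, _, files in os.walk(DBPATH))
print(f"{total} files under {DBPATH}")
```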
It seems the server loops over all the files, incrementally sending them over, without ever closing them again.
Even now, several hours after the incident, the process is holding ~73.5k file descriptors open (we had to raise the limit to allow the replica to start up).
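A similar sketch (Linux-only, assuming pidof is available and you can read the mongod process’s /proc entries, which usually requires root or the mongod user) is how we check the live FD count against the effective limit:

```python
import os
import subprocess

# Find the mongod process; takes the first PID if several are running.
pid = int(subprocess.check_output(["pidof", "mongod"]).split()[0])

# Number of file descriptors the process currently holds open.
open_fds = len(os.listdir(f"/proc/{pid}/fd"))

# The effective "Max open files" limit the process was started with.
with open(f"/proc/{pid}/limits") as f:
    nofile = next(line for line in f if line.startswith("Max open files"))

print(f"mongod (pid {pid}) holds {open_fds} open FDs")
print(nofile.strip())
```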
Is this the intended behavior? The only “solution” to this problem I could find online is “increase the max FDs limit”, which is not really a solution so much as a stopgap.