Hi all -
Bit of a tricky one here - appreciate any advice you can give.
We recently upgraded MongoDB from 3.6 to 4.4.24 for our application, along with a Mongoose upgrade. We use Docker Swarm to deploy it across 3 nodes in a replica set, with a separate service per MongoDB role.
Since upgrading from 3.6, we are seeing regular MongoDB container restarts on the primary node, which causes our application to crash. The containers on the other nodes also restart, but less often.
There is no information in the Docker logs about why the container is crashing, which suggests an OOM kill.
We limit the Mongo container to 2GB RAM using Docker limits (we’re not doing much I/O and the DB is small). Watching docker stats shows memory creeping up to the 2GB limit, at which point the process is killed by the OS and connections to the replica set are reset.
Is there anything we can do to enforce the RAM limit within the Mongo process itself so it doesn’t hit 2GB? I’ve tried limiting the WiredTiger cache size, but the cache isn’t very big anyway - see below - and shrinking it doesn’t help.
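For completeness, the cap I tried looks like this (a sketch assuming a YAML mongod config file; 0.25 is the minimum value mongod accepts for cacheSizeGB, and the number here is illustrative):

```yaml
# Cap the WiredTiger cache well below the 2GB container limit.
# Note mongod needs headroom beyond this cache for connections,
# aggregations, replication buffers, etc.
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 0.25
```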
I noticed we have a high active connection count today - does each active connection take up RAM, and could this be contributing?
I ran a script on one of our production systems and filtered active connections by IP; the 169.254.4.x addresses are our Swarm endpoints, 1 per host.
{
    "TOTAL_CONNECTION_COUNT" : 5264,
    "169.254.4.8" : 1895,
    "169.254.4.9" : 1939,
    "169.254.4.5" : 1397,
    "Internal" : 32,
    "127.0.0.1" : 1
}
We only have 1 active application server (running on the 169.254.4.8 host) - any idea why we are seeing a high number of connections across all three? I believe the only connections from the other two should be for replication - could there be something wrong with the replica set configuration that makes it duplicate connections?
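One thing we’re going to try on the application side is explicitly capping the driver’s connection pool - recent Mongoose/driver versions default to up to 100 sockets per host unless maxPoolSize is set (older Mongoose versions defaulted to 5 via poolSize). A sketch; the replica set name rs0 and database name app are placeholders for our real values:

```javascript
// Cap the driver pool via the connection string; Mongoose passes
// driver options like maxPoolSize straight through.
// Hostname, replica set name, and database name are placeholders.
const uri =
  "mongodb://mongo_primary:27017/app?replicaSet=rs0&maxPoolSize=20";

// With Mongoose this would be: await mongoose.connect(uri);

// Sanity check that the option made it onto the URI:
console.log(new URL(uri).searchParams.get("maxPoolSize")); // "20"
```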
Replica set config is below - we’re using DNS names for each node as we’re running in Swarm mode:
{
    "_id" : "xxxxxxxxxxxxxx",
    "version" : 2,
    "term" : 25,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : true,
    "members" : [
        {
            "_id" : 0,
            "host" : "mongo_primary:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 2,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "mongo_secondary:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "mongo_manager:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 0,
            "tags" : { },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 10000,
        "heartbeatTimeoutSecs" : 20,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : -1,
        "catchUpTakeoverDelayMillis" : 30000,
        "getLastErrorModes" : { },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("6544205e6a308455ab730738")
    }
}
Some more info on the MongoDB instance, if useful:
WiredTiger cache stats (from db.serverStatus().wiredTiger.cache):
"bytes allocated for updates" : 25474313,
"bytes belonging to page images in the cache" : 1444422,
"bytes belonging to the history store table in the cache" : 547,
"bytes currently in the cache" : 27046123,
"bytes dirty in the cache cumulative" : 2172249945,
"bytes not belonging to page images in the cache" : 25601701,
"bytes read into cache" : 2713104,
"bytes written from cache" : 1221520852,
Mongo memory allocation:
db.serverStatus().tcmalloc.tcmalloc.formattedString
------------------------------------------------
MALLOC: 1939406736 ( 1849.6 MiB) Bytes in use by application
MALLOC: + 18894848 ( 18.0 MiB) Bytes in page heap freelist
MALLOC: + 25106008 ( 23.9 MiB) Bytes in central cache freelist
MALLOC: + 1009344 ( 1.0 MiB) Bytes in transfer cache freelist
MALLOC: + 591508312 ( 564.1 MiB) Bytes in thread cache freelists
MALLOC: + 84410368 ( 80.5 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 2660335616 ( 2537.1 MiB) Actual memory used (physical + swap)
MALLOC: + 5971968 ( 5.7 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 2666307584 ( 2542.8 MiB) Virtual address space used
MALLOC:
MALLOC: 132361 Spans in use
MALLOC: 22206 Thread heaps in use
MALLOC: 4096 Tcmalloc page size
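For what it’s worth, my back-of-envelope on the connection question: MongoDB’s docs cite roughly 1 MB per incoming connection as the worst case, so at our connection count that alone could dwarf the 2GB limit. A sketch - the 1 MiB figure is the documented upper bound, not a measurement from this system:

```javascript
// Rough worst-case memory attributable to connections alone.
const connections = 5264;   // TOTAL_CONNECTION_COUNT from the tally above
const perConnMiB = 1;       // documented worst case per incoming connection
console.log(((connections * perConnMiB) / 1024).toFixed(1) + " GiB"); // "5.1 GiB"

// The tcmalloc output above is also consistent with many connection
// threads: ~564 MiB of thread cache freelists across 22206 thread heaps.
const perHeapBytes = Math.round(591508312 / 22206);
console.log(perHeapBytes + " bytes per thread heap"); // roughly 26 KB each
```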