Read/write IOPS reached around 30K/sec, exceeding the configured maximum disk IOPS of approximately 16K

Moni_Hazarika · May 14, 2024, 9:09am

We are seeing replication lag of up to approximately 4 minutes on both secondaries. We have a 3-member replica set with the same node configuration for the primary and the secondaries. We have opened a ticket with MongoDB support but have not received any leads so far.
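For reference, this is a minimal sketch of how the lag figure can be derived from a `replSetGetStatus`-style document (the output behind `rs.status()`): the primary's `optimeDate` minus each secondary's `optimeDate`. The member names and timestamps in the sample are hypothetical and are not taken from our cluster.

```python
from datetime import datetime, timedelta


def replication_lag_seconds(status):
    """Return {member name: lag in seconds} for every SECONDARY member.

    `status` is assumed to look like the output of replSetGetStatus,
    i.e. a dict with a "members" list whose entries carry "name",
    "stateStr" and "optimeDate".
    """
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample data for illustration only.
now = datetime(2024, 5, 14, 9, 0, 0)
sample_status = {
    "members": [
        {"name": "node-a:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "node-b:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(minutes=4)},
        {"name": "node-c:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(minutes=3, seconds=50)},
    ]
}

lags = replication_lag_seconds(sample_status)
for name, lag in lags.items():
    print(f"{name}: {lag:.0f}s behind the primary")
```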
Due to a heavy insert workload, disk utilization spiked and, at the same time, read/write IOPS reached around 30K/sec, exceeding the configured maximum disk IOPS of approximately 16K.
For the slow queries, we have added the indexes suggested by MongoDB.
We see that some queries perform an IXSCAN but still show a high `KeysExamined:nReturned` ratio and read GBs of data from disk. We are not sure what the reason is.
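To make the ratio concrete, here is a minimal sketch that computes it from the `executionStats` section of an `explain("executionStats")` result, using the `totalKeysExamined` and `nReturned` fields. The sample numbers are hypothetical, not from our workload. A ratio far above 1 would be consistent with the index bounds being much wider than the set of documents that finally match, but that is only one possible explanation.

```python
def keys_examined_per_returned(execution_stats):
    """Ratio of index keys scanned to documents returned.

    `execution_stats` is assumed to be the "executionStats" sub-document
    of an explain("executionStats") result.
    """
    keys = execution_stats["totalKeysExamined"]
    returned = execution_stats["nReturned"]
    if returned == 0:
        # Nothing returned: the ratio is undefined, so report infinity
        # when any keys were scanned and 0.0 otherwise.
        return float("inf") if keys > 0 else 0.0
    return keys / returned


# Hypothetical sample values for illustration only.
sample_stats = {
    "nReturned": 50,
    "totalKeysExamined": 100000,
    "totalDocsExamined": 100000,
}

ratio = keys_examined_per_returned(sample_stats)
print(f"keys examined per document returned: {ratio:.0f}")
```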
Similarly, write IOPS increased drastically. One insert operation stood out: it was running at the time of the observed spike in replication lag and had an execution time of around 4.15 minutes (249425 ms).
If a write request is sent to the primary, it will be replicated to the secondaries as quickly as possible.
The write is acknowledged according to our write concern setting, which is "writeConcern": {"w": "majority"}.
The insert that showed up in the mongod logs for the primary node looked like the excerpt below. Any recommendations?

"keysInserted":5,"numYields":0,"reslen":230,
"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":2}},"FeatureCompatibilityVersion":{"acquireCount":{"w":2}},
"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"w":2}},"Database":{"acquireCount":{"w":2}},
"Collection":{"acquireCount":{"w":2}},"Mutex":{"acquireCount":{"r":2}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":1},
"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},
"waitForWriteConcernDurationMillis":249425,"storage":{"data":
{"bytesRead":8365,"timeReadingMicros":17}},"remote":"192.168.254.110:13601","protocol":"op_msg","durationMillis":249425}}
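One thing that can be read off the excerpt above: `durationMillis` and `waitForWriteConcernDurationMillis` are both 249425. The sketch below subtracts one from the other, using only those two values copied from the log line. The helper name is made up for illustration. If the fields mean what their names suggest, which is an assumption on our side, almost all of the insert's time went into waiting for the majority acknowledgement and not into the local write on the primary.

```python
def millis_outside_write_concern_wait(attr):
    """Time the operation spent on anything other than waiting for the
    write concern to be satisfied, in milliseconds.

    `attr` holds the two fields copied from the slow-query log entry.
    """
    return attr["durationMillis"] - attr["waitForWriteConcernDurationMillis"]


# Values copied from the log excerpt; all other fields are omitted.
log_attr = {
    "waitForWriteConcernDurationMillis": 249425,
    "durationMillis": 249425,
}

local_ms = millis_outside_write_concern_wait(log_attr)
wait_share = log_attr["waitForWriteConcernDurationMillis"] / log_attr["durationMillis"]
print(f"time outside write concern wait: {local_ms} ms")
print(f"share of duration spent waiting: {wait_share:.0%}")
```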