Hi All,
We have 2 query routers (mongos), a 2-node config server replica set, and 3 shards, each a 3-node replica set.
We are running version 4.4.22.
We frequently see replication lag of about 25 seconds on all shards; each episode lasts a few seconds.
During these episodes the primary often appears to be down in Percona for about 20 seconds, but the mongo logs show nothing significant to indicate that the node restarted or went down.
We can't work out why this is happening.
From an initial analysis in Percona, we could see spikes in query execution time, command operations, and a few other metrics. In the logs we couldn't find anything apart from a few long-running queries taking 10 or 20 seconds.
Has anyone experienced this scenario? It would be great to get some insights.