MongoDB ingestion rate unexpected behavior

Louis_Ponce · July 3, 2024, 7:41am

Hi,
I am storing data in a time series collection on a basic 3-node replica set.
The data comes from a Kafka topic at a low rate of 1000 messages/sec.
I have a Kafka-to-MongoDB sink connector running with this config:

{
  "name": "mongo-sink",
  "config" : {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "mytopic",
    "connection.uri": "mongodb://user:password@rs-1-1:27017,rs-1-2:27017,rs-1-3:27017/?replicaSet=rs-1&w=1&appName=mongosh+2.2.5", 
    "database": "mydb",
    "collection": "mycol",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.BsonOidStrategy",
    "document.id.strategy.overwrite.existing": "true",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.InsertOneDefaultStrategy",
    "delete.on.null.values": "false",
    "timeseries.timefield": "_timestamp",
    "timeseries.metafield": "_metadata",
    "timeseries.timefield.auto.convert": true,
    "errors.tolerance":"all",
    "max.batch.size": "10000",
    "batch.size": "2000"
  }
}

I have tried a few different configurations (more tasks with smaller batches, and fewer tasks with bigger batches). In every case I end up in the following situation:

The ingestion rate is steady for a few minutes, and then starts to go all over the place
The ingestion then works in small spikes and cannot keep up with new data arriving in Kafka, as shown by the Kafka consumer offset continuously falling behind the current topic offset.
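
To be precise about what I mean by "falling behind": I am looking at the consumer lag, i.e. the latest offset of each partition minus the offset committed by the connector's consumer group. A minimal sketch of that arithmetic is below. The function name and the offset values are hypothetical and only illustrate the calculation; they are not my real measurements.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return the lag per partition: latest topic offset minus committed offset.

    Both arguments map a partition number to an offset. A partition with no
    committed offset yet is treated as committed at 0 (an assumption made
    for this sketch).
    """
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at 0 so a stale end offset never yields a negative lag.
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets, for illustration only.
end_offsets = {0: 120_000, 1: 118_500}
committed_offsets = {0: 95_000, 1: 118_500}

lag = consumer_lag(end_offsets, committed_offsets)
total_lag = sum(lag.values())
print(lag, total_lag)
```

In my case this total keeps growing once the unstable phase starts, instead of staying close to zero.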

Here is a screenshot of my InfluxDB monitoring dashboard for MongoDB.

The graph is split into 3 parts:

No data is produced at first
Data starts being produced and is ingested at a regular rate (middle part)
Data ingestion abruptly becomes unstable. Write locks, commands, write latency, and CPU usage start to climb or show irregular profiles

More info:

When the last part starts, the primary MongoDB instance uses 100% of one CPU thread, which suggests a CPU bottleneck
Disk I/O is very low, so there is no bottleneck there
Hardware: 128 GB of RAM, 48 CPUs. The MongoDB instances are Docker containers with a limit of 30 GB of RAM each
Indexes are around 200 MB
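
For context on the memory figures: as far as I understand the documentation, the default WiredTiger cache size is the larger of 50% of (RAM minus 1 GB) and 256 MB, and inside a container the memory limit is used as the RAM value. The sketch below only works out that arithmetic. It assumes the cache size was left at its default, and the function name is hypothetical.

```python
def default_wiredtiger_cache_gb(memory_limit_gb):
    """Estimate the default WiredTiger cache size in GB.

    Assumes the documented default: the larger of
    50% of (RAM - 1 GB) and 256 MB (0.25 GB).
    """
    return max(0.5 * (memory_limit_gb - 1), 0.25)


# Container limit taken from the setup described above.
cache_gb = default_wiredtiger_cache_gb(30)
print(cache_gb)
```

Under that assumption each instance would have roughly 14.5 GB of cache, so indexes of around 200 MB should fit in memory comfortably.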

Here are my questions:

I understand that MongoDB has some kind of non-parallel, locking write mechanism. If that is true, is it expected to have this much impact on this kind of workload?
Does MongoDB run background tasks (in general) that may block ingestion while they run? If so, why would they kick in this abruptly?
Considering the low ingest rate (1000 messages/sec), is this normal? Should I use a sharded cluster to get multi-threaded writes?
Any other clues?

Thank you for your time. I can provide more info if needed.