I’ve been using the MongoDB Connector for Kafka Connect for a while on a Kubernetes cluster (deployed and configured via the Strimzi operator). Until now everything seems to have been working perfectly well… to be honest it still works well until I hit very high load. At that point the distribution of messages across topic partitions becomes uneven; I would say roughly 50% of the partitions are barely being utilised.
According to the Kafka Connect docs it is down to the producer to define the partitioner in use, and I do not see anywhere this could be configured with the MongoDB connector.
So my question is this… is the connector using the DefaultPartitioner, and if so is it possible to force round-robin behaviour?
As an update: I figured out that the normal Kafka producer config can be passed down to the connector to specify the desired partitioner (partitioner.class), see the sketch below. However, I am still seeing half of the partitions for a topic go unused. As a test I manually published to one of the unused partitions, which worked fine.
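For anyone else looking for where this goes, the snippet below is a minimal sketch of how I passed the partitioner down. It assumes a Strimzi KafkaConnector resource, a Connect worker with connector.client.config.override.policy set to All (needed for the producer.override.* prefix), and placeholder names and connection details:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mongo-source-connector              # placeholder connector name
  labels:
    strimzi.io/cluster: my-connect-cluster  # placeholder; must match the KafkaConnect cluster name
spec:
  class: com.mongodb.kafka.connect.MongoSourceConnector
  tasksMax: 1
  config:
    connection.uri: "mongodb://mongo:27017" # placeholder connection details
    database: mydb
    collection: mycollection
    topic.prefix: mongo
    # Per-connector producer override (KIP-458); only honoured if the worker
    # allows overrides. At this point I was trying to force round-robin.
    producer.override.partitioner.class: org.apache.kafka.clients.producer.RoundRobinPartitioner
```

The same thing can also be set worker-wide via producer.partitioner.class in the KafkaConnect spec.config, but the per-connector override keeps it scoped to this one connector.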
So the question remains… why would half of the partitions for a perfectly valid Kafka topic not be published to by the Kafka Connect connector?
I had thought that I was using the default partitioner class; however, in the end I found an override that was setting the round-robin partitioner. After dropping that and reverting to the defaults (see below), the distribution of messages was pretty even across the partitions.
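In case it helps anyone else: the fix was simply removing the partitioner override so the producer falls back to its built-in default (key hashing for keyed records, sticky partitioning for null keys on recent Kafka versions). The half-unused pattern looks consistent with the known behaviour where the RoundRobinPartitioner interacts badly with the producer's batching logic and can end up skipping alternate partitions, though I have not dug into that further. A rough sketch of the reverted config, assuming the override lived in the worker config of the Strimzi KafkaConnect resource (it could equally have been a producer.override.* entry on the connector itself):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster                  # placeholder name
spec:
  # ...replicas, bootstrapServers, image, etc. unchanged...
  config:
    # still allow per-connector overrides where genuinely needed
    connector.client.config.override.policy: All
    # the override that forced round-robin has been removed:
    # producer.partitioner.class: org.apache.kafka.clients.producer.RoundRobinPartitioner
    # with nothing set, producers use the default partitioner
    # (murmur2 hash of the record key; sticky partitioning for null keys)
```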