Kafka Connector task spawn strategy

Hi @Hamid_Jawaid,

tasks.max - The maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.
https://docs.confluent.io/current/connect/managing/configuring.html#configuring-connectors

Great questions. The short answer is that the MongoDB Kafka Connector itself isn't responsible for managing tasks or for deciding how many of them run. All it does is take the tasks.max value and create that many task configurations, one per task. This tells Kafka Connect how many tasks the connector can support. Prior to 1.2.0 the connector would only ever allow a single task. In 1.2.0 we now allow multiple tasks, and Kafka Connect then manages how many tasks run in parallel.
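To make the "one configuration per task" step concrete, here is a minimal sketch in Python. It is only a model of the idea, not the connector's actual code (the connector is written in Java), and the function name task_configs and the example topic "orders" are made up for illustration:

```python
from typing import Dict, List


def task_configs(connector_settings: Dict[str, str], max_tasks: int) -> List[Dict[str, str]]:
    """Hypothetical model: return one copy of the connector settings per task.

    The connector does not decide which task reads which partition here;
    it only hands the framework max_tasks identical configurations.
    """
    return [dict(connector_settings) for _ in range(max_tasks)]


# Illustrative settings; the topic name is an assumption.
settings = {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "orders",
    "tasks.max": "10",
}

configs = task_configs(settings, int(settings["tasks.max"]))
print(len(configs))  # 10
```

Every task gets the same settings, which is why the decision about what each task actually processes is left to Kafka Connect.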

The exact algorithm is internal to Kafka Connect, but it generally relates to the number of topics and partitions. For example, if you set tasks.max = 10 and have the following sink connector configuration:

  • 1 topic, 1 partition - then Kafka Connect will only spawn a single task
  • 2 topics, 1 partition each - then Kafka Connect will spawn 2 tasks, 1 for each topic
  • 2 topics, 5 partitions each - then Kafka Connect will spawn 10 tasks, 1 for each topic partition
  • 4 topics, 5 partitions each - then Kafka Connect will spawn 10 tasks, each handling data from 2 topic partitions
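The pattern in those four cases can be modelled roughly as "never more tasks than tasks.max, and never more tasks than topic partitions". The Python sketch below is just that simplified model, assuming an even spread of partitions across tasks; it is not Kafka Connect's real assignment algorithm, and the function names are hypothetical:

```python
from typing import Dict, List


def tasks_with_work(tasks_max: int, partitions_per_topic: Dict[str, int]) -> int:
    """Simplified model: tasks are capped by tasks.max and by total partitions."""
    total_partitions = sum(partitions_per_topic.values())
    return min(tasks_max, total_partitions)


def partitions_per_task(tasks_max: int, partitions_per_topic: Dict[str, int]) -> List[int]:
    """Assumed even spread: how many topic partitions each task handles."""
    total_partitions = sum(partitions_per_topic.values())
    active = min(tasks_max, total_partitions)
    if active == 0:
        return []
    base, extra = divmod(total_partitions, active)
    # The first `extra` tasks take one additional partition each.
    return [base + 1 if i < extra else base for i in range(active)]


# The four scenarios from the list, all with tasks.max = 10.
scenarios = [
    {"t1": 1},
    {"t1": 1, "t2": 1},
    {"t1": 5, "t2": 5},
    {"t1": 5, "t2": 5, "t3": 5, "t4": 5},
]

for topics in scenarios:
    print(tasks_with_work(10, topics), partitions_per_task(10, topics))
```

With tasks.max = 10 this gives 1, 2, 10 and 10 tasks for the four cases, with 2 partitions per task in the last one.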

The Confluent documentation (How to Use Kafka Connect - Get Started) alludes to this, but as far as the MongoDB Kafka Connector is concerned, it will just process the data it is handed by Kafka Connect.

I hope that helps answer your questions,

Ross
