MongoDB Kafka Sink Connector w/ Debezium CDC Handler Fails on Update Operations

Hi @Paulo_Henrique_Favero_Pereira

First of all, thanks for the detailed information around your question / challenge. Based on what I know, I can tell you right away that there doesn’t seem to be a “straightforward one-sentence solution” :slight_smile: However, I’ll try to highlight a few things that caught my eye and point you to some potential workarounds:

  1. From what I can tell, you are using the Debezium 2.0 MongoDB source connector. Under the covers, this connector uses MongoDB’s change streams feature as well. The problem is that Debezium changed the actual CDC event payload format, which is in fact a “breaking change” compared with the CDC payload format used until version 1.7 of the connector.

  2. That being said, the MongoDB sink connector is currently not prepared to properly deal with this new event payload format from Debezium. This applies both to the official connector from MongoDB and to my community sink connector, which was integrated into the official one at some point in the past (with feature parity back then). Even if it doesn’t fix the issue, I’d highly recommend you switch to the official MongoDB connector in your Docker file instead of using my community sink connector. What’s even stranger to me is that, if you do want to rely on my community version, you shouldn’t use tag 1.2.0, which points to an even older version. The latest version of my community sink was tagged 1.4.0.

Coming back to the actual problem and the breaking change in the event payload format, you might have the following options:

a) You could try to move away from the DBZ source connector and maybe find a way to get your use case working based on the official MongoDB source connector. I can’t tell if that will work because I don’t know enough details about your use case / requirements. It could be that you need tombstone events, which, depending on the capture mode, aren’t supported in the official MongoDB source connector, if I’m not mistaken.

b) If you want to stick with the Debezium source connector, you might get away with using version 1.9, which still allows you to configure the “legacy oplog”-based CDC and which produces the “old” and, AFAIK, still compatible CDC event payloads for the sink connector.

c) If neither a) nor b) works for your case and you want to continue using the DBZ 2.0 source connector, I’m afraid you need to take one of the following actions to get this solved:

  • apply an SMT (Single Message Transform) that reshapes the CDC payload of update events so that the sink connector can work with it
  • or instead use a stream processing job (Kafka Streams or a ksqlDB query) to do what the SMT would otherwise do
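
To give a rough idea of what such a reshaping step would have to do, here is a minimal sketch of the core logic, written in Python only for brevity (a real SMT would be a Java class, and a stream processing job would use its own API). It rests on assumptions on my side that you should verify against your actual topics: that the new-style update events carry an `updateDescription` object with `updatedFields` (a JSON string) and `removedFields` (a list of field names), and that the sink’s CDC handler expects an oplog-style `patch` JSON string with `$set` / `$unset`. The function name `to_legacy_update` is made up for illustration, and the sketch ignores `truncatedArrays` and anything else the handler might need.

```python
import json


def to_legacy_update(event: dict) -> dict:
    """Reshape a new-style update event into an (assumed) legacy patch-style event."""
    if event.get("op") != "u":
        # Inserts, deletes and reads are passed through untouched.
        return event

    # Assumed layout: updatedFields is a JSON string, removedFields a list or null.
    desc = event.get("updateDescription") or {}
    updated = json.loads(desc.get("updatedFields") or "{}")
    removed = desc.get("removedFields") or []

    # Build an oplog-style patch document from the change description.
    patch = {}
    if updated:
        patch["$set"] = updated
    if removed:
        patch["$unset"] = {name: True for name in removed}

    # Copy everything except updateDescription and attach the patch as a JSON string.
    legacy = {k: v for k, v in event.items() if k != "updateDescription"}
    legacy["patch"] = json.dumps(patch)
    return legacy


if __name__ == "__main__":
    sample = {
        "op": "u",
        "after": None,
        "updateDescription": {
            "updatedFields": "{\"first_name\": \"Anne Marie\"}",
            "removedFields": ["nickname"],
            "truncatedArrays": None,
        },
    }
    print(to_legacy_update(sample)["patch"])
```

Whether the resulting events are really accepted by the sink’s Debezium handler is something you would have to test; treat this purely as an illustration of the kind of transformation involved.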

Anyway, I think it’s good that you reported this issue and raised awareness. Since it’s not trivial to work around this problem, I hope that someone will update the MongoDB sink connector’s CDC handler for Debezium MongoDB so that it is capable of processing the new event payload format.

I hope this helps you. Feel free to comment or ask again if anything is unclear.

THX!