Sink Connector Post Processors
Overview
On this page, you can learn how to configure post processors in your MongoDB Kafka sink connector. Post processors modify sink records that the connector reads from a Kafka topic before the connector stores them in your MongoDB collection. Examples of data modifications post processors can make include:
- Set the document `_id` field to a custom value
- Include or exclude message key or value fields
- Rename fields
You can use the prebuilt post processors included in the connector or implement your own.
See the following sections for more information on post processors.
How Post Processors Modify Data
Post processors modify data read from a Kafka topic. The connector stores the message in a `SinkDocument` class which contains a representation of the Kafka `SinkRecord` key and value fields. The connector sequentially applies any post processors specified in the configuration and stores the result in a MongoDB collection.

Post processors perform data modification tasks such as generating the document `_id` field, projecting message key or value fields, and renaming fields. You can use the prebuilt post processors included in the connector, or you can implement your own by extending the `PostProcessor` class.
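To illustrate the data model, the following is a minimal sketch of how code can inspect a `SinkDocument`, assuming its `getKeyDoc()` and `getValueDoc()` accessors, which return `Optional<BsonDocument>` values because a record can lack a key or a value. The `SinkDocumentInspector` class is a hypothetical example, not part of the connector:

```java
import org.bson.BsonDocument;

import com.mongodb.kafka.connect.sink.converter.SinkDocument;

public final class SinkDocumentInspector {
    static void inspect(final SinkDocument doc) {
        // Key and value documents are both optional on a sink record.
        doc.getKeyDoc().ifPresent(key ->
                System.out.println("key fields: " + key.keySet()));
        doc.getValueDoc()
                .map(BsonDocument::toJson)
                .ifPresent(json -> System.out.println("value: " + json));
    }
}
```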
Important
Post Processors and Change Data Capture (CDC) Handlers
You cannot apply a post processor to CDC handler event data. If you specify both, the connector logs a warning.
How to Specify Post Processors
You can specify one or more post processors in the `post.processor.chain` configuration setting as a comma-separated list. If you specify more than one, the connector applies them sequentially, with each post processor modifying the data output by the previous one.

To ensure the documents the connector writes to MongoDB contain unique `_id` fields, the connector automatically adds the `DocumentIdAdder` post processor to the first position of the chain if you do not include it yourself.
The following example setting specifies that the connector should run the `KafkaMetaAdder` post processor first and then the `AllowListValueProjector` post processor on its output:

```properties
post.processor.chain=com.mongodb.kafka.connect.sink.processor.KafkaMetaAdder,com.mongodb.kafka.connect.sink.processor.AllowListValueProjector
```
Prebuilt Post Processors
The following table contains a list of all the post processors included in the sink connector.
Post Processor Name | Full Path | Description
---|---|---
`DocumentIdAdder` | `com.mongodb.kafka.connect.sink.processor.DocumentIdAdder` | Inserts an `_id` field determined by the configured strategy. The default strategy is `BsonOidStrategy`. For information on strategy options and configuration, see the Configure the Document Id Adder Post Processor section.
`BlockListKeyProjector` | `com.mongodb.kafka.connect.sink.processor.BlockListKeyProjector` | Removes matching key fields from the sink record. For more information on configuration, see the Allow List and Block List Examples.
`BlockListValueProjector` | `com.mongodb.kafka.connect.sink.processor.BlockListValueProjector` | Removes matching value fields from the sink record. For more information on configuration, see the Allow List and Block List Examples.
`AllowListKeyProjector` | `com.mongodb.kafka.connect.sink.processor.AllowListKeyProjector` | Includes only matching key fields from the sink record. For more information on configuration, see the Allow List and Block List Examples.
`AllowListValueProjector` | `com.mongodb.kafka.connect.sink.processor.AllowListValueProjector` | Includes only matching value fields from the sink record. For more information on configuration, see the Allow List and Block List Examples.
`KafkaMetaAdder` | `com.mongodb.kafka.connect.sink.processor.KafkaMetaAdder` | Adds a field named "topic-partition-offset" to the document and sets its value to the concatenation of the Kafka topic, partition, and offset.
`RenameByMapping` | `com.mongodb.kafka.connect.sink.processor.field.renaming.RenameByMapping` | Renames fields that are an exact match to a specified field name in the key or value document. For information on configuration, see the Rename by Mapping Example.
`RenameByRegex` | `com.mongodb.kafka.connect.sink.processor.field.renaming.RenameByRegExp` | Renames fields that match a regular expression in the key or value document. For information on configuration, see the Rename by Regular Expression Example.
Configure the Document Id Adder Post Processor
The `DocumentIdAdder` post processor uses a strategy to determine how it should format the `_id` field in the MongoDB document. A strategy defines preset behavior that you can customize for your use case.

You can specify a strategy for this post processor in the `document.id.strategy` setting as shown in the following example:

```properties
document.id.strategy=com.mongodb.kafka.connect.sink.processor.id.strategy.UuidStrategy
```

The following table shows a list of the strategies you can use to configure the `DocumentIdAdder` post processor:
Strategy Name | Full Path | Description
---|---|---
`BsonOidStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.BsonOidStrategy` | Generates a MongoDB BSON ObjectId. This is the default strategy for the `DocumentIdAdder` post processor.
`KafkaMetaDataStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.KafkaMetaDataStrategy` | Builds a string composed of the concatenation of the Kafka topic, partition, and offset.
`FullKeyStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.FullKeyStrategy` | Uses the complete key structure of the sink document to generate the value for the `_id` field. Defaults to a blank document if no key exists.
`ProvidedInKeyStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInKeyStrategy` | Uses the `_id` field specified in the key structure of the sink document. Throws an exception if the field is missing from the sink document.
`ProvidedInValueStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.ProvidedInValueStrategy` | Uses the `_id` field specified in the value structure of the sink document. Throws an exception if the field is missing from the sink document.
`PartialKeyStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.PartialKeyStrategy` | Uses a block list or allow list projection of the sink document key structure. Defaults to a blank document if no key exists.
`PartialValueStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy` | Uses a block list or allow list projection of the sink document value structure. Defaults to a blank document if no value exists.
`UuidProvidedInKeyStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.UuidProvidedInKeyStrategy` | Converts the `_id` key field to a UUID. The value must be either a string or binary type and must conform to the UUID format.
`UuidProvidedInValueStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.UuidProvidedInValueStrategy` | Converts the `_id` value field to a UUID. The value must be either a string or binary type and must conform to the UUID format.
`UuidStrategy` | `com.mongodb.kafka.connect.sink.processor.id.strategy.UuidStrategy` | Uses a randomly generated UUID in string format.
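For example, the partial strategies derive the `_id` from a projection of the key or value document. The following is a configuration sketch for `PartialValueStrategy`, assuming the connector's partial value projection settings; the projected field names are illustrative:

```properties
document.id.strategy=com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy
document.id.strategy.partial.value.projection.type=AllowList
document.id.strategy.partial.value.projection.list=name,address.city
```

With settings like these, the connector builds each document's `_id` from the "name" and "address.city" fields of the value document.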
Create a Custom Document Id Strategy
If the built-in document id adder strategies do not cover your use case, you can define a custom document id strategy by following these steps:

1. Create a Java class that implements the `IdStrategy` interface and contains your custom configuration logic.
2. Compile the class to a JAR file.
3. Add the compiled JAR to the class path/plugin path for all your Kafka workers. For more information about plugin paths, see the Confluent documentation.
4. Update the `document.id.strategy` setting to the full class name of your custom class on all your Kafka workers.
Note
Your selected strategy may have implications for delivery semantics
The BSON ObjectId and UUID strategies can guarantee only at-least-once delivery, because the connector generates new ids when it retries or reprocesses records. Other strategies permit exactly-once delivery if you can guarantee that the fields that form the document id are unique.
For example implementations of the `IdStrategy` interface, see the source code directory that contains the id strategy implementations packaged with the connector.
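The following is a minimal sketch of a custom `IdStrategy` implementation, assuming the `generateId()` method signature from the connector source; the `UserIdStrategy` class name and the "userId" field are hypothetical:

```java
package com.example.kafka;

import org.apache.kafka.connect.sink.SinkRecord;
import org.bson.BsonString;
import org.bson.BsonValue;

import com.mongodb.kafka.connect.sink.converter.SinkDocument;
import com.mongodb.kafka.connect.sink.processor.id.strategy.IdStrategy;

public class UserIdStrategy implements IdStrategy {

    @Override
    public BsonValue generateId(final SinkDocument doc, final SinkRecord orig) {
        // Use the "userId" value field as the _id when present; otherwise
        // fall back to the record's topic-partition-offset coordinates.
        return doc.getValueDoc()
                .filter(valueDoc -> valueDoc.containsKey("userId"))
                .map(valueDoc -> valueDoc.get("userId"))
                .orElseGet(() -> new BsonString(
                        orig.topic() + "-" + orig.kafkaPartition() + "-" + orig.kafkaOffset()));
    }
}
```

After you compile and deploy the class, you would set `document.id.strategy=com.example.kafka.UserIdStrategy` in your connector configuration.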
Post Processor Examples
This section shows examples of configuration and sample output of the following types of post processors:
Allow List and Block List Examples
The allow list and block list projector post processors determine which fields to include in or exclude from the output.

When you use the allow list projector, the post processor outputs data only from the fields that you specify.

When you use the block list projector, the post processor omits data from the fields that you specify.
Note
You can use the "." (dot) notation to reference nested fields in the record. You can also use the notation to reference fields of documents in an array.
When you add a projector to your post processor chain, you must specify the projector type and whether to apply it to the key or value portion of the sink document.
See the following sections for example projector configurations and output.
Allow List Projector Example
Suppose your Kafka record value documents resemble the following user profile data:

```json
{
  "name": "Sally Kimball",
  "age": 10,
  "address": {
    "city": "Idaville",
    "country": "USA"
  },
  "hobbies": [
    "reading",
    "solving crime"
  ]
}
```
You can configure the `AllowListValueProjector` to store select data such as the "name", "address.city", and "hobbies" fields from your value documents using the following settings:

```properties
post.processor.chain=com.mongodb.kafka.connect.sink.processor.AllowListValueProjector
value.projection.type=AllowList
value.projection.list=name,address.city,hobbies
```
After the post processor applies the projection, it outputs the following record:
{ "name": "Sally Kimball", "address": { "city": "Idaville" }, "hobbies": [ "reading", "solving crime" ] }
Block List Projector Example
Suppose your Kafka record key documents resemble the following user identification data:

```json
{
  "username": "user5983",
  "registration": {
    "date": "2021-09-13",
    "source": "mobile"
  },
  "authToken": {
    "alg": "HS256",
    "type": "JWT",
    "payload": "zI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODk"
  }
}
```
You can configure the `BlockListKeyProjector` to omit the "authToken" and "registration.source" fields before storing the data with the following settings:

```properties
post.processor.chain=com.mongodb.kafka.connect.sink.processor.BlockListKeyProjector
key.projection.type=BlockList
key.projection.list=authToken,registration.source
```
After the post processor applies the projection, it outputs the following record:
{ "username": "user5983", "registration": { "date": "2021-09-13", } }
Projection Wildcard Pattern Matching Examples
This section shows how you can configure the projector post processors to use wildcard patterns to match field names.

Pattern | Description
---|---
`*` | Matches any number of characters in the current level.
`**` | Matches any characters in the current level and all nested levels.
For the allow list and block list wildcard pattern matching examples in this section, refer to the following value document that contains weather measurements:
{ "city": "Springfield", "temperature": { "high": 28, "low": 24, "units": "C" }, "wind_speed_10m": { "average": 3, "units": "km/h" }, "wind_speed_80m": { "average": 8, "units": "km/h" }, "soil_conditions": { "temperature": { "high": 22, "low": 17, "units": "C" }, "moisture": { "average": 340, "units": "mm" } } }
Allow List Wildcard Examples
You can use the `*` wildcard to match multiple field names. The following example configuration matches the following fields:

- The top-level field named "city"
- The fields named "average" that are subdocument fields of any top-level field whose name starts with "wind_speed"

```properties
post.processor.chain=com.mongodb.kafka.connect.sink.processor.AllowListValueProjector
value.projection.type=AllowList
value.projection.list=city,wind_speed*.average
```
After the post processor applies the allow list projection, it outputs the following record:
{ "city": "Springfield", "wind_speed_10m": { "average": 3, }, "wind_speed_80m": { "average": 8, } }
You can use the `**` wildcard, which matches objects at any level starting from the level at which you specify the wildcard. The following wildcard matching example projects any document that contains a field named "low":

```properties
post.processor.chain=com.mongodb.kafka.connect.sink.processor.AllowListValueProjector
value.projection.type=AllowList
value.projection.list=**.low
```
After the post processor applies the projection, it outputs the following record:

```json
{
  "temperature": {
    "high": 28,
    "low": 24,
    "units": "C"
  },
  "soil_conditions": {
    "temperature": {
      "high": 22,
      "low": 17,
      "units": "C"
    }
  }
}
```
Block List Wildcard Example
You can use wildcard patterns to match fields at a specific document level as shown in the following block list configuration example:

```properties
post.processor.chain=com.mongodb.kafka.connect.sink.processor.BlockListValueProjector
value.projection.type=BlockList
value.projection.list=*.*.temperature
```

After the post processor applies the projection, it outputs the following record:

```json
{
  "city": "Springfield",
  "temperature": {
    "high": 28,
    "low": 24,
    "units": "C"
  },
  "wind_speed_10m": {
    "average": 3,
    "units": "km/h"
  },
  "wind_speed_80m": {
    "average": 8,
    "units": "km/h"
  },
  "soil_conditions": {
    "moisture": {
      "average": 340,
      "units": "mm"
    }
  }
}
```
Field Renaming Examples
This section shows how you can configure the `RenameByMapping` and `RenameByRegex` field renamer post processors to update field names in a sink record. The field renaming settings specify the following:

- Whether to update the key or value document in the record
- The field names to update
- The new field names
You must specify `RenameByMapping` and `RenameByRegex` settings in a JSON array. You can specify nested fields by using either dot notation or pattern matching.
The field renamer post processor examples use the following example sink record:
Key Document

```json
{
  "location": "Provence",
  "date_month": "October",
  "date_day": 17
}
```

Value Document

```json
{
  "flapjacks": {
    "purchased": 598,
    "size": "large"
  }
}
```
Rename by Mapping Example
The `RenameByMapping` post processor setting specifies one or more JSON objects that assign fields matching a string to a new name. Each object contains the text to match in the `oldName` element and the replacement text in the `newName` element, as described in the following table:

Key Name | Description
---|---
`oldName` | Specifies whether to match fields in the key or value document and the field name to replace. The setting uses a "." character to separate the two values.
`newName` | Specifies the replacement field name for all matches of the field.
The following example property matches the "location" field of a key document and renames it to "country":
field.renamer.mapping=[{"oldName":"key.location", "newName":"country"}]
This setting instructs the `RenameByMapping` post processor to transform the original key document to the following document:

```json
{
  "country": "Provence",
  "date_month": "October",
  "date_day": 17
}
```
You can perform a similar field name assignment for value documents by prefixing the field name with "value." in the `oldName` element as follows:

```properties
field.renamer.mapping=[{"oldName":"value.flapjacks", "newName":"crepes"}]
```
This setting instructs the `RenameByMapping` post processor to transform the original value document to the following document:

```json
{
  "crepes": {
    "purchased": 598,
    "size": "large"
  }
}
```
You can also specify one or more mappings in the `field.renamer.mapping` property by using a JSON array in string format as shown in the following setting:

```properties
field.renamer.mapping=[{ "oldName":"key.location", "newName":"city" }, { "oldName":"value.crepes", "newName":"flapjacks" }]
```
Rename by Regular Expression Example
The `RenameByRegex` post processor setting specifies the field names and text patterns that it should match, and the replacement values for the matched text. You can specify one or more renaming expressions in JSON objects containing the fields described in the following table:

Key Name | Description
---|---
`regexp` | Contains a regular expression that matches the fields on which to perform the replacement.
`pattern` | Contains a regular expression that matches the text to replace.
`replace` | Contains the replacement text for all matches of the regular expression you defined in the `pattern` field.
The following example setting instructs the post processor to perform the following:

- Match any field names in the key document that start with "date". In the set of matching fields, replace all text that matches the pattern "_" with the "-" character.
- Match any fields in the value document that are within the "crepes" subdocument. In the set of matching fields, replace all text that matches the pattern "purchased" with "quantity".

```properties
field.renamer.regexp=[{"regexp":"^key\\.date.*$","pattern":"_","replace":"-"}, {"regexp":"^value\\.crepes\\..*","pattern":"purchased","replace":"quantity"}]
```
When the connector applies the post processor to the example key document and the example value document, it outputs the following:
Key Document

```json
{
  "location": "Provence",
  "date-month": "October",
  "date-day": 17
}
```

Value Document

```json
{
  "crepes": {
    "quantity": 598,
    "size": "large"
  }
}
```
Warning
The renamer post processors do not overwrite existing field names
The target field names you set in your renamer post processors may result in duplicate field names in the same document. To avoid this, the post processor skips renaming when it would duplicate an existing field name at the same level of the document.
How to Create a Custom Post Processor
If the built-in post processors do not cover your use case, you can create a custom post processor class using the following steps:
1. Create a Java class that extends the `PostProcessor` abstract class.
2. Override the `process()` method in your class. In your method, you can update the `SinkDocument`, a BSON representation of the sink record key and value fields, and access the original Kafka `SinkRecord`.
3. Compile the class to a JAR file.
4. Add the compiled JAR to the class path/plugin path for all your Kafka workers. For more information about plugin paths, see the Confluent documentation on Manually Installing Community Connectors.
5. Add your post processor's full class name to the post processor chain configuration.
For example post processors, you can browse the source code for the built-in post processor classes.
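The following is a minimal sketch of a custom post processor, assuming the `process()` method signature and the `MongoSinkTopicConfig` constructor parameter from the connector source; the `EmailRedactor` class name and the "email" field are hypothetical:

```java
package com.example.kafka;

import org.apache.kafka.connect.sink.SinkRecord;
import org.bson.BsonString;

import com.mongodb.kafka.connect.sink.MongoSinkTopicConfig;
import com.mongodb.kafka.connect.sink.converter.SinkDocument;
import com.mongodb.kafka.connect.sink.processor.PostProcessor;

public class EmailRedactor extends PostProcessor {

    public EmailRedactor(final MongoSinkTopicConfig config) {
        super(config);
    }

    @Override
    public void process(final SinkDocument doc, final SinkRecord orig) {
        // Replace the "email" value field, if present, with a redacted marker
        // before the connector writes the document to MongoDB.
        doc.getValueDoc().ifPresent(valueDoc -> {
            if (valueDoc.containsKey("email")) {
                valueDoc.put("email", new BsonString("<redacted>"));
            }
        });
    }
}
```

After you compile and deploy the class, you could append `com.example.kafka.EmailRedactor` to your `post.processor.chain` setting to run it after the other post processors in the chain.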