Apply Schemas
Overview
In this guide, you can learn how to apply schemas to incoming documents in a MongoDB Kafka Connector source connector.
There are two types of schema in Kafka Connect: key schemas and value schemas. Kafka Connect sends messages to Apache Kafka that contain both a value and a key. A key schema enforces a structure for keys in messages sent to Apache Kafka. A value schema enforces a structure for values in messages sent to Apache Kafka.
Important
Note on Terminology
The word "key" has a slightly different meaning in the context of BSON and Apache Kafka. In BSON, a "key" is a unique string identifier for a field in a document.
In Apache Kafka, a "key" is a byte array sent in a message used to determine what partition of a topic to write the message to. Kafka keys can be duplicates of other keys or null.
Specifying schemas in the MongoDB Kafka Connector is optional, and you can specify any of the following combinations of schemas:
Only a value schema
Only a key schema
Both a value and key schema
No schemas
Tip
Benefits of Schema
To see a discussion on the benefits of using schemas with Kafka Connect, see this article from Confluent.
If you want to send data through Apache Kafka with a specific data format, such as Avro or JSON Schema, see the Converters guide.
To learn more about keys and values in Apache Kafka, see the official Apache Kafka introduction.
Default Schemas
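The MongoDB Kafka Connector provides two default schemas:
A key schema for the _id field of MongoDB change event documents
A value schema for MongoDB change event documents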
To learn more about change events, see our guide on change streams.
To learn more about default schemas, see their definitions in the MongoDB Kafka Connector source code.
Key Schema
The MongoDB Kafka Connector provides a default key schema for the _id field of change event documents. You should use the default key schema unless you remove the _id field from your change event document using either of the transformations described in the Schemas For Transformed Documents section of this guide.
If you specify either of these transformations and want to use a key schema for your incoming documents, you must specify a key schema as described in the specify a schema section of this guide.
You can enable the default key schema with the following option:
output.format.key=schema
Value Schema
The MongoDB Kafka Connector provides a default value schema for change event documents. You should use the default value schema unless you transform your change event documents as described in the Schemas For Transformed Documents section of this guide.
If you specify either of these transformations and want to use a value schema for your incoming documents, you must use one of the mechanisms described in the schemas for transformed documents section of this guide.
You can enable the default value schema with the following option:
output.format.value=schema
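Taken together, the two default schemas might appear in a source connector properties file as in the following minimal sketch. The name, connection URI, database, and collection values are placeholders assumed for illustration; only the two output.format options come from this guide.

```properties
# Minimal sketch of a source connector configuration that uses both default schemas.
# The name, connection.uri, database, and collection values are hypothetical placeholders.
name=mongo-source-default-schemas
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=mongodb://localhost:27017
database=exampleDb
collection=exampleCollection

# Apply the default key schema to the _id field of change event documents
output.format.key=schema
# Apply the default value schema to change event documents
output.format.value=schema
```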
Schemas For Transformed Documents
There are two ways you can transform your change event documents in a source connector:
The publish.full.document.only=true option
An aggregation pipeline that modifies the structure of change event documents
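As a rough illustration, the two transformations listed above could be configured as follows. The pipeline stage and the field it adds are hypothetical examples, not values taken from this guide.

```properties
# Option 1: publish only the fullDocument field of each change event document
publish.full.document.only=true

# Option 2 (hypothetical pipeline): add a field to each change event document,
# which alters the structure of the document the connector publishes
pipeline=[{"$addFields": {"source": "example"}}]
```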
If you transform your MongoDB change event documents, you must do the following to apply schemas:
Specify schemas
Have the connector infer a value schema
To learn more about the preceding configuration options, see the Change Stream Properties page.
Specify Schemas
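You can specify schemas for incoming documents using Avro schema syntax. To specify a key schema, set output.format.key=schema and provide the Avro schema in the output.schema.key option. To specify a value schema, set output.format.value=schema and provide the Avro schema in the output.schema.value option.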
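The following sketch shows what a specified value schema could look like. The record name and the name and visits fields are hypothetical and assume documents of that shape; substitute the fields of your own documents. A key schema follows the same pattern with the output.format.key and output.schema.key options.

```properties
# Tell the connector to apply a schema to message values
output.format.value=schema
# Hypothetical Avro record schema: assumes each document has a string "name"
# field and an integer "visits" field
output.schema.value={"type": "record", "name": "exampleValue", "fields": [{"name": "name", "type": "string"}, {"name": "visits", "type": "int"}]}
```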
To view an example that demonstrates how to specify a schema, see the Specify a Schema usage example.
To learn more about Avro Schema, see the Data Formats guide.
Important
Converters
If you want to send your data through Apache Kafka with binary encoding, you must use a converter. For more information, see the guide on Converters.
Infer a Schema
You can have your source connector infer a schema for incoming documents. This option works well for development and for data sources that do not frequently change structure, but for most production deployments we recommend that you specify a schema.
You can have the MongoDB Kafka Connector infer a schema by specifying the following options:
output.format.value=schema
output.schema.infer.value=true
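Inference is one way to apply a value schema to transformed documents. As a sketch, assuming the connector is configured to publish only full documents, the options could be combined like this:

```properties
# Transform change events so that only the full document is published
publish.full.document.only=true

# Apply a value schema, and let the connector infer it from each document
# instead of reading it from output.schema.value
output.format.value=schema
output.schema.infer.value=true
```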
Note
Cannot Infer Key Schema
The MongoDB Kafka Connector does not support key schema inference. If you want to use a key schema and transform your MongoDB change event documents, you must specify a key schema as described in the specify schemas section of this guide.