Apply Schemas
Overview
In this guide, you can learn how to apply schemas to incoming documents in a MongoDB Kafka source connector.
There are two types of schema in Kafka Connect: a key schema and a value schema. Each message that Kafka Connect sends to Apache Kafka contains both a value and a key. A key schema enforces a structure for the keys of those messages; a value schema enforces a structure for their values.
Important
Note on Terminology
This guide uses the Apache Kafka definition of the word "key", which differs slightly from the BSON definition. In BSON, a "key" is a unique string identifier for a field in a document. In Apache Kafka, a "key" is a byte array sent in a message and used to determine which partition of a topic to write the message to. Kafka keys can be duplicates of other keys or null.
Specifying schemas in the connector is optional, and you can specify any of the following combinations of schemas:
Only a value schema
Only a key schema
Both a value and key schema
No schemas
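For instance, to use both default schemas at once, you could combine the two options introduced later in this guide (a minimal sketch; the remaining connection settings are omitted):

```properties
# Enable the default key schema and the default value schema together
output.format.key=schema
output.format.value=schema
```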
Tip
Benefits of Schema
To see a discussion on the benefits of using schemas with Kafka Connect, see this article from Confluent.
If you want to send data through Apache Kafka with a specific data format, such as Apache Avro or JSON Schema, see the Converters guide.
To learn more about keys and values in Apache Kafka, see the official Apache Kafka introduction.
Default Schemas
The connector provides two default schemas: a key schema and a value schema, described in the following sections.
To learn more about change events, see our guide on change streams.
To learn more about default schemas, see their definitions in the MongoDB Kafka Connector source code.
Key Schema
The connector provides a default key schema for the _id field of change event documents. Use the default key schema unless you remove the _id field from your change event documents with either of the transformations described in the Schemas for Transformed Documents section of this guide.
If you apply either of those transformations and want to use a key schema for your incoming documents, you must specify a key schema as described in the Specify Schemas section of this guide.
You can enable the default key schema with the following option:
output.format.key=schema
Value Schema
The connector provides a default value schema for change event documents. Use the default value schema unless you transform your change event documents as described in the Schemas for Transformed Documents section of this guide.
If you apply either of those transformations and want to use a value schema for your incoming documents, you must use one of the mechanisms described in that section.
You can enable the default value schema with the following option:
output.format.value=schema
Schemas For Transformed Documents
There are two ways you can transform your change event documents in a source connector:
The publish.full.document.only=true option
An aggregation pipeline that modifies the structure of change event documents
If you transform your MongoDB change event documents, you must specify or infer your own schemas as described in the following sections.
To learn more about the preceding configuration options, see the Change Stream Properties page.
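As a sketch of the two transformations, a source connector configuration might combine them as follows (the database, collection, and pipeline stage shown are hypothetical, not prescriptive):

```properties
# Hypothetical source connector settings
database=inventory
collection=orders

# Publish only the full document, dropping the change event envelope
publish.full.document.only=true

# An aggregation pipeline that removes a field, changing the document structure
pipeline=[{"$project": {"internalNotes": 0}}]
```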
Specify Schemas
You can specify schemas for incoming documents using Avro schema syntax. Use the following options to specify a schema for document keys and values:
For document keys:
output.format.key=schema
output.schema.key=<your avro schema>
For document values:
output.format.value=schema
output.schema.value=<your avro schema>
To view an example that demonstrates how to specify a schema, see the Specify a Schema usage example.
To learn more about Avro Schema, see the Data Formats guide.
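For example, a value schema for documents shaped like {"_id": ..., "item": ..., "qty": ...} could be specified as follows (a minimal sketch; the record name and fields are hypothetical):

```properties
# Hypothetical Avro value schema; adapt the record name and fields to your documents
output.format.value=schema
output.schema.value={"type": "record", "name": "OrderValue", "fields": [{"name": "_id", "type": "string"}, {"name": "item", "type": "string"}, {"name": "qty", "type": "int"}]}
```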
Important
Converters
If you want to send your data through Apache Kafka with Avro binary encoding, you must use an Avro converter. For more information, see the guide on Converters.
Infer a Schema
You can have your source connector infer a schema for incoming documents. This option works well for development and for data sources that do not frequently change structure, but for most production deployments we recommend that you specify a schema.
You can have the connector infer a schema by specifying the following options:
output.format.value=schema
output.schema.infer.value=true
The source connector can infer schemas for incoming documents that contain nested documents stored in arrays. Starting in version 1.9 of the connector, schema inference gathers the appropriate data type for fields instead of defaulting to a string type assignment when nested documents differ in the following ways:
A field is present in one document but missing in another.
A field is present in one document but null in another.
A field is an array with elements of any type in one document but has additional elements or elements of other data types in another.
A field is an array with elements of any type in one document but an empty array in another.
If field types conflict between nested documents, the connector pushes the conflict down to the schema for the field and defaults to a string type assignment.
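As a hypothetical illustration of the conflict case, consider an array whose nested documents disagree on a field's type:

```
{"measurements": [{"reading": 1}, {"reading": "low"}]}
```

Because reading is an integer in one nested document and a string in another, the types conflict, so the inferred schema assigns the string type to the reading field.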
Note
Cannot Infer Key Schema
The connector does not support key schema inference. If you want to use a key schema and transform your MongoDB change event documents, you must specify a key schema as described in the Specify Schemas section of this guide.