Apply Schemas
Overview
In this guide, you can learn how to apply schemas to incoming documents in a MongoDB Kafka source connector.
There are two types of schema in Kafka Connect: a key schema and a value schema. Each message that Kafka Connect sends to Apache Kafka contains both a value and a key. A key schema enforces a structure for the keys of those messages; a value schema enforces a structure for their values.
Important
Note on Terminology
This guide uses the Apache Kafka definition of the word "key", which differs slightly from the BSON definition. In BSON, a "key" is a unique string identifier for a field in a document. In Apache Kafka, a "key" is a byte array sent in a message and used to determine which partition of a topic to write the message to. Kafka keys can be duplicates of other keys or null.
Specifying schemas in the connector is optional, and you can specify any of the following combinations of schemas:
Only a value schema
Only a key schema
Both a value and key schema
No schemas
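For instance, to use both default schemas at once, you could combine the two options introduced later in this guide (a minimal sketch; the remaining connection settings are omitted):

```properties
# Enable the default key schema and the default value schema together
output.format.key=schema
output.format.value=schema
```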
Tip
Benefits of Schema
To see a discussion on the benefits of using schemas with Kafka Connect, see this article from Confluent.
If you want to send data through Apache Kafka with a specific data format, such as Apache Avro or JSON Schema, see the Converters guide.
To learn more about keys and values in Apache Kafka, see the official Apache Kafka introduction.
Default Schemas
The connector provides two default schemas: a key schema and a value schema, described in the following sections.
To learn more about change events, see our guide on change streams.
To learn more about default schemas, see their definitions in the MongoDB Kafka Connector source code.
Key Schema
The connector provides a default key schema for the _id field of change event documents. Use the default key schema unless you remove the _id field from your change event documents with either of the transformations described in the Schemas for Transformed Documents section of this guide.
If you apply either of those transformations and want to use a key schema for your incoming documents, you must specify a key schema as described in the Specify Schemas section of this guide.
You can enable the default key schema with the following option:
output.format.key=schema
Value Schema
The connector provides a default value schema for change event documents. Use the default value schema unless you transform your change event documents as described in the Schemas for Transformed Documents section of this guide.
If you apply either of those transformations and want to use a value schema for your incoming documents, you must use one of the mechanisms described in that section.
You can enable the default value schema with the following option:
output.format.value=schema
Schemas For Transformed Documents
There are two ways you can transform your change event documents in a source connector:
The publish.full.document.only=true option
An aggregation pipeline that modifies the structure of change event documents
If you transform your MongoDB change event documents, you must specify or infer your own schemas as described in the following sections.
To learn more about the preceding configuration options, see the Change Stream Properties page.
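As a sketch of the two transformations, a source connector configuration might combine them as follows (the database, collection, and pipeline stage shown are hypothetical, not prescriptive):

```properties
# Hypothetical source connector settings
database=inventory
collection=orders

# Publish only the full document, dropping the change event envelope
publish.full.document.only=true

# An aggregation pipeline that removes a field, changing the document structure
pipeline=[{"$project": {"internalNotes": 0}}]
```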
Specify Schemas
You can specify schemas for incoming documents using Avro schema syntax. Use the following options to specify a schema for document keys and values:
For document keys:
output.format.key=schema
output.schema.key=<your avro schema>
For document values:
output.format.value=schema
output.schema.value=<your avro schema>
To view an example that demonstrates how to specify a schema, see the Specify a Schema usage example.
To learn more about Avro Schema, see the Data Formats guide.
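For example, a value schema for documents shaped like {"_id": ..., "item": ..., "qty": ...} could be specified as follows (a minimal sketch; the record name and fields are hypothetical):

```properties
# Hypothetical Avro value schema; adapt the record name and fields to your documents
output.format.value=schema
output.schema.value={"type": "record", "name": "OrderValue", "fields": [{"name": "_id", "type": "string"}, {"name": "item", "type": "string"}, {"name": "qty", "type": "int"}]}
```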
Important
Converters
If you want to send your data through Apache Kafka with Avro binary encoding, you must use an Avro converter. For more information, see the guide on Converters.
Infer a Schema
You can have your source connector infer a schema for incoming documents. This option works well for development and for data sources that do not frequently change structure, but for most production deployments we recommend that you specify a schema.
You can have the connector infer a schema by specifying the following options:
output.format.value=schema
output.schema.infer.value=true
The source connector can infer schemas for incoming documents that contain nested documents stored in arrays. Starting in version 1.9 of the connector, schema inference gathers the appropriate data type for fields instead of defaulting to a string type assignment when nested documents differ in the following ways:
A field is present in one document but missing in another.
A field is present in one document but null in another.
A field is an array with elements of any type in one document but has additional elements or elements of other data types in another.
A field is an array with elements of any type in one document but an empty array in another.
If field types conflict between nested documents, the connector pushes the conflict down to the schema for the field and defaults to a string type assignment.
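As a hypothetical illustration of the conflict case, consider an array whose nested documents disagree on a field's type:

```
{"measurements": [{"reading": 1}, {"reading": "low"}]}
```

Because reading is an integer in one nested document and a string in another, the types conflict, so the inferred schema assigns the string type to the reading field.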
Note
Cannot Infer Key Schema
The connector does not support key schema inference. If you want to use a key schema and transform your MongoDB change event documents, you must specify a key schema as described in the Specify Schemas section of this guide.