Data Formats
Overview
In this guide, you can learn about the data formats you use when working with the MongoDB Kafka Connector and your pipeline.
This guide uses the following sample document to show the behavior of the different formats:
{company:"MongoDB"}
JSON
JSON is a data-interchange format based on JavaScript object notation. You represent the sample document in JSON like this:
{"company":"MongoDB"}
You may encounter the following data formats related to JSON when working with the connector:
Raw JSON
BSON
JSON Schema
For more information on JSON, see the official JSON website.
Raw JSON
Raw JSON is a data format that consists of JSON objects written as strings. You represent the sample document in Raw JSON like this:
"{\"company\":\"MongoDB\"}"
You use Raw JSON when you specify a String converter on a source or sink connector. To view connector configurations that specify a String converter, see the Converters guide.
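To see how a document becomes a Raw JSON string, the following minimal sketch, assuming Python's standard json module, serializes the sample document and then prints the escaped string form shown above:

import json

doc = {"company": "MongoDB"}

# Serialize the document to its JSON text; this string is the Raw JSON value.
raw_json = json.dumps(doc, separators=(",", ":"))
print(raw_json)              # {"company":"MongoDB"}

# Writing that string as a JSON string literal produces the escaped form.
print(json.dumps(raw_json))  # "{\"company\":\"MongoDB\"}"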
BSON
BSON is a binary serialization encoding for JSON-like objects. BSON encodes the sample document like this:
\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00
Your connectors use the BSON format to send and receive documents from your MongoDB deployment.
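As an illustration, the following sketch, assuming the bson package that ships with PyMongo, encodes the sample document and prints the bytes shown above:

import bson  # ships with the PyMongo distribution

doc = {"company": "MongoDB"}

# Encode the document to its BSON byte representation.
data = bson.encode(doc)
print(data)  # b'\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00'

# Decoding the bytes recovers the original document.
print(bson.decode(data))  # {'company': 'MongoDB'}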
For more information on BSON, see the BSON specification.
JSON Schema
JSON Schema is a syntax for specifying schemas for JSON objects. A schema is a definition attached to an Apache Kafka topic that specifies the valid values for that topic.
You can specify a schema for the sample document with JSON Schema like this:
{ "$schema":"http://json-schema.org/draft-07/schema", "$id":"unique id", "type":"object", "title":"Example Schema", "description":"JSON Schema for the sample document.", "required":[ "company" ], "properties":{ "company":{ "$id":"another unique id", "type":"string", "title":"Company", "description":"A field to hold the name of a company" } }, "additionalProperties":false }
You use JSON Schema when you apply JSON Schema converters to your connectors. To view connector configurations that specify a JSON Schema converter, see the Converters guide.
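As a quick check, the following sketch, assuming the third-party jsonschema package and a trimmed version of the schema above, validates the sample document:

from jsonschema import validate, ValidationError

# Trimmed schema keeping only the keywords that constrain the document.
schema = {
    "type": "object",
    "required": ["company"],
    "properties": {"company": {"type": "string"}},
    "additionalProperties": False,
}

# A conforming document passes silently.
validate(instance={"company": "MongoDB"}, schema=schema)

# A document that violates the schema raises ValidationError.
try:
    validate(instance={"company": 42}, schema=schema)
except ValidationError as err:
    print(err.message)  # 42 is not of type 'string'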
For more information, see the official JSON Schema website.
Avro
Apache Avro is an open-source framework for serializing and transporting data described by schemas. Avro defines two data formats relevant to the connector:
Avro schema
Avro binary encoding
For more information on Apache Avro, see the Apache Avro Documentation.
Avro Schema
Avro schema is a JSON-based schema definition syntax. Avro schema supports the specification of the following groups of data types:
Primitive types
Complex types
Logical types
Warning
Unsupported Avro Types
The connector does not support the following Avro types:
enum types. Use string instead.
fixed types. Use bytes instead.
null as a primitive type. However, null as an element in a union is supported.
union types with more than 2 elements.
union types with more than one null element.
Important
Sink Connectors and Logical Types
The MongoDB Kafka sink connector supports all Avro schema primitive and complex types. However, it supports only the following logical types:
decimal
date
time-millis
time-micros
timestamp-millis
timestamp-micros
You can construct an Avro schema for the sample document like this:
{ "type": "record", "name": "example", "doc": "example documents have a company field", "fields": [ { "name": "company", "type": "string" } ] }
You use Avro schema when you define a schema for a MongoDB Kafka source connector.
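To confirm that a schema definition is well formed before using it with a source connector, the following sketch, assuming the third-party fastavro package, parses the schema above:

from fastavro import parse_schema

schema = {
    "type": "record",
    "name": "example",
    "doc": "example documents have a company field",
    "fields": [{"name": "company", "type": "string"}],
}

# parse_schema raises SchemaParseException if the definition is not valid Avro.
parsed = parse_schema(schema)
print("schema is valid Avro")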
For a list of all Avro schema types, see the Apache Avro specification.
Avro Binary Encoding
Avro specifies a binary serialization encoding for JSON objects defined by an Avro schema.
If you use the preceding Avro schema, you can represent the sample document with Avro binary encoding like this:
\x0eMongoDB
You use Avro binary encoding when you specify an Avro converter on a source or sink connector. To view connector configurations that specify an Avro converter, see the Converters guide.
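The following sketch, again assuming the third-party fastavro package, serializes the sample document with the preceding schema and reproduces the bytes shown above:

import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

schema = parse_schema({
    "type": "record",
    "name": "example",
    "fields": [{"name": "company", "type": "string"}],
})

# Write the record without a file header, as a converter does for a single message.
buffer = io.BytesIO()
schemaless_writer(buffer, schema, {"company": "MongoDB"})
print(buffer.getvalue())  # b'\x0eMongoDB'

# Reading the bytes back requires the same schema.
buffer.seek(0)
print(schemaless_reader(buffer, schema))  # {'company': 'MongoDB'}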
To learn more about Avro binary encoding, see this section of the Avro specification.
Byte Arrays
A byte array is a consecutive sequence of unstructured bytes.
You can represent the sample document as a byte array using any of the encodings mentioned above.
You use byte arrays when your converters send data to or receive data from Apache Kafka. For more information on converters, see the Converters guide.
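To make that relationship concrete, the following sketch, reusing the json and PyMongo bson modules from the earlier examples, shows the sample document as byte arrays under two of the encodings above:

import json
import bson  # ships with the PyMongo distribution

doc = {"company": "MongoDB"}

# Raw JSON as a UTF-8 byte array.
print(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
# b'{"company":"MongoDB"}'

# BSON as a byte array.
print(bson.encode(doc))
# b'\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00'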