Data Formats
Overview
In this guide, you can learn about the data formats you use when working with the MongoDB Kafka Connector and your pipeline.
This guide uses the following sample document to show the behavior of the different formats:
{company:"MongoDB"}
JSON
JSON is a data-interchange format based on JavaScript object notation. You represent the sample document in JSON like this:
{"company":"MongoDB"}
You may encounter the following data formats related to JSON when working with the connector:
Raw JSON
BSON
JSON Schema
For more information on JSON, see the official JSON website.
Raw JSON
Raw JSON is a data format that consists of JSON objects written as strings. You represent the sample document in Raw JSON like this:
"{\"company\":\"MongoDB\"}"
You use Raw JSON when you specify a String converter on a source or sink connector. To view connector configurations that specify a String converter, see the Converters guide.
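To see how a document becomes a Raw JSON string, the following minimal sketch, assuming Python's standard json module, serializes the sample document and then prints the escaped string form shown above:

import json

doc = {"company": "MongoDB"}

# Serialize the document to its JSON text; this string is the Raw JSON value.
raw_json = json.dumps(doc, separators=(",", ":"))
print(raw_json)              # {"company":"MongoDB"}

# Writing that string as a JSON string literal produces the escaped form.
print(json.dumps(raw_json))  # "{\"company\":\"MongoDB\"}"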
BSON
BSON is a binary serialization encoding for JSON-like objects. BSON encodes the sample document like this:
\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00
Your connectors use the BSON format to send and receive documents from your MongoDB deployment.
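As an illustration, the following sketch, assuming the bson package that ships with PyMongo, encodes the sample document and prints the bytes shown above:

import bson  # ships with the PyMongo distribution

doc = {"company": "MongoDB"}

# Encode the document to its BSON byte representation.
data = bson.encode(doc)
print(data)  # b'\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00'

# Decoding the bytes recovers the original document.
print(bson.decode(data))  # {'company': 'MongoDB'}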
For more information on BSON, see the BSON specification.
JSON Schema
JSON Schema is a syntax for specifying schemas for JSON objects. A schema is a definition attached to an Apache Kafka topic that specifies the valid values for that topic.
You can specify a schema for the sample document with JSON Schema like this:
{ "$schema":"http://json-schema.org/draft-07/schema", "$id":"unique id", "type":"object", "title":"Example Schema", "description":"JSON Schema for the sample document.", "required":[ "company" ], "properties":{ "company":{ "$id":"another unique id", "type":"string", "title":"Company", "description":"A field to hold the name of a company" } }, "additionalProperties":false }
You use JSON Schema when you apply JSON Schema converters to your connectors. To view connector configurations that specify a JSON Schema converter, see the Converters guide.
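As a quick check, the following sketch, assuming the third-party jsonschema package and a trimmed version of the schema above, validates the sample document:

from jsonschema import validate, ValidationError

# Trimmed schema keeping only the keywords that constrain the document.
schema = {
    "type": "object",
    "required": ["company"],
    "properties": {"company": {"type": "string"}},
    "additionalProperties": False,
}

# A conforming document passes silently.
validate(instance={"company": "MongoDB"}, schema=schema)

# A document that violates the schema raises ValidationError.
try:
    validate(instance={"company": 42}, schema=schema)
except ValidationError as err:
    print(err.message)  # 42 is not of type 'string'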
For more information, see the official JSON Schema website.
Avro
Apache Avro is an open-source framework for serializing and transporting data described by schemas. Avro defines two data formats relevant to the connector:
Avro schema
Avro binary encoding
For more information on Apache Avro, see the Apache Avro Documentation.
Avro Schema
Avro schema is a JSON-based schema definition syntax. Avro schema supports the specification of the following groups of data types:
Primitive types
Complex types
Logical types
Warning
Unsupported Avro Types
The connector does not support the following Avro types:
enum types. Use string instead.
fixed types. Use bytes instead.
null as a primitive type. However, null as an element in a union is supported.
union types with more than 2 elements.
union types with more than one null element.
Important
Sink Connectors and Logical Types
The MongoDB Kafka sink connector supports all Avro schema primitive and complex types. However, it supports only the following logical types:
decimal
date
time-millis
time-micros
timestamp-millis
timestamp-micros
You can construct an Avro schema for the sample document like this:
{ "type": "record", "name": "example", "doc": "example documents have a company field", "fields": [ { "name": "company", "type": "string" } ] }
You use Avro schema when you define a schema for a MongoDB Kafka source connector.
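To confirm that a schema definition is well formed before using it with a source connector, the following sketch, assuming the third-party fastavro package, parses the schema above:

from fastavro import parse_schema

schema = {
    "type": "record",
    "name": "example",
    "doc": "example documents have a company field",
    "fields": [{"name": "company", "type": "string"}],
}

# parse_schema raises SchemaParseException if the definition is not valid Avro.
parsed = parse_schema(schema)
print("schema is valid Avro")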
For a list of all Avro schema types, see the Apache Avro specification.
Avro Binary Encoding
Avro specifies a binary serialization encoding for JSON objects defined by an Avro schema.
If you use the preceding Avro schema, you can represent the sample document with Avro binary encoding like this:
\x0eMongoDB
You use Avro binary encoding when you specify an Avro converter on a source or sink connector. To view connector configurations that specify an Avro converter, see the Converters guide.
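The following sketch, again assuming the third-party fastavro package, serializes the sample document with the preceding schema and reproduces the bytes shown above:

import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

schema = parse_schema({
    "type": "record",
    "name": "example",
    "fields": [{"name": "company", "type": "string"}],
})

# Write the record without a file header, as a converter does for a single message.
buffer = io.BytesIO()
schemaless_writer(buffer, schema, {"company": "MongoDB"})
print(buffer.getvalue())  # b'\x0eMongoDB'

# Reading the bytes back requires the same schema.
buffer.seek(0)
print(schemaless_reader(buffer, schema))  # {'company': 'MongoDB'}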
To learn more about Avro binary encoding, see this section of the Avro specification.
Byte Arrays
A byte array is a consecutive sequence of unstructured bytes.
You can represent the sample document as a byte array using any of the encodings mentioned above.
You use byte arrays when your converters send data to or receive data from Apache Kafka. For more information on converters, see the Converters guide.
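To make that relationship concrete, the following sketch, reusing the json and PyMongo bson modules from the earlier examples, shows the sample document as byte arrays under two of the encodings above:

import json
import bson  # ships with the PyMongo distribution

doc = {"company": "MongoDB"}

# Raw JSON as a UTF-8 byte array.
print(json.dumps(doc, separators=(",", ":")).encode("utf-8"))
# b'{"company":"MongoDB"}'

# BSON as a byte array.
print(bson.encode(doc))
# b'\x1a\x00\x00\x00\x02company\x00\x08\x00\x00\x00MongoDB\x00\x00'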