Copy Existing Data

This version of the documentation is archived and no longer supported. View the current documentation to learn how to upgrade your version of the MongoDB Kafka Connector.

This usage example demonstrates how to copy data from a MongoDB collection to an Apache Kafka topic using the MongoDB Kafka Connector.

Example

Suppose you need to copy a MongoDB collection to Apache Kafka and filter some of the data.

Your requirements and your solutions are as follows:

Requirement	Solution
Copy the `customers` collection of the `shopping` database in your MongoDB deployment onto an Apache Kafka topic.	See the Copy Data section of this guide.
Only copy documents that have the value "Mexico" in the `country` field.	See the Filter Data section of this guide.

The customers collection contains the following documents:

{
  "_id": 1,
  "country": "Mexico",
  "purchases": 2,
  "last_viewed": { "$date": "2021-10-31T20:30:00.245Z" }
}
{
  "_id": 2,
  "country": "Iceland",
  "purchases": 8,
  "last_viewed": { "$date": "2015-07-20T10:00:00.135Z" }
}

Copy Data

Copy the contents of the customers collection of the shopping database by specifying the following configuration options in your source connector:

database=shopping
collection=customers
copy.existing=true

Your source connector copies your collection by creating change event documents that describe inserting each document into your collection.

Note

Data Copy Can Produce Duplicate Events

If any system changes the data in the database while the source connector is copying existing data from it, MongoDB may produce duplicate change stream events to reflect the latest changes. Because the change stream events on which the data copy relies are idempotent, applying a duplicate event leaves the result unchanged, so the copied data is eventually consistent.
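
The idempotency described in this note can be pictured with a small sketch. It assumes a consumer that applies each insert-style event as a replace keyed on `documentKey._id`; the `apply_event` helper and the in-memory `store` are hypothetical and are not part of the connector.

```python
# Hypothetical in-memory "sink" that maps _id -> document.
# This only illustrates the idea; it is not how the connector stores data.
def apply_event(store, event):
    """Apply an insert-style change event as a replace keyed on documentKey._id."""
    key = event["documentKey"]["_id"]
    store[key] = dict(event["fullDocument"])
    return store


event = {
    "operationType": "insert",
    "documentKey": {"_id": 1},
    "fullDocument": {"_id": 1, "country": "Mexico", "purchases": 2},
}

# Applying the event once, and applying it twice (a duplicate delivery).
once = apply_event({}, event)
twice = apply_event(apply_event({}, event), event)
```

Under that assumption, `once` and `twice` hold the same single document, so a duplicate event does not change the final state.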

To learn more about change event documents, see the Change Streams guide.

To learn more about the copy.existing option, see Copy Existing Properties in the MongoDB Kafka Connector.

Filter Data

You can filter data by specifying an aggregation pipeline in the copy.existing.pipeline option of your source connector configuration. The following configuration specifies an aggregation pipeline that matches all documents with "Mexico" in the country field:

copy.existing.pipeline=[{ "$match": { "country": "Mexico" } }]
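
As a rough illustration of what this pipeline selects, the following sketch parses the option value as JSON and applies an equality-only stand-in for `$match` to the sample documents. The `matches` helper is hypothetical and covers only simple field equality; the real `$match` stage runs inside MongoDB and supports far more. The `last_viewed` field is omitted for brevity.

```python
import json

# The option value from the configuration above; it must be valid JSON.
pipeline_option = '[{ "$match": { "country": "Mexico" } }]'
pipeline = json.loads(pipeline_option)

# The sample documents from the customers collection (last_viewed omitted).
customers = [
    {"_id": 1, "country": "Mexico", "purchases": 2},
    {"_id": 2, "country": "Iceland", "purchases": 8},
]


def matches(document, criteria):
    """Equality-only stand-in for $match; hypothetical helper for illustration."""
    return all(document.get(field) == value for field, value in criteria.items())


criteria = pipeline[0]["$match"]
copied = [doc for doc in customers if matches(doc, criteria)]
copied_ids = [doc["_id"] for doc in copied]
```

With the sample data, only the document whose `_id` is 1 passes the filter. Parsing the value with `json.loads` is also a quick way to catch quoting mistakes before you deploy the configuration.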

To learn more about the copy.existing.pipeline option, see Copy Existing Properties in the MongoDB Kafka Connector.

To learn more about aggregation pipelines, see the following resources:

Customize a Pipeline to Filter Change Events usage example
Aggregation in the MongoDB manual

Specify the Configuration

Your source connector configuration to copy the customers collection should look like this:

connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=<your production MongoDB connection uri>
database=shopping
collection=customers
copy.existing=true
copy.existing.pipeline=[{ "$match": { "country": "Mexico" } }]
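
If you run Kafka Connect in distributed mode, you typically register a connector by sending a JSON body with `name` and `config` fields to the Kafka Connect REST API (`POST /connectors`). The sketch below only builds that body from the properties above; the connector name `mongo-source-customers` and the `build_connector_request` helper are assumptions made for illustration, and the connection URI is left as the same placeholder.

```python
import json


def build_connector_request(name, connection_uri):
    """Build a Kafka Connect REST request body from the properties above.

    Hypothetical helper: every config value is a string, so the pipeline
    is serialized to a JSON string rather than embedded as a list.
    """
    config = {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": connection_uri,
        "database": "shopping",
        "collection": "customers",
        "copy.existing": "true",
        "copy.existing.pipeline": json.dumps([{"$match": {"country": "Mexico"}}]),
    }
    return {"name": name, "config": config}


# The connector name is an assumption; replace the URI placeholder with your own.
request_body = build_connector_request(
    "mongo-source-customers",
    "<your production MongoDB connection uri>",
)
print(json.dumps(request_body, indent=2))
```

Serializing the pipeline with `json.dumps` keeps the nested quotes correctly escaped inside the outer JSON document.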

After your connector copies your data, the shopping.customers Apache Kafka topic contains the following change event document, which corresponds to the one document in the preceding sample collection that matches the filter:

{
  "_id": { "_id": 1, "copyingData": true },
  "operationType": "insert",
  "documentKey": { "_id": 1 },
  "fullDocument": {
    "_id": 1,
    "country": "Mexico",
    "purchases": 2,
    "last_viewed": { "$date": "2021-10-31T20:30:00.245Z" }
  },
  "ns": { "db": "shopping", "coll": "customers" }
}
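
The shape of this event can be sketched in a few lines. The `to_copy_event` helper is hypothetical and simply mirrors the fields shown above; it is not the connector's implementation. The topic name assumes the `<database>.<collection>` naming that this example uses, and `last_viewed` is omitted for brevity.

```python
def to_copy_event(document, database, collection):
    """Wrap an existing document in an insert-style event shaped like the one above."""
    doc_id = document["_id"]
    return {
        "_id": {"_id": doc_id, "copyingData": True},
        "operationType": "insert",
        "documentKey": {"_id": doc_id},
        "fullDocument": dict(document),
        "ns": {"db": database, "coll": collection},
    }


customer = {"_id": 1, "country": "Mexico", "purchases": 2}
event = to_copy_event(customer, "shopping", "customers")

# Assumption: the topic is named <database>.<collection>, as in this example.
topic = f'{event["ns"]["db"]}.{event["ns"]["coll"]}'
```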

Note

Write the Data in your Topic into a Collection

Use a change data capture handler to convert change event documents in an Apache Kafka topic into MongoDB write operations. To learn more, see the Change Data Capture Handlers guide.
