Copy Existing Data

This usage example demonstrates how to copy data from a MongoDB collection to an Apache Kafka topic using the MongoDB Kafka source connector.

Example

Suppose you want to copy a MongoDB collection to Apache Kafka and filter some data.

Your requirements and your solutions are as follows:

Requirement	Solution
Copy the `customers` collection of the `shopping` database in your MongoDB deployment onto an Apache Kafka topic.	See the Copy Data section of this guide.
Only copy documents that have the value "Mexico" in the `country` field.	See the Filter Data section of this guide.

The customers collection contains the following documents:

{
  "_id": 1,
  "country": "Mexico",
  "purchases": 2,
  "last_viewed": { "$date": "2021-10-31T20:30:00.245Z" }
}
{
  "_id": 2,
  "country": "Iceland",
  "purchases": 8,
  "last_viewed": { "$date": "2015-07-20T10:00:00.135Z" }
}

Copy Data

Copy the contents of the customers collection of the shopping database by specifying the following configuration options in your source connector:

database=shopping
collection=customers
startup.mode=copy_existing

Your source connector copies your collection by creating change event documents that describe inserting each document into your collection.

Note

Data Copy Can Produce Duplicate Events

If any system changes the data in the database while the source connector converts existing data from it, MongoDB may produce duplicate change stream events to reflect the latest changes. Since the change stream events on which the data copy relies are idempotent, the copied data is eventually consistent, conformant to an "at-least-once" delivery guarantee.

To learn more about change event documents, see the Change Streams guide.

To learn more about the startup.mode option, see Startup Properties.

Filter Data

You can filter data by specifying an aggregation pipeline in the startup.mode.copy.existing.pipeline option of your source connector configuration. The following configuration specifies an aggregation pipeline that matches all documents with "Mexico" in the country field:

startup.mode.copy.existing.pipeline=[{ "$match": { "country": "Mexico" } }]

To learn more about the startup.mode.copy.existing.pipeline option, see Startup Properties.

To learn more about aggregation pipelines, see the following resources:

Customize a Pipeline to Filter Change Events Usage Example
Aggregation in the MongoDB manual.

Specify the Configuration

Your final source connector configuration to copy the customers collection should look like this:

connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=<your production MongoDB connection uri>
database=shopping
collection=customers
startup.mode=copy_existing
startup.mode.copy.existing.pipeline=[{ "$match": { "country": "Mexico" } }]

Once your connector copies your data, you see the following change event document corresponding to the preceding sample collection in the shopping.customers Apache Kafka topic:

{
  "_id": { "_id": 1, "copyingData": true },
  "operationType": "insert",
  "documentKey": { "_id": 1 },
  "fullDocument": {
    "_id": 1,
    "country": "Mexico",
    "purchases": 2,
    "last_viewed": { "$date": "2021-10-31T20:30:00.245Z" }
  },
  "ns": { "db": "shopping", "coll": "customers" }
}

Note

Write the Data in your Topic into a Collection

Use a change data capture handler to convert change event documents in an Apache Kafka topic into MongoDB write operations. To learn more, see the Change Data Capture Handlers guide.

Back

Topic Naming

Specify a Schema