Migrate from Azure CosmosDB to MongoDB Atlas Using Apache Kafka
When you are the best of breed, you have many imitators, and MongoDB is no different in the database world. If you are reading this blog, you are most likely an Azure customer who ended up using CosmosDB.
You needed a database that could handle unstructured data in Azure and eventually realized CosmosDB wasn’t the best fit. Perhaps you found it too expensive for your workload, found that it wasn’t performing well, or simply lost confidence in the platform. You also might have tried its MongoDB API and found that the queries you wanted to run simply don’t work in CosmosDB because it fails 67% of the compatibility tests.
Whatever path you took to CosmosDB, know that you can easily migrate your data to MongoDB Atlas while still leveraging the full power of Azure. With MongoDB Atlas in Azure, there are no more failed queries, no more slow performance, and no more surprise bills from under-optimized RUs (Request Units). MongoDB Atlas in Azure also gives you access to the latest releases of MongoDB and the flexibility to use any of the three major cloud providers if your business needs change.
Note: When you originally created your CosmosDB account, you chose from several API options, such as Core (SQL) or the Azure Cosmos DB API for MongoDB. The API you chose determines your migration path:
If you created your CosmosDB account using the Azure Cosmos DB API for MongoDB, you can use MongoDB tools such as mongodump, mongorestore, mongoimport, and mongoexport to move your data. The Azure CosmosDB Connector for Kafka Connect does not work with CosmosDB databases that were created for the Azure Cosmos DB API for MongoDB.
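For that case, here is a minimal sketch of the tooling-based path, assuming a CosmosDB account created with the MongoDB API; every bracketed value is a placeholder, and you should take the exact CosmosDB connection string from the Azure portal:

# Dump from CosmosDB (MongoDB API), then restore into MongoDB Atlas.
# <cosmos-account>, <primary-key>, <user>, <password>, and <atlas-cluster> are placeholders.
mongodump --uri "mongodb://<cosmos-account>:<primary-key>@<cosmos-account>.mongo.cosmos.azure.com:10255/?ssl=true" --out ./dump
mongorestore --uri "mongodb+srv://<user>:<password>@<atlas-cluster>.mongodb.net" ./dump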
In this blog post, we will cover how to leverage Apache Kafka to move data from the Azure CosmosDB Core (SQL) API to MongoDB Atlas. While there are many ways to move data, using Kafka allows you to not only perform a one-time migration but also to continuously stream data from CosmosDB to MongoDB. This gives you the opportunity to test your application and compare the experience, so that you can make the final application cutover to MongoDB Atlas when you are ready. The complete example code is available in this GitHub repository.
You’ll need access to an Apache Kafka cluster. There are many options available to you, including Confluent Cloud, or you can deploy your own Apache Kafka via Docker as shown in this blog. Microsoft Azure also includes an event messaging service called Azure Event Hubs. This service provides a Kafka endpoint that can be used as an alternative to running your own Kafka cluster. Azure Event Hubs exposes the same Kafka Connect API, enabling the use of the MongoDB connector and the Azure CosmosDB connector with the Event Hubs service.
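If you go the Event Hubs route, Kafka Connect talks to it like any other Kafka broker over SASL_SSL. Here is a hedged sketch of the relevant Kafka Connect worker properties, where <namespace> and <connection-string> are placeholders for your Event Hubs namespace and its connection string:

# Kafka Connect worker settings for an Azure Event Hubs Kafka endpoint.
bootstrap.servers=<namespace>.servicebus.windows.net:9093
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="<connection-string>";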
If you do not have an existing Kafka deployment, perform these steps. You will need Docker installed on your local machine. First, clone the example repository:
git clone https://github.com/RWaltersMA/CosmosDB2MongoDB.git
Next, build the Docker containers:
docker-compose up -d --build
The Docker Compose script (docker-compose.yml) will stand up all the components you need, including Apache Kafka and Kafka Connect, and will install the CosmosDB and MongoDB connectors.
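Before moving on, you can confirm that Kafka Connect is up and that both connector plugins were registered by querying Kafka Connect’s standard REST API (the port assumes the default 8083 mapping from the compose file):

# List the connector plugins known to the Kafka Connect worker.
# CosmosDBSourceConnector and MongoSinkConnector should appear in the output.
curl http://localhost:8083/connector-plugins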
Modify the cosmosdb-source.json file and replace the placeholder values with your own.
1 { 2 "name": "cosmosdb-source", 3 "config": { 4 "connector.class": "com.azure.cosmos.kafka.connect.source.CosmosDBSourceConnector", 5 "tasks.max": "1", 6 "key.converter": "org.apache.kafka.connect.json.JsonConverter", 7 "value.converter": "org.apache.kafka.connect.json.JsonConverter", 8 "connect.cosmos.task.poll.interval": "100", 9 "connect.cosmos.connection.endpoint": 10 "https://**<cosmosinstance-name>**.documents.azure.com:443/", 11 "connect.cosmos.master.key": **"<cosmosdbprimarykey>",** 12 "connect.cosmos.databasename": **"<database name>",** 13 "connect.cosmos.containers.topicmap": **"<containers>#<topicname>”,** 14 "connect.cosmos.offset.useLatest": false, 15 "value.converter.schemas.enable": "false", 16 "key.converter.schemas.enable": "false" 17 } 18 }
Modify the mongo-sink.json file and replace the placeholder values with your own.
1 {"name": "mongo-sink", 2 "config": { 3 "connector.class":"com.mongodb.kafka.connect.MongoSinkConnector", 4 "tasks.max":"1", 5 "topics":"<topicname>", 6 "connection.uri":"<MongoDB Atlas Connection String>", 7 "database":"<Desired Database Name>", 8 "collection":"<Desired Collection Name>", 9 "key.converter": "org.apache.kafka.connect.json.JsonConverter", 10 "value.converter":"org.apache.kafka.connect.json.JsonConverter", 11 "value.converter.schemas.enable": "false", 12 "key.converter.schemas.enable": "false" 13 14 }}
Note: Before we configure Kafka Connect, make sure that the network settings on both CosmosDB and MongoDB Atlas allow communication between the two services. In CosmosDB, select Firewall and virtual networks. While the easiest configuration is to select “All networks,” you can provide a more secure connection by specifying an IP range under the Firewall setting in the “Selected networks” option. MongoDB Atlas network access also needs to be configured to allow remote connections; by default, MongoDB Atlas does not allow any external connections. See Configure IP Access List for more information.
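If you prefer to script these settings, here is a hedged sketch using the Azure CLI and the Atlas CLI, assuming both are installed and authenticated; the resource group, account name, and IP address are placeholders:

# Allow a specific IP through the CosmosDB firewall (Azure CLI).
az cosmosdb update --resource-group <resource-group> --name <cosmos-account> --ip-range-filter "<your-ip>"
# Add the same IP to the MongoDB Atlas IP access list (Atlas CLI).
atlas accessLists create "<your-ip>" --type ipAddress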
To configure our two connectors, make a REST API call to the Kafka Connect service:
curl -X POST -H "Content-Type: application/json" -d @cosmosdb-source.json http://localhost:8083/connectors

curl -X POST -H "Content-Type: application/json" -d @mongo-sink.json http://localhost:8083/connectors
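Each call should return a JSON description of the connector it created. You can then verify that both connectors and their tasks are in the RUNNING state via the same REST API:

# Check the state of each connector and its tasks.
curl http://localhost:8083/connectors/cosmosdb-source/status
curl http://localhost:8083/connectors/mongo-sink/status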
That’s it!
Provided the network and database access was configured properly, data from your CosmosDB should begin to flow into MongoDB Atlas. If you don’t see anything, here are some troubleshooting tips:
- Try connecting to your MongoDB Atlas cluster using the mongosh tool from the server running the Docker containers (see the sketch after this list).
- View the Docker logs for the Kafka Connect service.
- Verify that you can connect to the CosmosDB instance using the Azure CLI from the server running the Docker containers.
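For the first check, here is a minimal mongosh one-liner, assuming the database and collection names you set in mongo-sink.json; the connection string is a placeholder:

# Connect to Atlas and count the documents that have arrived so far.
mongosh "mongodb+srv://<user>:<password>@<atlas-cluster>.mongodb.net" --eval 'db.getSiblingDB("<Desired Database Name>").getCollection("<Desired Collection Name>").countDocuments()'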
Summary
In this post, we explored how to move data from CosmosDB to MongoDB using Apache Kafka. If you’d like to explore this method and other ways to migrate data, check out the five-part blog post on CosmosDB migration from Peerslands, the 2021 MongoDB partner of the year award winner.