Ensuring High Availability for MongoDB on Kubernetes

Mercy Bassey · 11 min read · Published Jul 12, 2024 · Updated Jul 12, 2024
A database is a structured collection of data that allows efficient storage, retrieval, and manipulation of information. Databases are fundamental to modern applications and support critical functions like storing user accounts, transactions, and much more. Ensuring the high availability of a database is vital, as downtime can lead to significant disruptions, data loss, and financial impacts. High availability ensures that a database remains accessible and functional, even in the face of hardware failures, network issues, or other disruptions. In this tutorial, we will focus on achieving high availability with MongoDB.
MongoDB, when deployed on Kubernetes, can leverage the orchestration and automation capabilities of the platform to enhance its availability and resilience. With its robust features for container management, scaling, and recovery, Kubernetes provides an ideal environment for deploying highly available MongoDB instances. This tutorial will guide you through the steps necessary to deploy MongoDB on Kubernetes, configure it for high availability, set up backup mechanisms using mongodump, and implement automatic scaling to handle varying workloads.

Prerequisites

To follow along in this tutorial, make sure you have:

- A running Kubernetes cluster (a local cluster such as minikube or kind is fine) and kubectl configured to talk to it
- Basic familiarity with MongoDB and with Kubernetes concepts such as pods, services, and StatefulSets

Deploying MongoDB

To begin, you’ll need to set up the necessary resources and configurations to deploy MongoDB in your Kubernetes cluster. You will create a persistent volume to ensure your MongoDB data is retained even if pods are rescheduled. You will also set up a headless service to enable stable network communication between MongoDB pods. Finally, you will deploy MongoDB using a StatefulSet, which will manage the deployment and scaling of MongoDB pods while maintaining their unique identities.
Note: If you wish to read more about the concepts mentioned, you can go through the documentation on Kubernetes.
Add the following configuration settings in a file called pv-pvc.yaml. This will create a persistent volume called mongodb-pv and a persistent volume claim called mongodb-pvc:
The PersistentVolume (PV) will provide the actual storage, while the PersistentVolumeClaim (PVC) will request the storage from the available PV.
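A minimal sketch of pv-pvc.yaml might look like the following. The hostPath location, 5Gi capacity, and `manual` storage class are illustrative assumptions suited to a single-node or dev cluster, not requirements:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /data/mongodb    # node-local path; illustrative only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 5Gi
```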
Run the following commands to create the PersistentVolume and PersistentVolumeClaim:
Confirm that they are created with the following commands:
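Assuming the manifest is saved as pv-pvc.yaml, the commands would be roughly:

```shell
kubectl apply -f pv-pvc.yaml

# Confirm both objects exist and the PVC is Bound
kubectl get pv mongodb-pv
kubectl get pvc mongodb-pvc
```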
You should see output similar to the following, confirming that the PersistentVolume and PersistentVolumeClaim have been created successfully:
Screenshot representing that PersistentVolume and PersistentVolumeClaims have been created
Next, create a headless service to manage stable network identities for the MongoDB pods. In a file called headless-service.yaml, add the following configuration; this will create a headless service for your MongoDB database called mongodb-service:
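A sketch of headless-service.yaml follows. The `app: mongodb` selector label is an assumption; it must match the labels on the MongoDB pods:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  clusterIP: None        # headless: gives each pod a stable DNS name
  selector:
    app: mongodb
  ports:
    - port: 27017
      targetPort: 27017
```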
Apply the headless service to your Kubernetes cluster using the command:
Confirm that the headless service has been created using the following command:
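The commands would be:

```shell
kubectl apply -f headless-service.yaml
kubectl get svc mongodb-service
```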
The following output is expected:
Creating and viewing headless service
Set up authentication for your MongoDB replica set using the following commands:
Setting up authentication for your MongoDB replica set using the mongodb-keyfile is crucial to secure communication between members, prevent unauthorized access, and ensure that only trusted nodes can join the replica set, thereby maintaining data integrity and security.
Create a Kubernetes secret to store the keyfile:
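A typical sequence, assuming both the keyfile and the secret are named mongodb-keyfile, generates a random key, restricts its permissions as MongoDB requires, and stores it in a secret:

```shell
# Generate a shared key for replica set internal authentication
openssl rand -base64 756 > mongodb-keyfile
chmod 400 mongodb-keyfile

# Store the keyfile in a Kubernetes secret so the pods can mount it
kubectl create secret generic mongodb-keyfile --from-file=mongodb-keyfile
```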
Finally, create a StatefulSet to deploy MongoDB with persistent storage and stable network identities. In a file called statefulset.yaml, add the following configuration:
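A sketch of statefulset.yaml is below. The replica set name `rs0`, the `mongo:7.0` image, and the resource requests are assumptions; it uses volumeClaimTemplates, the usual way to give each StatefulSet pod its own volume. In practice, the mounted keyfile's ownership may need adjusting (for example, via an initContainer or securityContext) so the mongod user can read it:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb-service   # must match the headless service
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:7.0
          command:
            - mongod
            - "--replSet"
            - rs0
            - "--bind_ip_all"
            - "--keyFile"
            - /etc/mongodb-keyfile/mongodb-keyfile
          ports:
            - containerPort: 27017
          resources:
            requests:
              cpu: 100m        # CPU requests are needed later for the HPA
              memory: 256Mi
          volumeMounts:
            - name: data
              mountPath: /data/db
            - name: keyfile
              mountPath: /etc/mongodb-keyfile
              readOnly: true
      volumes:
        - name: keyfile
          secret:
            secretName: mongodb-keyfile
            defaultMode: 0400
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: manual
        resources:
          requests:
            storage: 1Gi
```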
Apply the StatefulSet to your Kubernetes cluster:
Confirm that the StatefulSet was created successfully with all pods healthy and running:
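The commands would be (the `app=mongodb` label is assumed to match the StatefulSet's pod labels):

```shell
kubectl apply -f statefulset.yaml
kubectl get statefulset mongodb
kubectl get pods -l app=mongodb
```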
You should have the following output with the StatefulSet creating three MongoDB pods named mongodb-0, mongodb-1, and mongodb-2:
Creating and viewing StatefulSets and pods

Configuring high availability

To ensure high availability for your MongoDB database, you will configure a replica set. A replica set in MongoDB is a group of mongod instances that maintain the same data set, providing redundancy and high availability.
Before configuring the replica set, it is helpful to have some data present. This step is optional, but having data in your database will help you understand the backup process more clearly in the following subsections.
First, exec into one of the MongoDB pods and initialize the replica set. Typically, you would use the first pod (e.g., mongodb-0).
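A sketch of the initialization, assuming the replica set name rs0 and the headless service mongodb-service; the priorities give mongodb-0 preference in elections, which matches the failover behavior tested later:

```shell
kubectl exec -it mongodb-0 -- mongosh --eval '
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-0.mongodb-service:27017", priority: 2 },
    { _id: 1, host: "mongodb-1.mongodb-service:27017", priority: 1 },
    { _id: 2, host: "mongodb-2.mongodb-service:27017", priority: 1 }
  ]
})'
```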
You should see the following output:
In a replica set, writes are accepted only by the primary. Since the first pod, mongodb-0, was given the highest priority, it serves as the primary and can accept insertions. If you are not sure which pod is the primary, you can check with the following command:
Switch to a database called “users” and then insert some data using the following commands:
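From a mongosh session on the primary, a sketch of the inserts might look like this. Only the users database and femaleusers collection names come from the article; the sample documents are made up for illustration:

```javascript
// Run inside a mongosh session on the primary (mongodb-0)
use users
db.femaleusers.insertMany([
  { name: "Ada",   occupation: "engineer" },
  { name: "Grace", occupation: "scientist" },
  { name: "Mary",  occupation: "analyst" }
])

// List what was inserted (this read also works on secondaries)
db.femaleusers.find()
```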
Note: You can use the find() command to list the users we just inserted. This command can be executed from any of the members of the replica set, including the secondary pods.
Once you have run the commands, you should see the following outputs indicating successful insertion of the documents into the femaleusers collection in the users database:
Viewing successful insertion of user documents into the femaleusers collection

Performing a backup with mongodump

With some data inserted into your MongoDB database, you are now ready to set up backups, the next building block of high availability, using mongodump.
First, create a separate PV and PVC for storing backups by adding the following configuration settings in a backup-pv-pvc.yaml file:
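A sketch of backup-pv-pvc.yaml, mirroring the earlier PV/PVC pair; the 2Gi capacity and hostPath location are illustrative assumptions:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: backup-pv
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  hostPath:
    path: /data/mongodb-backup   # node-local path; illustrative only
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 2Gi
```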
Apply this with the following command:
Confirm that they have been created with the following commands:
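The commands would be:

```shell
kubectl apply -f backup-pv-pvc.yaml

# Confirm the backup volume and claim exist
kubectl get pv backup-pv
kubectl get pvc backup-pvc
```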
Confirming that backup-pv and backup-pvc have been created
To back up your MongoDB data, you will create a Kubernetes CronJob that uses the mongodump utility. Using a CronJob here means you get to schedule when you’d like backups to occur automatically.
Create a file called mongodb-backup-cronjob.yaml with the following content:
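A sketch of the CronJob follows. The five-minute schedule matches the check described below; the image and target host are assumptions, and if access control is enforced you would also pass credentials (for example, via `--username`/`--password` sourced from a secret):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
spec:
  schedule: "*/5 * * * *"   # run a backup every five minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mongodump
              image: mongo:7.0
              command:
                - /bin/sh
                - -c
                # Dump into a timestamped directory on the backup volume
                - mongodump --host=mongodb-0.mongodb-service
                  --out=/backup/$(date +%Y-%m-%d_%H-%M-%S)
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: backup-pvc
```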
Apply the configuration to create the CronJob and then verify that the CronJob has been created:
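The commands would be:

```shell
kubectl apply -f mongodb-backup-cronjob.yaml
kubectl get cronjob mongodb-backup
```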
You should see output similar to the following, confirming that the CronJob has been created successfully:
Confirming that the CronJob has been created
After five minutes, you can confirm the status of the pods using the following command:
You should see a pod created by the CronJob, with a name similar to the following:
Viewing the pod created by the CronJob
To verify that the backup was created successfully, check the logs of the backup pod:
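The pod name is generated by the CronJob, so substitute the one shown by kubectl get pods:

```shell
kubectl get pods                        # find the completed backup pod
kubectl logs <mongodb-backup-pod-name>  # substitute the generated pod name
```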
The log output should be similar to this:
Viewing the logs of the pod created by the CronJob
To access the backup files, you need to view the contents of the persistent volume where the backups are stored. Create a temporary pod to access these files by creating a file named backup-access.yaml with the following content:
This will create a temporary busybox pod that mounts the backup-pvc persistent volume claim (PVC) to the /backup directory within the container, allowing you to access and explore the backup files stored in the PV.
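A sketch of backup-access.yaml; the pod name backup-access is an assumption taken from the file name:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backup-access
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sleep", "3600"]   # keep the pod alive so you can exec into it
      volumeMounts:
        - name: backup
          mountPath: /backup
  volumes:
    - name: backup
      persistentVolumeClaim:
        claimName: backup-pvc
```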
Apply the configuration and access the temporary pod:
Once inside the pod, navigate to the /backup directory to view the backup files:
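The steps would be roughly:

```shell
kubectl apply -f backup-access.yaml
kubectl exec -it backup-access -- sh

# Inside the pod: list the timestamped dump directories
ls -R /backup
```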
You should see the following output:
Accessing users from backup
Now, exit the busybox container by typing exit.

Performing failover and recovery testing

To ensure the high availability of your MongoDB replica set, you need to test failover and recovery processes. This section will guide you through simulating a failure and verifying that your MongoDB setup can handle it gracefully.
In a MongoDB replica set, one member is the primary node responsible for handling write operations, while the other members are secondary nodes that replicate data from the primary. If the primary node fails, the replica set will automatically elect a new primary from the remaining secondary nodes.
Note: To know more about replica sets in MongoDB, you can visit the documentation on replica sets in MongoDB.
Begin by identifying the current primary node:
Look for the member with the "stateStr" : "PRIMARY" attribute:
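One way to list each member's state, assuming mongosh is available in the pod image:

```shell
kubectl exec -it mongodb-0 -- mongosh --quiet --eval '
  rs.status().members.forEach(m => print(m.name, m.stateStr))'
```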
Simulate a failure by deleting the current primary node pod:
Monitor the status of the replica set and observe the election of a new primary:
You should see that one of the secondary nodes has been promoted to primary:
Verify that the deleted pod has been recreated and rejoined the replica set as a secondary node:
Confirm that the pod is back and running:
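The steps above can be sketched as follows, assuming mongodb-0 is the current primary:

```shell
# Simulate a failure by deleting the primary pod
kubectl delete pod mongodb-0

# From a surviving member, watch a new primary get elected
kubectl exec -it mongodb-1 -- mongosh --quiet --eval '
  rs.status().members.forEach(m => print(m.name, m.stateStr))'

# Confirm the StatefulSet has recreated the deleted pod
kubectl get pods -l app=mongodb
```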
Check the replica set status again to confirm that the recreated node has been re-elected as primary, since it has the highest priority:
You should see:

Configuring automatic scaling

In a dynamic environment, workload demands fluctuate, so your MongoDB deployment should scale automatically to maintain consistent performance. Kubernetes provides the Horizontal Pod Autoscaler (HPA) to manage this scaling.
The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, replication controller, or replica set based on observed CPU utilization (or other select metrics). It helps ensure that your application has the right amount of resources to handle varying levels of traffic.
To configure automatic scaling for your MongoDB deployment, you first need to install the Metrics Server; the HPA relies on it to collect resource metrics from the Kubernetes cluster.
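The Metrics Server can be installed from its official release manifest:

```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```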
You should see the following output if applied successfully:
Installing Metrics Server
Execute the command kubectl get pods -n kube-system to view the status of the metrics server pods. You should see the following:
At this point, the containers in the Metrics Server are not running due to TLS certificate issues. To resolve this, execute the following command to edit the metrics server deployment:
The kubectl edit deployment metrics-server -n kube-system command opens the deployment configuration in a vim editor by default. To edit the file, move your cursor to the appropriate section using the arrow keys. Type i to enter insert mode and make your changes. Once you have finished editing, press the Esc key to exit insert mode, then type :wq to save your changes and close the editor.
Add the following argument to the container spec to bypass TLS verification:
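The decisive flag is --kubelet-insecure-tls, which tells the Metrics Server to skip verification of kubelet serving certificates; this is acceptable for dev/test clusters only. In the container's args list it looks roughly like this (the surrounding arguments are the Metrics Server defaults):

```yaml
    spec:
      containers:
        - name: metrics-server
          args:
            - --cert-dir=/tmp
            - --secure-port=10250
            - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
            - --kubelet-use-node-status-port
            - --metric-resolution=15s
            - --kubelet-insecure-tls   # added line: skip kubelet TLS verification
```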
Adding commands to bypass TLS verification for Metrics Server
After making these changes, confirm that the containers are running by executing:
You should see the following output:
Check the resource usage of the pods in your cluster:
You should see the following output:
Define an HPA resource that specifies how and when to scale the MongoDB StatefulSet using the following command:
This configuration will set the minimum number of replicas to 3 and allow scaling up to a maximum of 10 replicas based on CPU utilization.
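With kubectl autoscale, this would be roughly the following; the 70% CPU target is an assumption, while the min/max replica counts come from the article:

```shell
kubectl autoscale statefulset mongodb --cpu-percent=70 --min=3 --max=10
```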
You should see the following output:
Monitor the status of the HPA using the following command:
This will show you the current status of the HPA, including the current number of replicas and the metrics used for scaling.
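Assuming the HPA inherited the StatefulSet's name:

```shell
kubectl get hpa mongodb --watch
```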
Viewing HPA for the MongoDB StatefulSet

Conclusion

High availability is crucial for the smooth operation of any database system. This tutorial has demonstrated how to deploy MongoDB on Kubernetes, configure it for high availability with a replica set, set up backup mechanisms using mongodump, and configure automatic scaling with the Kubernetes Horizontal Pod Autoscaler. By following these steps, you can maintain continuous access to your data and avoid the significant downtime that leads to data loss and financial impact. As a result, you'll have data durability and high availability at your fingertips.
If you have any questions or comments, feel free to join us in the MongoDB Developer Community.
