A database is a structured collection of data that allows efficient storage, retrieval, and manipulation of information. Databases are fundamental to modern applications and support critical functions like storing user accounts, transactions, and much more. Ensuring the high availability of a database is vital, as downtime can lead to significant disruptions, data loss, and financial impacts. High availability ensures that a database remains accessible and functional, even in the face of hardware failures, network issues, or other disruptions. In this tutorial, we will focus on achieving high availability with MongoDB.
MongoDB, when deployed on Kubernetes, can leverage the orchestration and automation capabilities of the platform to enhance its availability and resilience. With its robust features for container management, scaling, and recovery, Kubernetes provides an ideal environment for deploying highly available MongoDB instances. This tutorial will guide you through the steps necessary to deploy MongoDB on Kubernetes, configure it for high availability, set up backup mechanisms using mongodump, and implement automatic scaling to handle varying workloads.
Prerequisites
To follow along in this tutorial, make sure you have:
A running Kubernetes cluster (local or managed) that you can deploy workloads to.
kubectl installed and configured on your machine to interact with the Kubernetes cluster.
Deploying MongoDB
To begin, you’ll need to set up the necessary resources and configurations to deploy MongoDB in your Kubernetes cluster. You will create a persistent volume to ensure your MongoDB data is retained even if pods are rescheduled. You will also set up a headless service to enable stable network communication between MongoDB pods. Finally, you will deploy MongoDB using a StatefulSet, which will manage the deployment and scaling of MongoDB pods while maintaining their unique identities.
Note: If you wish to read more about these concepts, you can go through the Kubernetes documentation.
Add the following configuration settings in a file called pv-pvc.yaml. This will create a persistent volume called mongodb-pv and a persistent volume claim called mongodb-pvc:
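The listing below is a minimal sketch modeled on the backup PV and PVC created later in this tutorial; the hostPath location /mnt/data is an illustrative choice, so adjust it (or swap in a storage class) to match your cluster:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data # illustrative host path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi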
Run the following commands to create the PersistentVolume and PersistentVolumeClaim:
kubectl apply -f pv-pvc.yaml
Confirm that they are created with the following commands:
kubectl get pv
kubectl get pvc
You should see output similar to the following, confirming that the PersistentVolume and PersistentVolumeClaim have been created successfully:
Screenshot representing that PersistentVolume and PersistentVolumeClaims have been created
Next, create a headless service to manage stable network identities for the MongoDB pods. In a file called headless-service.yaml, add the following configuration; this will create a headless service for your MongoDB database called mongodb-service:
apiVersion: v1
kind: Service
metadata:
  name: mongodb-service
spec:
  clusterIP: None
  selector:
    app: mongodb
  ports:
    - port: 27017
Apply the headless service to your Kubernetes cluster using the command:
kubectl apply -f headless-service.yaml
Confirm that the headless service has been created using the following command:
kubectl get svc
The following output is expected:
Creating and viewing headless service
Next, set up keyfile authentication for your MongoDB replica set. A keyfile secures communication between replica set members, prevents unauthorized access, and ensures that only trusted nodes can join the replica set, preserving data integrity and security.
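A common approach, consistent with the secretName: mongodb-keyfile referenced by the StatefulSet below, is to generate a random keyfile with openssl and store it as a Kubernetes Secret:

openssl rand -base64 756 > mongodb-keyfile
chmod 400 mongodb-keyfile
kubectl create secret generic mongodb-keyfile --from-file=mongodb-keyfile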
Finally, create a StatefulSet to deploy MongoDB with persistent storage and stable network identities. In a file called statefulset.yaml, add the following configuration:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb # Specifies the name of the StatefulSet
spec:
  serviceName: "mongodb-service" # Specifies the headless service to use
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:latest
          command:
            - mongod
            - "--replSet"
            - rs0
            - "--bind_ip_all"
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: mongodb-storage
              mountPath: /data/db
            - name: keyfile
              mountPath: /etc/mongodb-keyfile
              readOnly: true
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
      volumes:
        - name: keyfile
          secret:
            secretName: mongodb-keyfile
            defaultMode: 0400
  volumeClaimTemplates:
    - metadata:
        name: mongodb-storage
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
Apply the StatefulSet to your Kubernetes cluster:
kubectl apply -f statefulset.yaml
Confirm that the StatefulSet was created successfully with all pods healthy and running:
kubectl get statefulsets
kubectl get pods
You should have the following output with the StatefulSet creating three MongoDB pods named mongodb-0, mongodb-1, and mongodb-2:
Creating and viewing Statefulsets and pods
Configuring high availability
To ensure high availability for your MongoDB database, you will configure a replica set. A replica set in MongoDB is a group of mongod instances that maintain the same data set, providing redundancy and high availability.
Before configuring the replica set, it is helpful to have some data present. This step is optional, but having data in your database will help you understand the backup process more clearly in the following subsections.
First, exec into one of the MongoDB pods and initialize the replica set. Typically, you would use the first pod (e.g., mongodb-0).
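A minimal sketch, assuming the pod and service names created earlier, is to open a mongosh session on mongodb-0:

kubectl exec -it mongodb-0 -- mongosh

Then, inside the mongosh shell, initiate the replica set with one member per pod, addressing each pod through the headless service:

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-0.mongodb-service:27017" },
    { _id: 1, host: "mongodb-1.mongodb-service:27017" },
    { _id: 2, host: "mongodb-2.mongodb-service:27017" }
  ]
})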
In a replica set, writes are accepted only by the primary. Because the replica set was initiated from the first pod, mongodb-0, that pod is typically elected primary, so you can run insertions there. If you are not sure which pod is the primary, you can check using the command:
rs.status()
Switch to a database called “users” and then insert some data using the following commands:
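For example, in the mongosh session on the primary (the sample documents below are placeholders; any data will do):

use users
db.femaleusers.insertMany([
  { name: "Jane", age: 30 },
  { name: "Amara", age: 25 },
  { name: "Mei", age: 28 }
])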
Note: You can use db.femaleusers.find() to list the users you just inserted. This command can be executed from any member of the replica set, including the secondary pods (on a secondary, you may first need to allow reads, e.g., with rs.secondaryOk() in mongosh).
Once you have run the commands, you should see the following outputs indicating successful insertion of the documents into the femaleusers collection in the users database:
Viewing successful insertion of user documents into the femaleusers collection
Performing a backup with mongodump
With some data already inserted into your MongoDB database, you are now ready to set up the backup side of your high-availability strategy using mongodump.
First, create a separate PV and PVC for storing backups by adding the following configuration settings in a backup-pv-pvc.yaml file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: backup-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/backup
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
Apply this with the following command:
kubectl apply -f backup-pv-pvc.yaml
Confirm that they have been created with the following commands:
kubectl get pv
kubectl get pvc
Confirming that backup-pv and backup-pvc have been created
To back up your MongoDB data, you will create a Kubernetes CronJob that uses the mongodump utility. Using a CronJob here means you get to schedule when you’d like backups to occur automatically.
Create a file called mongodb-backup-cronjob.yaml with the following content:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
spec:
  schedule: "*/5 * * * *" # Runs backup every five minutes
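  # The jobTemplate below is a sketch: it assumes the headless service and
  # backup-pvc created earlier, and runs mongodump without authentication
  # for brevity. Adjust the connection flags to match your setup.
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: mongodump
              image: mongo:latest
              command:
                - /bin/sh
                - -c
                - mongodump --host=mongodb-0.mongodb-service --port=27017 --out=/backup/backup-$(date +%Y%m%d%H%M%S)
              volumeMounts:
                - name: backup-storage
                  mountPath: /backup
          restartPolicy: OnFailure
          volumes:
            - name: backup-storage
              persistentVolumeClaim:
                claimName: backup-pvc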
Apply the configuration to create the CronJob and then verify that the CronJob has been created:
kubectl apply -f mongodb-backup-cronjob.yaml
kubectl get cronjob
You should see output similar to the following, confirming that the CronJob has been created successfully:
Confirming that the CronJob has been created
After five minutes, you can confirm the status of the pods using the following command:
kubectl get pods
You should see a pod created by the CronJob, with a name similar to this:
Viewing the pod created by the CronJob
To verify that the backup was created successfully, check the logs of the backup pod:
kubectl logs mongodb-backup-28635290-jfsjn
The log output should be similar to this:
Viewing the logs of the pod created by the CronJob
To access the backup files, you need to view the contents of the persistent volume where the backups are stored. Create a file named backup-access.yaml with the following content; it defines a temporary busybox pod that mounts the backup-pvc PersistentVolumeClaim at the /backup directory inside the container, letting you explore the backup files stored in the PV:
apiVersion: v1
kind: Pod
metadata:
  name: backup-access
spec:
  containers:
    - name: busybox
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backup
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: backup-pvc
Apply the configuration and access the temporary pod:
kubectl apply -f backup-access.yaml
kubectl exec -it backup-access -- sh
Once inside the pod, navigate to the /backup directory to view the backup files:
cd /backup
ls
cd <backup-directory>
ls
cd users
You should see the following output:
Accessing users from backup
Now, exit the busybox container by typing exit.
Performing failover and recovery testing
To ensure the high availability of your MongoDB replica set, you need to test failover and recovery processes. This section will guide you through simulating a failure and verifying that your MongoDB setup can handle it gracefully.
In a MongoDB replica set, one member is the primary node responsible for handling write operations, while the other members are secondary nodes that replicate data from the primary. If the primary node fails, the replica set will automatically elect a new primary from the remaining secondary nodes.
Note: To know more about replica sets in MongoDB, you can visit the documentation on replica sets in MongoDB.
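One way to simulate a failure is to delete the current primary pod and watch the replica set recover. The following is a sketch assuming mongodb-0 is still the primary, as in the earlier steps:

kubectl delete pod mongodb-0
kubectl get pods -w

The StatefulSet recreates mongodb-0 automatically, and while it restarts, the remaining members hold an election. You can confirm which pod became the new primary by running rs.status() from one of the surviving members, for example:

kubectl exec -it mongodb-1 -- mongosh --eval "rs.status()"

Configuring automatic scaling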
In a dynamic environment, workload demands can fluctuate, requiring your MongoDB deployment to scale automatically to maintain consistent performance. Kubernetes provides a feature called the Horizontal Pod Autoscaler (HPA) to manage this scaling.
The Kubernetes Horizontal Pod Autoscaler automatically scales the number of pods in a deployment, stateful set, or replica set based on observed CPU utilization (or other selected metrics). It helps ensure that your application has the right amount of resources to handle varying levels of traffic.
To configure automatic scaling for your MongoDB deployment, you first need to install the Metrics Server, since the HPA relies on it to collect resource metrics from the cluster. You can install it from the official manifest published with the metrics-server releases:
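kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml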
You should see the following output if applied successfully:
Installing Metrics Server
Execute the command kubectl get pods -n kube-system to view the status of the metrics server pods. You should see the following:
~$ kubectl get pods -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
...
metrics-server-6d94bc8694-mkdrb   0/1     Running   0          60s
At this point, the Metrics Server container is not ready due to TLS certificate issues when it connects to the kubelet. To resolve this, execute the following command to edit the Metrics Server deployment:
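kubectl edit deployment metrics-server -n kube-system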
The kubectl edit deployment metrics-server -n kube-system command opens the deployment configuration in a vim editor by default. To edit the file, move your cursor to the appropriate section using the arrow keys. Type i to enter insert mode and make your changes. Once you have finished editing, press the Esc key to exit insert mode, then type :wq to save your changes and close the editor.
Add the following commands to the container spec to bypass TLS verification:
spec:
  containers:
    - args:
        ...
      command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP
Adding commands to bypass TLS verification for Metrics Server
After making these changes, confirm that the containers are running by executing:
kubectl get pods -n kube-system
You should see the following output:
~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-67dfb8c8c9-spmcs   1/1     Running   0          16m
...
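With metrics flowing, you can create the HPA itself from a small manifest. The following is a minimal sketch that targets the mongodb StatefulSet and scales on CPU utilization; the name mongodb-hpa, the 70% target, and the replica bounds are illustrative choices, not values from the original setup. Save it as mongodb-hpa.yaml and apply it with kubectl apply -f mongodb-hpa.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mongodb-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: mongodb # the StatefulSet created earlier
  minReplicas: 3
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70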
Monitor the status of the HPA using the following command:
kubectl get hpa
This will show you the current status of the HPA, including the current number of replicas and the metrics used for scaling.
Viewing HPA for the MongoDB StatefulSet
Conclusion
High availability is crucial for the smooth operation of any database system. This tutorial has demonstrated how to deploy MongoDB on Kubernetes, configure it for high availability with a replica set, set up backup mechanisms using mongodump, and configure automatic scaling with the Kubernetes Horizontal Pod Autoscaler. By following these steps, you can maintain continuous access to your data and avoid the significant downtime that can lead to data loss and financial impact. As a result, you'll have data durability and high availability at your fingertips.
If you have any questions or comments, feel free to join us in the MongoDB Developer Community.