Guidance for Atlas Backups
On this page
- Features for Atlas Backups
- Recommendations for Atlas Backups
- Recommendations for Backup Strategy
- Recommendations for Backup Policy
- Recommendations for Backup Distribution
- Recommendations for Backup Compliance Policy
- Recommendations for PIT Recovery
- Recommendations for Backup Costs
- Automation Examples: Atlas Backups
MongoDB Atlas provides fully managed and customizable backups to ensure data retention and recovery:
Cloud Backups: Taken using the native snapshot capabilities of your cloud provider, to support full-copy snapshots and localized snapshot storage. These snapshots are incremental and leverage the cloud provider's underlying snapshot mechanism for low cost and fast restores. You choose a backup policy that specifies a certain number of daily, weekly, and monthly snapshots.
Continuous Cloud Backups: An additive feature to cloud backups that provides Point In Time (PIT) recovery by backing up the oplog and capturing data changes between snapshots. This allows you to restore your data to the exact minute (a point in time) right before any failure or event, meeting Recovery Point Objectives (RPOs) as low as 1 minute.
We don't recommend enabling backup for development and test environments. For staging and production environments, we recommend developing automated deployment templates that include the recommendations described in this page.
Features for Atlas Backups
Atlas provides fully managed backups of your data, including point-in-time data recovery and consistent, cluster-wide snapshots of all clusters, including sharded clusters. In Atlas, you can choose from four snapshot frequencies: hourly, daily, weekly, and monthly, each with its own retention period.
Feature | Description |
---|---|
Cloud Backups | This feature provides localized backup storage using the native snapshot functionality of your cluster's cloud service provider. Benefits include a strong default backup retention schedule of 12 months, full flexibility to customize snapshot and retention schedules, and the ability to set different snapshot frequencies (such as hourly for recovery, weekly or monthly for long-term retention) to meet industry regulations. You can access your backup data instantly, which is useful for auditing, compliance, or data recovery purposes, and also run queries directly against the backup data, saving time and resources. |
Continuous Cloud Backups | This feature provides Point In Time (PIT) recovery, which allows you to recover back to any timestamp. This allows you to recover your data to the exact moment (a point in time) right before any failure or event, like a cyber attack. You can also set a customized restore window to dictate how many days you would like to be able to restore back to a specific point in time. |
Multi-region Snapshot Distribution | This feature allows you to increase resilience by automatically distributing backup snapshots and oplogs across geographic regions instead of just storing them in their primary region. You can meet compliance requirements of storing backups in different, air-gapped geographical locations to ensure disaster recovery in case of regional outages. To learn more, see Snapshot Distribution. |
Backup Compliance Policy | This feature enables you to further secure business-critical data by preventing all snapshots and oplogs stored in Atlas from being modified or deleted for a predefined retention period specified by you, guaranteeing that your backups are fully WORM (Write Once Read Many) compliant. Only a designated, authorized user can turn off this protection after completing a verification process with MongoDB support. This feature adds a mandatory manual delay and cooldown period so that an attacker cannot change the backup policy and export the data. To learn more, see Configure a Backup Compliance Policy. |
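Both capabilities are enabled per cluster. The following minimal Terraform sketch shows the two cluster flags involved; the project variable and cluster name are illustrative, and complete, runnable examples appear under Automation Examples: Atlas Backups below.

# Minimal sketch: enable Cloud Backups and Continuous Cloud Backups on a cluster.
resource "mongodbatlas_advanced_cluster" "example" {
  project_id   = var.project_id # illustrative variable, not defined in the examples below
  name         = "backup-demo"  # illustrative cluster name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      provider_name = "AWS"
      region_name   = "US_EAST_1"
      priority      = 7
    }
  }

  backup_enabled = true # Cloud Backups (cloud provider snapshots)
  pit_enabled    = true # Continuous Cloud Backups (PIT recovery)
}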
Recommendations for Atlas Backups
Recommendations for Backup Strategy
You must align your backup strategy with specific Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) to meet business continuity requirements, particularly for critical applications where near-instant RPO and rapid recovery times are crucial. RPO defines the maximum acceptable amount of data loss during an incident, while RTO defines how quickly your application must recover. Since data varies in importance, you must evaluate RPO and RTO for each application individually. For example, mission-critical data will likely have different requirements than clickstream analytics. Your requirements for RTO, RPO, and the backup retention period influence the cost and performance considerations of maintaining backups. In development and test environments, we recommend that you disable backup to save costs. In staging and production environments, ensure that backup is enabled in your deployment template and that you have successfully tested your backup and restore procedures.
Large replica sets (and shards) take longer to restore from backup. In staging and production environments, we recommend that you test restores to identify the replica set or shard size limits that remain compatible with your RTO requirements. Ensure that your snapshot schedule and retention policies meet your RPO requirements.
In production, in addition to Atlas cloud backups, we recommend that you start with continuous cloud backups and a default restore window of seven days. Lengthen this window based on the criticality of the workload. Continuous cloud backups allow you to replay the oplog to restore a cluster to a particular point in time and satisfy your RPO.
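The restore window is set on the cluster's cloud backup schedule. A minimal sketch follows; the project variable and cluster name are illustrative, and the full schedules in Automation Examples: Atlas Backups below show this attribute in context.

# Minimal sketch: set a seven-day continuous backup restore window.
resource "mongodbatlas_cloud_backup_schedule" "default_window" {
  project_id   = var.project_id # illustrative
  cluster_name = "backup-demo"  # illustrative; the cluster must have backup_enabled and pit_enabled set to true

  restore_window_days = 7 # continuous cloud backup restore window; lengthen for critical workloads
}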
Recommendations for Backup Policy
Atlas provides predefined backup snapshot schedules that specify snapshot frequency and retention period. Retaining backup snapshots for long periods can be costly. We recommend building automated deployment templates that meet your requirements based on the size and criticality of the data and the environment (development, test, staging, production). For snapshot frequency and retention, we recommend the following:
Tier | RTO | RPO | Recommended Frequency and Retention | Total Number of Snapshots |
---|---|---|---|---|
Tier 1 | 30 minutes | Near zero (within 7 days) | Hourly: Every 12 hours, retain for 7 days = 14 snapshots; Daily: Once a day, retain for 7 days = 7 snapshots; Weekly: Saturday, retain for 4 weeks = 4 snapshots; Monthly: Last day of month, retain for 6 months = 6 snapshots | 31 |
Tier 2 | 12 hours | Near zero (within 7 days) | Daily: Once a day, retain for 7 days = 7 snapshots; Weekly: Saturday, retain for 4 weeks = 4 snapshots; Monthly: Last day of month, retain for 3 months = 3 snapshots | 14 |
Tier 3 | 3 days | Near zero (within 2 days) | Daily: Once a day, retain for 7 days = 7 snapshots; Weekly: Saturday, retain for 4 weeks = 4 snapshots; Monthly: Last day of month, retain for 3 months = 3 snapshots | 14 |
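Each row of this table maps onto the policy_item_* blocks of the mongodbatlas_cloud_backup_schedule resource used in the automation examples later on this page, and each snapshot count is simply the snapshot frequency multiplied by the retention period. A minimal sketch of the Tier 1 hourly item follows; the project variable and cluster name are illustrative.

# Minimal sketch: Tier 1 hourly item, one snapshot every 12 hours retained for 7 days
# -> 2 snapshots per day x 7 days = 14 retained snapshots.
resource "mongodbatlas_cloud_backup_schedule" "tier1_sketch" {
  project_id   = var.project_id # illustrative
  cluster_name = "backup-demo"  # illustrative

  policy_item_hourly {
    frequency_interval = 12 # every 12 hours
    retention_unit     = "days"
    retention_value    = 7
  }

  # Daily, weekly, and monthly items follow the same pattern; see the full
  # Tier 1 example under Automation Examples: Atlas Backups.
}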
Recommendations for Backup Distribution
Atlas provides options for backup locations. To further enhance resilience, we recommend distributing backups to a local region and to an external disaster recovery region, ensuring data recovery even during regional outages. For an Atlas cluster in three regions, multi-region Snapshot Distribution copies backups to two secondary regions, enabling restores from the backup copies. You can also copy critical backups, including the point-in-time data, to any secondary region available from your cloud provider in Atlas.
When you configure your snapshot frequency, retention, and distribution, we recommend striking a balance between availability and cost. However, your critical workloads might require multiple copies of snapshots in various locations.
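Snapshot distribution is configured through the copy_settings block of the backup schedule. The following minimal sketch copies daily snapshots and the oplog to a separate disaster recovery region; the project variable, cluster name, DR region, and zone_id source are illustrative, and the full examples below show copy_settings in a complete schedule.

# Minimal sketch: copy daily snapshots and oplogs to a disaster recovery region.
resource "mongodbatlas_cloud_backup_schedule" "with_dr_copies" {
  project_id   = var.project_id # illustrative
  cluster_name = "backup-demo"  # illustrative

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["DAILY"]   # which snapshot types to copy
    region_name        = "US_WEST_2" # DR region, separate from the cluster's regions (illustrative)
    zone_id            = var.zone_id # illustrative; taken from the cluster's replication_specs
    should_copy_oplogs = true        # also copy oplogs so PIT data is available in the copy region
  }
}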
Recommendations for Backup Compliance Policy
We recommend enforcing Atlas's Backup Compliance Policy to prevent unauthorized modifications or deletions of backups, thereby maintaining data integrity and supporting robust disaster recovery.
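In addition to the Atlas CLI command shown in the automation examples below, the Terraform provider exposes a backup compliance policy resource. The following is a hedged sketch, assuming the mongodbatlas_backup_compliance_policy resource and the attribute names shown here; verify them against your provider version, and note that the project variable is illustrative.

# Sketch (verify against your provider version): enforce a Backup Compliance Policy.
resource "mongodbatlas_backup_compliance_policy" "example" {
  project_id                 = var.project_id # illustrative
  authorized_email           = "governance@example.org"
  authorized_user_first_name = "john"
  authorized_user_last_name  = "doe"

  copy_protection_enabled    = true  # protect distributed snapshot copies as well
  pit_enabled                = true  # require continuous cloud backups
  encryption_at_rest_enabled = false
  restore_window_days        = 7

  # Minimum scheduled policy item that the compliance policy enforces.
  policy_item_daily {
    frequency_interval = 1
    retention_unit     = "days"
    retention_value    = 7
  }
}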
Recommendations for PIT Recovery
Continuous Cloud Backups enable precise Point In Time (PIT) recovery, which minimizes data loss during failures. Atlas can quickly recover to the exact timestamp before a failure event, giving you an RPO as low as one minute and an RTO of less than 15 minutes when you use optimized restores, even during an outage of the primary region. This is because Atlas restores the most recent snapshot from before the desired point in time and then replays the oplog changes to reach that particular point. Recovery times can vary due to cloud provider disk warming and how much of the oplog must be replayed during recovery. Your cluster performance might be slow until cloud provider disk warming completes after a restore. If you can be flexible in your recovery requirements, we recommend designing templates that identify the best compromise between reasonable recovery options and cost.
Recommendations for Backup Costs
To optimize Atlas backup costs, adjust backup frequency and retention policies to align with data criticality, reducing unnecessary storage expenses. For example, disable backups in development and test environments, and in staging and production environments with high availability requirements, distribute backups to each region where your Atlas clusters are deployed. You can also rely on incremental snapshots, which capture only the changes since the previous snapshot, and on built-in compression to minimize the amount of stored data. Selecting backup regions strategically helps you avoid cross-region data transfer fees, and choosing the right cluster disk size for your workload prevents overspending. By implementing these strategies, you can effectively manage costs while maintaining secure and reliable backups.
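For example, a development or test deployment template can turn backups off entirely. The following is a minimal sketch; the project variable and cluster name are illustrative.

# Minimal sketch: development/test cluster with backups disabled to save cost.
resource "mongodbatlas_advanced_cluster" "dev_cluster" {
  project_id   = var.project_id # illustrative
  name         = "dev-demo"     # illustrative
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      provider_name = "AWS"
      region_name   = "US_EAST_1"
      priority      = 7
    }
  }

  backup_enabled = false # no cloud backup snapshots in dev/test
  pit_enabled    = false # no continuous cloud backups in dev/test
}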
Automation Examples: Atlas Backups
See Terraform examples to enforce our Staging/Prod recommendations across all pillars in one place on GitHub.
The following examples enable backup and restore operations using Atlas tools for automation.
These examples apply only for staging and production environments where backup is enabled for the cluster.
Run the following command to take a backup snapshot of the cluster named myDemo and retain the snapshot for 7 days:
atlas backups snapshots create myDemo --desc "my backup snapshot" --retention 7
Enable the Backup Compliance Policy for your project with a designated, authorized user (governance@example.org) who alone can turn off this protection after completing a verification process with MongoDB support.
atlas backups compliancePolicy enable \
  --projectId 67212db237c5766221eb6ad9 \
  --authorizedEmail governance@example.org \
  --authorizedUserFirstName john \
  --authorizedUserLastName doe
Run the following command to create a compliance policy for scheduled backup snapshots that enforces the snapshot frequency, which is set to every 6 hours, and the snapshot retention period, which is set to 1 month.
atlas backups compliancePolicy policies scheduled create \
  --projectId 67212db237c5766221eb6ad9 \
  --frequencyInterval 6 \
  --frequencyType hourly \
  --retentionValue 1 \
  --retentionUnit months
The following examples demonstrate how to configure backups during deployment. Before you can create resources with Terraform, you must:
Create your paying organization and create an API key for the paying organization. Store your API key as environment variables by running the following commands in the terminal:
export MONGODB_ATLAS_PUBLIC_KEY="<insert your public key here>"
export MONGODB_ATLAS_PRIVATE_KEY="<insert your private key here>"
Common Files
You must create the following files for each example. Place the files for each example in their own directory. Change the IDs and names to use your values. Then run the commands to initialize Terraform, view the Terraform plan, and apply the changes.
variables.tf
variable "org_id" { description = "Atlas organization ID" type = string } variable "project_name" { description = "Atlas project name" type = string } variable "cluster_name" { description = "Atlas Cluster Name" type = string } variable "point_in_time_utc_seconds" { description = "PIT in UTC" default = 0 type = number }
Configure Backup Schedule for the Cluster
Use the following to configure a Tier 1 backup schedule for the cluster.
main.tf
locals {
  atlas_clusters = {
    "cluster_1" = { name = "m10-aws-1e", region = "US_EAST_1" },
    "cluster_2" = { name = "m10-aws-2e", region = "US_EAST_2" },
  }
}

resource "mongodbatlas_project" "atlas-project" {
  org_id = var.org_id
  name   = var.project_name
}

resource "mongodbatlas_advanced_cluster" "automated_backup_test_cluster" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  name         = each.value.name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      region_name   = each.value.region
      priority      = 7
    }
  }

  backup_enabled = true # enable cloud backup snapshots
  pit_enabled    = true
}

resource "mongodbatlas_cloud_backup_schedule" "test" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  cluster_name = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].name

  reference_hour_of_day    = 3  # backup start hour in UTC
  reference_minute_of_hour = 45 # backup start minute in UTC
  restore_window_days      = 7  # restore window for near-zero RPO

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY", "ON_DEMAND"]
    region_name        = "US_WEST_1"
    zone_id            = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].replication_specs.*.zone_id[0]
    should_copy_oplogs = true
  }

  policy_item_hourly {
    frequency_interval = 12 # backup every 12 hours, accepted values = 1, 2, 4, 6, 8, 12 -> every n hours
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }
  policy_item_daily {
    frequency_interval = 1 # backup every day, accepted values = 1 -> every 1 day
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }
  policy_item_weekly {
    frequency_interval = 7 # every Sunday, accepted values = 1 to 7 -> 1=Monday, 2=Tuesday, 3=Wednesday, 4=Thursday, 5=Friday, 6=Saturday, 7=Sunday
    retention_unit     = "weeks"
    retention_value    = 4 # retain for 4 weeks
  }
  policy_item_monthly {
    frequency_interval = 28 # accepted values = 1 to 28 -> every nth day of the month
    retention_unit     = "months"
    retention_value    = 3 # retain for 3 months
  }

  depends_on = [
    mongodbatlas_advanced_cluster.automated_backup_test_cluster
  ]
}
Use the following to configure a Tier 2 backup schedule for the cluster.
main.tf
locals {
  atlas_clusters = {
    "cluster_1" = { name = "m10-aws-1e", region = "US_EAST_1" },
    "cluster_2" = { name = "m10-aws-2e", region = "US_EAST_2" },
  }
}

resource "mongodbatlas_project" "atlas-project" {
  org_id = var.org_id
  name   = var.project_name
}

resource "mongodbatlas_advanced_cluster" "automated_backup_test_cluster" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  name         = each.value.name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      region_name   = each.value.region
      priority      = 7
    }
  }

  backup_enabled = true # enable cloud backup snapshots
  pit_enabled    = true
}

resource "mongodbatlas_cloud_backup_schedule" "test" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  cluster_name = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].name

  reference_hour_of_day    = 3  # backup start hour in UTC
  reference_minute_of_hour = 45 # backup start minute in UTC
  restore_window_days      = 7  # restore window for near-zero RPO

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY", "ON_DEMAND"]
    region_name        = "US_WEST_1"
    zone_id            = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].replication_specs.*.zone_id[0]
    should_copy_oplogs = true
  }

  policy_item_daily {
    frequency_interval = 1 # backup every day, accepted values = 1 -> every 1 day
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }
  policy_item_weekly {
    frequency_interval = 7 # every Sunday, accepted values = 1 to 7 -> 1=Monday, 2=Tuesday, 3=Wednesday, 4=Thursday, 5=Friday, 6=Saturday, 7=Sunday
    retention_unit     = "weeks"
    retention_value    = 4 # retain for 4 weeks
  }
  policy_item_monthly {
    frequency_interval = 28 # accepted values = 1 to 28 -> every nth day of the month
                            # accepted values = 40 -> every last day of the month
    retention_unit     = "months"
    retention_value    = 3 # retain for 3 months
  }

  depends_on = [
    mongodbatlas_advanced_cluster.automated_backup_test_cluster
  ]
}
Use the following to configure a Tier 3 backup schedule for the cluster.
main.tf
locals {
  atlas_clusters = {
    "cluster_1" = { name = "m10-aws-1e", region = "US_EAST_1" },
    "cluster_2" = { name = "m10-aws-2e", region = "US_EAST_2" },
  }
}

resource "mongodbatlas_project" "atlas-project" {
  org_id = var.org_id
  name   = var.project_name
}

resource "mongodbatlas_advanced_cluster" "automated_backup_test_cluster" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  name         = each.value.name
  cluster_type = "REPLICASET"

  replication_specs {
    region_configs {
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
      analytics_specs {
        instance_size = "M10"
        node_count    = 1
      }
      provider_name = "AWS"
      region_name   = each.value.region
      priority      = 7
    }
  }

  backup_enabled = true # enable cloud backup snapshots
  pit_enabled    = true
}

resource "mongodbatlas_cloud_backup_schedule" "test" {
  for_each     = local.atlas_clusters
  project_id   = mongodbatlas_project.atlas-project.id
  cluster_name = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].name

  reference_hour_of_day    = 3  # backup start hour in UTC
  reference_minute_of_hour = 45 # backup start minute in UTC
  restore_window_days      = 7  # restore window for near-zero RPO

  copy_settings {
    cloud_provider     = "AWS"
    frequencies        = ["HOURLY", "DAILY", "WEEKLY", "MONTHLY", "YEARLY", "ON_DEMAND"]
    region_name        = "US_WEST_1"
    zone_id            = mongodbatlas_advanced_cluster.automated_backup_test_cluster[each.key].replication_specs.*.zone_id[0]
    should_copy_oplogs = true
  }

  policy_item_daily {
    frequency_interval = 1 # backup every day, accepted values = 1 -> every 1 day
    retention_unit     = "days"
    retention_value    = 7 # retain for 7 days
  }
  policy_item_weekly {
    frequency_interval = 7 # every Sunday, accepted values = 1 to 7 -> 1=Monday, 2=Tuesday, 3=Wednesday, 4=Thursday, 5=Friday, 6=Saturday, 7=Sunday
    retention_unit     = "weeks"
    retention_value    = 4 # retain for 4 weeks
  }
  policy_item_monthly {
    frequency_interval = 28 # accepted values = 1 to 28 -> every nth day of the month
                            # accepted values = 40 -> every last day of the month
    retention_unit     = "months"
    retention_value    = 3 # retain for 3 months
  }

  depends_on = [
    mongodbatlas_advanced_cluster.automated_backup_test_cluster
  ]
}
Configure Backup and PIT Restore for the Cluster
Use the following to configure a cloud backup snapshot and a PIT restore job.
main.tf
# Create a project
resource "mongodbatlas_project" "project_test" {
  name   = var.project_name
  org_id = var.org_id
}

# Create a cluster with 3 nodes
resource "mongodbatlas_advanced_cluster" "cluster_test" {
  project_id             = mongodbatlas_project.project_test.id
  name                   = var.cluster_name
  cluster_type           = "REPLICASET"
  backup_enabled         = true # enable cloud provider snapshots
  pit_enabled            = true
  retain_backups_enabled = true # keep the backup snapshots once the cluster is deleted

  replication_specs {
    region_configs {
      priority      = 7
      provider_name = "AWS"
      region_name   = "US_EAST_1"
      electable_specs {
        instance_size = "M10"
        node_count    = 3
      }
    }
  }
}

# Specify the number of days to retain backup snapshots
resource "mongodbatlas_cloud_backup_snapshot" "test" {
  project_id        = mongodbatlas_advanced_cluster.cluster_test.project_id
  cluster_name      = mongodbatlas_advanced_cluster.cluster_test.name
  description       = "My description"
  retention_in_days = "1"
}

# Specify the snapshot ID to use to restore
resource "mongodbatlas_cloud_backup_snapshot_restore_job" "test" {
  count        = (var.point_in_time_utc_seconds == 0 ? 0 : 1)
  project_id   = mongodbatlas_cloud_backup_snapshot.test.project_id
  cluster_name = mongodbatlas_cloud_backup_snapshot.test.cluster_name
  snapshot_id  = mongodbatlas_cloud_backup_snapshot.test.id

  delivery_type_config {
    point_in_time             = true
    target_cluster_name       = mongodbatlas_advanced_cluster.cluster_test.name
    target_project_id         = mongodbatlas_advanced_cluster.cluster_test.project_id
    point_in_time_utc_seconds = var.point_in_time_utc_seconds
  }
}