Simulate Regional Outage

Note

This feature is not available for any of the following deployments:

  • Serverless instances

  • M0 clusters

  • M2/M5 clusters

  • Flex clusters

To learn more, see Limits.

You can use the Atlas UI or API to simulate an outage on your Atlas multi-region cluster and observe how your application handles an outage in one or more regions. You can also run multiple simulations. If you do, we recommend waiting five minutes between simulations.

Required Access

To start an outage simulation, you must have Organization Owner or Project Owner access to the project.

Simulate Regional Outage Process

When you submit a request to test an outage using the Atlas UI or API, Atlas simulates an outage event. During a simulated outage, Atlas:

If your application takes more than 15 minutes to detect that it has lost its connection to some nodes, we recommend that you reduce your TCP retransmission timeout values. To learn more, see modify tcp_retries2 value.
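You can estimate the timeout from `tcp_retries2` to judge whether your current setting is likely to exceed 15 minutes. The following Python sketch assumes Linux, which exposes the value at `/proc/sys/net/ipv4/tcp_retries2`. It models the retransmission timeout as starting at 200 ms and doubling after each retry, up to a 120-second cap. Actual timeouts also depend on measured round-trip times, so treat the result as an approximation. The function and constant names are hypothetical.

```python
from typing import Optional

# Assumed Linux defaults for the minimum and maximum retransmission
# timeouts (TCP_RTO_MIN and TCP_RTO_MAX in the kernel), in seconds.
RTO_MIN_SECONDS = 0.2
RTO_MAX_SECONDS = 120.0

TCP_RETRIES2_PATH = "/proc/sys/net/ipv4/tcp_retries2"


def read_tcp_retries2(path: str = TCP_RETRIES2_PATH) -> Optional[int]:
    """Return the current tcp_retries2 value, or None if it can't be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def estimate_timeout_seconds(retries: int) -> float:
    """Approximate how long TCP retransmits before giving up on a connection.

    Models exponential backoff: the timeout starts at RTO_MIN_SECONDS and
    doubles after each retry, capped at RTO_MAX_SECONDS. One timeout is
    waited for the original send plus one for each of the `retries` retries.
    """
    total = 0.0
    for attempt in range(retries + 1):
        total += min(RTO_MIN_SECONDS * 2 ** attempt, RTO_MAX_SECONDS)
    return total


if __name__ == "__main__":
    current = read_tcp_retries2()
    if current is None:
        print("tcp_retries2 is not readable on this system")
    else:
        minutes = estimate_timeout_seconds(current) / 60
        print(f"tcp_retries2={current}: about {minutes:.1f} minutes")
```

With the default value of 15, the estimate is about 924.6 seconds (roughly 15.4 minutes). This matches the figure in the Linux kernel's `ip-sysctl` documentation. A value of 8 lowers the estimate to about 102 seconds.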

Simulate Regional Outage Using the Atlas UI

To simulate a regional outage in the Atlas UI:

1
  1. If it's not already displayed, select the organization that contains your desired project from the Organizations menu in the navigation bar.

  2. If it's not already displayed, select your desired project from the Projects menu in the navigation bar.

  3. If it's not already displayed, click Clusters in the sidebar.

    The Clusters page displays.

2
  1. For the cluster on which you want to perform outage testing, click the ... button.

  2. Click Test Resilience.

  3. Select Regional Outage. Atlas displays a Test Resilience modal with the steps Atlas takes to simulate an outage event. To learn more, see Simulate Regional Outage Process.

3
  1. Click Select Regions.

  2. Select the tab corresponding to the type of outage you want to simulate:

    For a minority outage, select fewer than half of your electable nodes.

    For a majority outage, select more than half of your electable nodes and keep at least one electable node remaining.

    If you select a majority of your electable nodes, your replica set won't have a primary node. As a result, the replica set can't perform write operations. It also can't perform read operations that aren't configured with a suitable readPreference.

  3. Select Simulate Regional Outage to begin the test.

    Atlas notifies you when the outage occurs.

4

Select a tab corresponding to the type of outage you are performing:

When you finish testing the outage, click End Simulation.
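The minority and majority rules in step 3 come down to arithmetic on electable node counts. The following Python sketch classifies a selection of regions and checks whether a primary can still be elected. It assumes that every electable node is a voting member and that electing a primary requires a strict majority of voting members. The function names and region names are hypothetical and are not part of Atlas.

```python
from typing import Dict, Iterable, Tuple


def _down_and_total(electable_by_region: Dict[str, int],
                    selected: Iterable[str]) -> Tuple[int, int]:
    """Return (electable nodes in the selected regions, total electable nodes)."""
    chosen = set(selected)
    unknown = chosen - set(electable_by_region)
    if unknown:
        raise ValueError(f"unknown regions: {sorted(unknown)}")
    down = sum(electable_by_region[region] for region in chosen)
    return down, sum(electable_by_region.values())


def classify_outage(electable_by_region: Dict[str, int],
                    selected: Iterable[str]) -> str:
    """Return "minority", "majority", or "invalid" for a region selection."""
    down, total = _down_and_total(electable_by_region, selected)
    if down == 0:
        return "invalid"   # no electable nodes selected
    if 2 * down < total:
        return "minority"  # fewer than half of the electable nodes
    if 2 * down > total and down < total:
        return "majority"  # more than half, at least one node remaining
    return "invalid"       # exactly half, or every electable node


def primary_electable(electable_by_region: Dict[str, int],
                      selected: Iterable[str]) -> bool:
    """True if the remaining electable nodes still form a strict majority."""
    down, total = _down_and_total(electable_by_region, selected)
    return 2 * (total - down) > total


if __name__ == "__main__":
    # Hypothetical cluster: five electable nodes across three regions.
    cluster = {"region-a": 2, "region-b": 2, "region-c": 1}
    for outage in (["region-a"], ["region-a", "region-b"]):
        print(outage, classify_outage(cluster, outage),
              primary_electable(cluster, outage))
```

For this hypothetical cluster, taking down `region-a` is a minority outage and a primary can still be elected. Taking down `region-a` and `region-b` is a majority outage that leaves the replica set without a primary.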

When you finish testing the regional outage, you can perform one of the following:

Simulate Regional Outage Using the API

You can use the Test Outage API endpoint to simulate an outage event. To learn more about the outage process, see Simulate Regional Outage Process.
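As a rough illustration, the following Python sketch uses only the standard library to build the HTTP request that starts a simulation. The endpoint path (`/groups/{groupId}/clusters/{clusterName}/outageSimulation`), the `outageFilters` body shape, and the versioned `Accept` header are assumptions based on the Atlas Administration API v2. Check them against the API reference before use. The sketch neither sends nor authenticates the request. A real call needs Atlas API credentials, such as a programmatic API key used with HTTP digest authentication. The function name and example values are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL and versioned media type for the Atlas
# Administration API v2. Verify both against the API reference.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v2"
ACCEPT_HEADER = "application/vnd.atlas.2023-01-01+json"


def build_outage_request(group_id: str, cluster_name: str,
                         regions) -> urllib.request.Request:
    """Build, but do not send, a request that starts an outage simulation.

    regions is a list of (cloud_provider, region_name) pairs, such as
    ("AWS", "US_EAST_1"). Authentication is intentionally left out.
    """
    body = {
        "outageFilters": [
            {"cloudProvider": provider, "regionName": region, "type": "REGION"}
            for provider, region in regions
        ]
    }
    # Percent-encode the path segments in case they contain special characters.
    url = (f"{BASE_URL}/groups/{quote(group_id, safe='')}"
           f"/clusters/{quote(cluster_name, safe='')}/outageSimulation")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": ACCEPT_HEADER, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_outage_request("0123456789abcdef01234567", "Cluster0",
                               [("AWS", "US_EAST_1")])
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The request is only constructed here, so you can inspect the URL and body before adding authentication and sending it with your HTTP client of choice.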

Verify the Outage

To verify that the outage simulation succeeded, monitor your application and confirm that its read and write operations behave as expected.
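One way to monitor the application is to run small read and write probes in a loop while the simulation runs and tally the results. The following Python sketch is a generic harness built on that approach. `probe` and the operations passed to it are hypothetical. In practice, each operation would be a lightweight query or insert issued through your MongoDB driver. During a majority outage, for example, you would expect write probes to fail while reads with a suitable readPreference keep succeeding.

```python
import time
from typing import Callable, Dict


def probe(operations: Dict[str, Callable[[], object]], rounds: int = 3,
          interval_seconds: float = 1.0) -> Dict[str, Dict[str, int]]:
    """Run each named operation once per round and count outcomes.

    Each operation is a zero-argument callable. Returning normally counts
    as a success; raising any exception counts as a failure.
    """
    results = {name: {"ok": 0, "failed": 0} for name in operations}
    for i in range(rounds):
        for name, op in operations.items():
            try:
                op()
                results[name]["ok"] += 1
            except Exception:
                results[name]["failed"] += 1
        # Pause between rounds, but not after the last one.
        if i < rounds - 1:
            time.sleep(interval_seconds)
    return results
```

Comparing the counts from a run before the simulation with a run during it shows which kinds of operations the outage affects.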

Troubleshoot Outage

A regional outage, or a regional outage simulation, that affects the highest-priority regions in a sharded cluster could leave the cluster unable to serve read operations. To restore the config servers, do the following:

  • Configure a read preference that allows reads from secondary nodes.

  • Reconfigure the cluster to regain electable nodes.
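`readPreference` is a standard MongoDB connection string option, so one way to apply the first step is to set it in the URI that your application uses. The following Python sketch rewrites a connection string using only the standard library. The function name and host names are hypothetical. The default of `secondaryPreferred` is only an example. Reads from secondaries can return stale data, so choose a mode that fits your application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# The five read preference modes that MongoDB drivers recognize.
READ_PREFERENCE_MODES = {"primary", "primaryPreferred", "secondary",
                         "secondaryPreferred", "nearest"}


def with_read_preference(uri: str, mode: str = "secondaryPreferred") -> str:
    """Return a copy of a MongoDB connection string with readPreference set.

    Any existing readPreference option is replaced; other options are kept
    in their original order.
    """
    if mode not in READ_PREFERENCE_MODES:
        raise ValueError(f"unsupported read preference: {mode}")
    parts = urlsplit(uri)
    options = [(key, value)
               for key, value in parse_qsl(parts.query, keep_blank_values=True)
               if key != "readPreference"]
    options.append(("readPreference", mode))
    # Keep ':' and ',' readable, as used in options such as readPreferenceTags.
    return urlunsplit(parts._replace(query=urlencode(options, safe=":,")))


if __name__ == "__main__":
    print(with_read_preference(
        "mongodb+srv://cluster0.example.mongodb.net/?retryWrites=true&w=majority"))
```

After the cluster regains a primary, you can switch the option back to the mode that your application normally uses.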
