The Basics of Autoscaling: What Is Autoscaling?

Autoscaling is the automatic provisioning and deprovisioning of resources, like compute and storage, to match application demand. This process ensures cost optimization, high resource utilization, and consistent application performance.

Key takeaways

Horizontal scaling adds more resources (nodes and instances), whereas vertical scaling adds capacity to the existing resources (CPU, memory, and storage).
Predictive scaling can be horizontal or vertical; its timing is decided from previous usage patterns and trends.
Autoscaling reduces manual intervention, improves elasticity, reduces cost, and optimizes performance.
Vertical, horizontal, and elastic scaling options are built into MongoDB Atlas.
MongoDB provides horizontal scaling through sharding, where data is distributed across shards.

Table of contents

Why autoscaling matters
Autoscaling with MongoDB Atlas
Autoscaling basics
Types of autoscaling
Autoscaling vs. load balancing
Cloud computing and autoscaling
How autoscaling works
Configuring autoscaling for MongoDB workloads
Conclusion
Related resources
FAQs

Why autoscaling matters

Scaling is essential as your business grows, giving you additional CPU, RAM, storage, or disk throughput. Traditionally, there are two primary manual scaling approaches:

Vertical scaling: Upgrading to more powerful machines (adding "muscle" to a single server).
Horizontal scaling: Adding more instances to a distributed system to share the load (adding "team members" to a cluster).

As applications grow, manual monitoring becomes impractical. Dynamic scaling addresses this by automatically increasing resources during peak windows—such as a flash sale—and scaling back during low-traffic periods to reduce overhead and save costs.

Autoscaling with MongoDB Atlas

MongoDB Atlas supports autoscaling of database resources by automatically adjusting cluster tier (CPU/RAM) and storage capacity within defined limits. This means MongoDB Atlas can scale up when workloads increase and scale down when demand drops—without manual reconfiguration. MongoDB Atlas also supports horizontal scaling through sharding.
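The idea of scaling "within defined limits" can be sketched in a few lines. This is only an illustration, not the Atlas implementation: the tier list, the CPU thresholds, and the `next_tier` function are all assumptions made for the example.

```python
# Hypothetical tier ladder, ordered smallest to largest. The labels imitate
# Atlas-style tier names, but the list and the thresholds are assumptions.
TIERS = ["M10", "M20", "M30", "M40", "M50"]


def next_tier(current: str, cpu_percent: float, min_tier: str, max_tier: str,
              scale_up_at: float = 75.0, scale_down_at: float = 30.0) -> str:
    """Return the tier to run next, never leaving [min_tier, max_tier]."""
    idx = TIERS.index(current)
    lowest, highest = TIERS.index(min_tier), TIERS.index(max_tier)
    if cpu_percent >= scale_up_at:
        idx += 1  # workload increased: move one tier up
    elif cpu_percent <= scale_down_at:
        idx -= 1  # demand dropped: move one tier down
    # Clamp so the result always respects the configured limits.
    return TIERS[max(lowest, min(idx, highest))]
```

With limits of `M10` to `M40`, a busy `M20` cluster would move to `M30`, while a busy `M40` cluster would stay where it is because the upper limit has been reached.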

Autoscaling basics

Let’s start with an example: A retail website with an average of 2,000 customers per day runs a flash sale for two hours from 9 p.m. to 11 p.m. During the sale, the number of visitors increases to 15,000, which is 7.5 times the usual traffic.

A spike like this shows why manual scaling is unreliable: someone would have to resize the system before 9 p.m. and remember to shrink it again after 11 p.m. A better approach is a system that continuously monitors the application workload and automatically adjusts resources, so the application keeps running smoothly and users get a seamless experience.

Types of autoscaling

There are several ways to autoscale a system—horizontal, vertical, predictive, and scheduled. Learn more about each type below.

Horizontal autoscaling

Horizontal autoscaling adds more machines (instances or nodes) as load increases. The system detects the need based on metrics like CPU utilization, memory usage, and number of requests, then allocates the resources accordingly. It requires load balancing to distribute traffic efficiently, diverting traffic across multiple instances of a cluster, rather than one server handling all the requests.
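A reactive horizontal scaling decision can be approximated with a proportional rule. The sketch below borrows the formula that Kubernetes documents for its Horizontal Pod Autoscaler (desired = ceil(current × observed / target)); the function name, the default bounds, and the choice of CPU as the metric are assumptions for illustration.

```python
import math


def desired_instances(current: int, observed_cpu: float, target_cpu: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Proportional rule: size the fleet so average CPU lands near target_cpu."""
    if current < 1 or target_cpu <= 0:
        raise ValueError("current must be >= 1 and target_cpu must be positive")
    raw = math.ceil(current * observed_cpu / target_cpu)
    # Keep the result inside the configured bounds.
    return max(min_instances, min(raw, max_instances))
```

For example, four instances averaging 90% CPU against a 60% target would grow to six, and the same fleet at 30% CPU would shrink to two.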

Horizontal autoscaling is good for distributed systems and helps achieve high availability and fault tolerance when implemented with proper architecture and design. It’s extensively used for web applications, cloud-native workloads, streaming applications, real-time analytics, containerized applications, and microservices.

Vertical autoscaling

In vertical autoscaling, the capacity of existing resources, such as CPU, RAM, storage, and disk I/O throughput, is increased as demand grows. Instead of adding more instances, the existing server is upgraded.

For example, a server might move from a dual-core processor to an 8-core processor, or expand its memory from 4 GB to 16 GB. Vertical scaling suits workloads that cannot be distributed, such as monolithic applications. It’s simple to implement and avoids the complexities of clustering. However, upgrades are limited by the hardware capacity of the machine, and depending on the environment, the system may need downtime to restart after configuration changes.
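The hardware ceiling mentioned above shows up naturally in code. The following sketch picks the smallest machine size that satisfies a requirement and returns `None` once nothing larger exists; the `SIZES` catalogue and its names are hypothetical.

```python
from typing import NamedTuple, Optional


class Size(NamedTuple):
    name: str
    cores: int
    memory_gb: int


# Hypothetical catalogue, ordered from smallest to largest machine.
SIZES = [Size("small", 2, 4), Size("medium", 4, 8), Size("large", 8, 16)]


def smallest_fit(cores_needed: int, memory_gb_needed: int) -> Optional[Size]:
    """Return the smallest size meeting both requirements, or None."""
    for size in SIZES:
        if size.cores >= cores_needed and size.memory_gb >= memory_gb_needed:
            return size
    # No size is big enough: vertical scaling has hit its hardware limit.
    return None
```

A `None` result is the point at which a workload would have to be distributed horizontally instead.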

Predictive autoscaling

Traditional horizontal and vertical autoscaling methods are reactive, triggered once changes in the workload are detected based on system metrics. Conversely, predictive autoscaling is proactive, automatically adjusting to variable traffic patterns based on historical usage data and predictive algorithms. The system forecasts demand and scales resources ahead of expected traffic spikes, helping minimize latency and ensuring a smooth customer experience.

For example, the system can detect when sales are usually highest for a website, say between 8 p.m. and 10 p.m. every day; it can proactively scale resources during that time window based on predicted demand rather than waiting for the load to increase. This approach improves performance and can be more cost-effective. However, the effectiveness of predictive autoscaling depends on the accuracy and consistency of historical data and the prediction model.
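As a rough illustration of forecasting from historical usage, the sketch below averages past request counts per hour of day and flags the hours whose expected load exceeds current capacity. Real predictive autoscalers use more sophisticated models; the function names and the per-instance capacity figure are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def hourly_forecast(history: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average the observed requests for each hour of day (0-23)."""
    by_hour = defaultdict(list)
    for hour, requests in history:
        by_hour[hour].append(requests)
    return {hour: mean(values) for hour, values in by_hour.items()}


def hours_to_prescale(forecast: Dict[int, float], capacity_per_instance: float,
                      current_instances: int) -> List[int]:
    """Hours whose forecast load exceeds what the current fleet can serve."""
    limit = capacity_per_instance * current_instances
    return sorted(hour for hour, load in forecast.items() if load > limit)
```

Given samples that peak between 8 p.m. and 10 p.m., `hours_to_prescale` would return hours 20 and 21, which is when resources should already be in place.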

Scheduled autoscaling

Scheduled autoscaling triggers actions at specific, predetermined times of the day. This is ideal for known events, such as a batch job at midnight or a flash sale from 12 p.m. to 3 p.m. For the flash sale, the system can trigger scaling actions at 11:40 a.m., so capacity is ready before demand escalates at noon.
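The lead time in that example is simple date arithmetic. Here is a minimal sketch, assuming a 20-minute lead and hypothetical helper names:

```python
from datetime import datetime, timedelta

DEFAULT_LEAD = timedelta(minutes=20)  # assumed warm-up time before the event


def scale_out_time(event_start: datetime, lead: timedelta = DEFAULT_LEAD) -> datetime:
    """When to trigger scaling so capacity is ready at event_start."""
    return event_start - lead


def in_scaled_window(now: datetime, event_start: datetime, event_end: datetime,
                     lead: timedelta = DEFAULT_LEAD) -> bool:
    """True while extra capacity should be running."""
    return scale_out_time(event_start, lead) <= now < event_end
```

For a sale starting at 12 p.m., `scale_out_time` yields 11:40 a.m., and `in_scaled_window` stays true until the sale ends at 3 p.m.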

Autoscaling vs. load balancing

Load balancing and autoscaling complement each other for high concurrent user applications:

Load balancers distribute user requests across application instances for efficient utilization of each running instance. For example, MongoDB load balancing distributes traffic across running instances and containers.
Autoscaling automatically adjusts the number of instances based on resource demand.

Think of a load balancer as a “traffic cop,” distributing incoming requests across your servers, and autoscaling as a “fleet manager,” adding or removing the servers that the traffic cop directs.
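The division of labor between the two roles can be shown with a toy model. In this sketch, `route` plays the traffic cop using simple round-robin and `resize` plays the fleet manager; the `Fleet` class and server names are invented for the example, and real load balancers use richer strategies.

```python
from typing import List


class Fleet:
    """Toy model: round-robin routing over a resizable list of servers."""

    def __init__(self, size: int) -> None:
        self.servers: List[str] = []
        self._next = 0
        self.resize(size)

    def route(self) -> str:
        """Load balancing: pick the next server in turn for a request."""
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

    def resize(self, size: int) -> None:
        """Autoscaling: change how many servers exist."""
        if size < 1:
            raise ValueError("a fleet needs at least one server")
        self.servers = [f"server-{i}" for i in range(size)]
```

Routing never changes the number of servers, and resizing never decides where a request goes, which is why the two mechanisms complement each other.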

Cloud computing and autoscaling

Autoscaling is a key cloud computing feature that automatically adjusts resources such as servers, storage, databases, and networking based on demand. Cloud infrastructure benefits organizations by eliminating the need to maintain costly hardware, enabling significant cost savings.

Each cloud provider has different strengths:

Amazon Web Services offers Auto Scaling groups, which maintain the desired number of instances by automatically adding or removing instances and replacing unhealthy ones.
Microsoft Azure supports scheduled scaling, where scaling actions are executed at specific times defined by rules, and also provides metric-based autoscaling for dynamic workloads.
Google Cloud Platform offers predictive autoscaling with managed instance groups, making it easy to configure for containerized workloads.

How autoscaling works

Let’s say a new ecommerce company started digital operations and expected about 100 hits per day. The initial launch configuration used a single virtual machine with 2 GB of RAM and 1 vCPU. After launch, the company decided to scale up its digital operations and advertise a sale, which was expected to increase traffic to 2,000 hits during the peak sales period.

To be prepared, the company needed to do capacity planning that addressed which servers or instances needed to be scaled up, by how much, and which resources—CPU, RAM, or disk throughput—needed to be expanded.

Factor	Consideration	Reason
Capacity planning	Expected traffic, peak load time	Defines scaling limits to avoid over- or under-provisioning
Scaling metrics	CPU, memory, requests per second, latency	Ensures scaling is based on actual load
Workload characteristics	Read-heavy vs. write-heavy	Write-heavy workloads favor sharding and higher IOPS; read-heavy workloads favor replica sets or caching
Scaling strategy	Horizontal vs. vertical	Horizontal is more flexible for distributed systems, while vertical is easier and quicker to implement
Minimum and maximum thresholds	Trigger threshold, plus minimum and maximum number of resources or instances	Identifies the right time to scale, prevents over- or underutilization, and controls cost
Cost vs. performance	Trade-off between resources and cost	Cuts costs while maintaining performance
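The cost side of these factors can be estimated with simple arithmetic. The sketch below compares instance-hours for a fleet that follows hourly load (within minimum and maximum limits) against a fleet fixed at peak size. The load profile, the per-instance capacity, and the function names are assumptions for illustration.

```python
import math
from typing import Sequence


def instances_needed(requests_per_hour: float, capacity_per_instance: float,
                     min_instances: int, max_instances: int) -> int:
    """Instances required for one hour, clamped to the configured limits."""
    raw = math.ceil(requests_per_hour / capacity_per_instance)
    return max(min_instances, min(raw, max_instances))


def instance_hours(hourly_load: Sequence[float], capacity_per_instance: float,
                   min_instances: int, max_instances: int) -> int:
    """Total instance-hours consumed when the fleet follows the load."""
    return sum(
        instances_needed(load, capacity_per_instance, min_instances, max_instances)
        for load in hourly_load
    )


# Hypothetical day: 22 quiet hours and a 2-hour peak.
load = [100] * 22 + [1000] * 2
autoscaled = instance_hours(load, 250, min_instances=1, max_instances=4)
fixed_at_peak = 4 * len(load)
```

Under these assumed numbers the autoscaled fleet uses 30 instance-hours against 96 for a fleet sized for the peak all day. Note that the maximum limit also caps capacity: a load above 1,000 requests per hour would be under-provisioned until the limit is raised.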

Scaling decisions often depend on workload characteristics, such as whether the system is read-heavy or write-heavy. For example, for a write-intensive workload on MongoDB Atlas, autoscaling can raise the cluster tier so that the primary node can handle the increased writes, then scale back down when the load subsides to control cost.

With predictive scaling, machine learning algorithms forecast when demand is likely to spike based on historical usage data. This allows autoscaling to adjust resources ahead of time, supporting high availability, optimal performance, and a seamless user experience even during sudden traffic surges.

Configuring autoscaling for MongoDB workloads

MongoDB Atlas supports both manual and automatic scaling for clusters deployed across multiple availability zones, and it offers reactive and predictive autoscaling.

Horizontal scaling in MongoDB is done manually using sharding, where the database is divided into parts (shards) and incoming traffic is routed to the appropriate shard by the mongos query router (similar to a load balancer). Each shard can be vertically scaled depending on the demand on that particular shard.
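To give a feel for how a router can map a document to a shard, here is a deliberately simplified sketch that hashes a shard key value and takes it modulo the shard count. This is not how mongos works internally: MongoDB assigns ranges of shard key values (chunks) to shards and tracks them in cluster metadata. The `shard_for` function and the key format are assumptions.

```python
import hashlib


def shard_for(shard_key_value: str, shard_count: int) -> int:
    """Map a shard key value to a shard index in [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shard_count
```

The same key always lands on the same shard, and many distinct keys spread across all shards, which is the property that lets write and read load be shared.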

In vertical scaling, a secondary node is typically taken offline and its network storage detached. The server is scaled up, and the storage is reattached. Once the scaled-up server restarts and catches up with the primary, the same process is repeated for the other secondaries and then for the primary node.
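The order of that rolling process can be sketched as follows. The example assumes every secondary is handled before the primary and that exactly one primary exists; the node names and the `rolling_upgrade_order` helper are hypothetical.

```python
from typing import Dict, List


def rolling_upgrade_order(members: Dict[str, str]) -> List[str]:
    """Return node names in upgrade order: all secondaries, then the primary.

    `members` maps a node name to its role, "primary" or "secondary".
    """
    primaries = [name for name, role in members.items() if role == "primary"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    secondaries = sorted(name for name, role in members.items() if role == "secondary")
    # Only one node is offline at a time, so the set keeps serving traffic.
    return secondaries + primaries
```

For a three-node set with `node-a` as primary, the order would be `node-b`, `node-c`, then `node-a`.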

Because this rolling process takes time, a purely reactive scale-up may finish only after the surge has already begun. This is where predictive (vertical) scaling can help. Predictive scaling algorithms correlate the CPU/IOPS per cluster with the traffic load and compute the slope of the fitted line. Based on this, future demand can be estimated and sufficient resources can be provisioned well before the demand surges. Learn more about predictive autoscaling in MongoDB.
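The slope-based estimate described above corresponds to an ordinary least-squares line. The sketch below fits CPU utilization against traffic and extrapolates to a forecast load. It illustrates the idea only and is not MongoDB's algorithm; the sample numbers and function names are assumptions.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("traffic samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_cpu(traffic: float, slope: float, intercept: float) -> float:
    """Estimated CPU percent at a given traffic level."""
    return slope * traffic + intercept


# Hypothetical samples: requests per second vs. CPU percent.
slope, intercept = fit_line([100, 200, 300, 400], [15, 25, 35, 45])
```

If forecast traffic pushes the predicted CPU above a chosen threshold, the cluster can be scaled before the load arrives rather than after.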

Figure: How predictive autoscaling helps prevent resource overloading

Conclusion

Autoscaling is a cloud capability that dynamically adjusts resources to match workload demands, improving performance and reducing the need for manual intervention. While vertical scaling increases the capacity of existing resources, horizontal scaling adds more instances.

A load balancer complements autoscaling by distributing incoming requests across available resources. In MongoDB, horizontal scaling is achieved through sharding, which distributes data across multiple nodes. MongoDB simplifies scaling with built-in autoscaling features, primarily focused on vertical scaling of cluster tier and storage, easily configured through the MongoDB Atlas user interface.

Related resources

How Coinbase Achieved Rapid Autoscaling with MongoDB — Deep dive into the successful use case of autoscaling using MongoDB Atlas.
Basics of Database Scaling — Learn what database scaling is and explore related concepts.
Scalability in MongoDB Atlas — Find out how scalability is achieved in MongoDB Atlas.
Predictive Autoscaling in MongoDB — Discover more on MongoDB predictive scaling.
A Guide to Horizontal vs. Vertical scaling — Understand the core differences and features of horizontal and vertical scaling.

FAQs

Database autoscaling dynamically increases or decreases capacity (CPU, RAM, storage, or connections) to handle varying query loads. In MongoDB Atlas, enabling cluster tier and storage autoscaling options automatically expands resources when query traffic spikes.

A load balancer distributes incoming traffic across existing resources, while autoscaling adds or resizes resources when demand exceeds current capacity.

An autoscaler monitors metrics like CPU, memory, and IOPS, and automatically triggers scaling actions based on thresholds, schedules, or predictive patterns.

