Scaling to meet Coinbase’s volatile traffic demand
While the cryptocurrency market can be volatile and unpredictable, leading exchanges must provide seamless performance irrespective of traffic levels. When Coinbase — one of the largest cryptocurrency exchanges in the United States — wanted to improve the user experience, its first step was to optimize its MongoDB database clusters to reduce downtime.
When Coinbase launched in 2012, it chose to use MongoDB hosted on Amazon Web Services. Since then, the company has grown from a few small database clusters to a fleet of roughly 700 clusters and has migrated to MongoDB Atlas with Atlas Data Federation powering its data warehousing pipeline.
Unlike traditional fintech companies, cryptocurrency exchanges face highly volatile and unexpected traffic. During periods of heavy demand, Coinbase’s largest clusters could take over an hour to scale. In the fast-moving cryptocurrency market, that lead time is unacceptable because users experience degraded performance while the clusters scale.
“When we see big traffic spikes, we need to scale our clusters as quickly as possible,” said Sean Hurley, Staff Software Engineer at Coinbase. “We can’t always be provisioned for peak capacity; we need to dynamically scale up and down with our traffic needs.”
Building a predictive scaling solution with MongoDB Atlas
Next, Coinbase built a machine learning (ML) model to predict traffic volatility using cryptocurrency price data. The model automatically triggers the clusters to scale up 60 minutes prior to anticipated traffic spikes. “With the heads-up from our ML model, we can have all our clusters scaled up before traffic surges,” said Hurley.
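The article does not describe the model itself, so the following is only a minimal sketch of the trigger logic, with a simple volatility threshold standing in for Coinbase's ML model. The function names, the `threshold` value, and the sample prices are hypothetical; only the 60-minute lead time comes from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import pstdev
from typing import Optional, Sequence

# From the article: clusters are scaled up 60 minutes before an expected spike.
LEAD_TIME = timedelta(minutes=60)


@dataclass
class ScaleDecision:
    scale_up: bool
    # Time of the anticipated traffic spike; None when no scale-up is triggered.
    expected_spike_at: Optional[datetime]


def price_volatility(prices: Sequence[float]) -> float:
    """Population standard deviation of period-over-period price returns."""
    returns = [(b - a) / a for a, b in zip(prices, prices[1:])]
    return pstdev(returns) if returns else 0.0


def decide(prices: Sequence[float], now: datetime,
           threshold: float = 0.02) -> ScaleDecision:
    """Hypothetical stand-in for the ML model: trigger on high volatility.

    The threshold is an illustrative assumption, not a Coinbase value.
    """
    if price_volatility(prices) >= threshold:
        # Scale now so capacity is in place before the anticipated spike.
        return ScaleDecision(True, now + LEAD_TIME)
    return ScaleDecision(False, None)


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    print(decide([100.0, 100.1, 100.0, 100.05], now))  # calm market
    print(decide([100.0, 110.0, 95.0, 108.0], now))    # volatile market
```

A production model would presumably learn the relationship between price movement and traffic rather than use a fixed threshold; the sketch only shows where the lead time enters the decision.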
Finally, Coinbase and MongoDB introduced parallel scaling to shorten the time between triggering a scale-up and reaching peak performance. Previously, clusters scaled one node at a time. Now, nodes of similar type are grouped and scaled at the same time, which made new cluster deployments up to 3.25 times faster.
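To illustrate why grouping helps, here is a rough timing model, not Atlas's actual scaling procedure. It assumes groups are processed one after another while nodes within a group scale concurrently; the node names, type labels, and per-node durations are made up for the example.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# (node name, node type, minutes to scale); all values are hypothetical.
Node = Tuple[str, str, int]


def sequential_minutes(nodes: Iterable[Node]) -> int:
    """Node-by-node scaling: total time is the sum of every node's time."""
    return sum(minutes for _, _, minutes in nodes)


def grouped_parallel_minutes(nodes: Iterable[Node]) -> int:
    """Grouped scaling: nodes of the same type scale at the same time.

    Each group takes as long as its slowest node, and groups are assumed
    to run one after another.
    """
    groups = defaultdict(list)
    for _, node_type, minutes in nodes:
        groups[node_type].append(minutes)
    return sum(max(times) for times in groups.values())


if __name__ == "__main__":
    cluster = [
        ("node-1", "type-a", 10),
        ("node-2", "type-a", 10),
        ("node-3", "type-a", 10),
        ("node-4", "type-a", 10),
        ("node-5", "type-b", 12),
        ("node-6", "type-b", 12),
    ]
    before = sequential_minutes(cluster)
    after = grouped_parallel_minutes(cluster)
    print(f"sequential: {before} min, grouped: {after} min, "
          f"speedup: {before / after:.2f}x")
```

With these made-up numbers, the sequential total is 64 minutes and the grouped total is 22 minutes. The actual gain depends on how many nodes share a type and how long each takes.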
Seamlessly handling crypto volatility
“We’ve brought scaling times down from upward of 70 minutes to 25 minutes, less than half the time it used to take,” said Hurley. At 25 minutes, scaling now completes well within the predictive model’s 60-minute advance notice. Coinbase’s platform is now more resilient and supports a consistent user experience despite unpredictable volatility.
As a result, Coinbase’s end users gained a more seamless experience. Previously, traffic spikes could impact some parts of the Coinbase app. Now, users don’t even notice changes happening behind the scenes.
Coinbase’s next phase is optimization. It is considering sharding to address vertical scaling limits and building tooling to catch bad queries before they hit production.