ATD is working with MongoDB and Google Cloud to solve a universal problem: how to unleash the massive amounts of data locked up in legacy, on-premises data stores, said Suryadeep Chatterjee, Senior Director of Enterprise Architecture, Integration & Automation Technologies at ATD. “We have years’ worth of legacy data. Our challenge was, how can we unlock it to drive the business forward?” Technology teams at ATD are pursuing the digital transformation of the business through highly performant, distributed, microservices-based systems connected via APIs. Doing so with a legacy relational database would mean the immensely difficult task of getting such a database to stream data across a high volume of transactions in near real time. The new model also demands a hybrid cloud architecture spanning on-premises data centers and public clouds, and the data integration it requires would pose serious challenges of its own for a relational database.
ATD’s legacy systems kept data locked in silos, making access difficult, and forced the company to shuttle data between its on-premises data center and the cloud and back again. The systems had no capability to process streaming data and store it for easy access, and supporting data in a multi-cloud environment was out of the question. All of these constraints kept ATD from its goal of building the information systems required to support the tire supply chain of tomorrow.
The technology team at ATD sought real-time, asynchronous data sync to multiple applications, so that each system could use data without taxing the write-heavy transactional system. Because downstream systems might not need the full data sets coming out of the transactional system, the platform needed transformation capabilities, and it needed to flow data, organized by domain, into a microservices-based architecture. A data lineage tool was also important: with millions of transactions flowing through the system, if a single record errors out, the team must be able to pinpoint it and fix both the problem and the underlying data. In short, the team needed a solution that was event-driven, distributed, and performant: “A solution that would not bow down to data size,” said Chatterjee, “and one that makes your APIs perform better with millisecond response time.”
The team knew what it needed on the technology side. On the business side, the demands of a growing business for reliable, always-on data meant that “Resilience, scalability, and efficiency of the platform have to be top-notch; we cannot compromise there at all,” said Chatterjee. The business also asked that the platform blend into the existing data ecosystem, so that transactional systems, business applications, analytical systems, and others would have easy access to data. “We needed to emerge from the transactional models of the legacy data store, where to fetch a record we rely on a direct connection and, in the event of heavy loads, take key systems down,” said Chatterjee. “So the new platform had to be distributed and event-driven, integrate well with modern data stores, and have no single point of failure.”
The team determined that its ideal system did not exist off the shelf. So it built one.
ATOM meets the technology and business asks ATD had set forth for itself.
Figure 2 illustrates data flow in ATOM.
ATD’s change data capture process runs against its on-premises relational database, reading changes in real time and publishing them to a Kafka cluster on Google Cloud. Data streams deployed in containers on Google Cloud consume the data from Kafka, transform it, and load it into MongoDB Atlas. Data access domain services then read the data from MongoDB and present it to calling applications. Every process is independent and follows a microservices architecture. For example, the Order domain has its own data stream and a separate database in Atlas, along with its own data access service. Distributing data across domains and clusters in MongoDB enhances performance and resilience.
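To make the pattern concrete, here is a minimal sketch of what a single ATOM-style data stream could look like: a consumer reads CDC events from a Kafka topic, applies a domain transform, and upserts the result into an Atlas collection. Everything here is hypothetical, assuming the kafka-python and pymongo client libraries; the topic name orders.cdc, the database and collection names, the connection strings, and the transform fields are invented for illustration and are not ATD’s actual configuration.

```python
import json

from kafka import KafkaConsumer   # pip install kafka-python
from pymongo import MongoClient   # pip install pymongo

# Hypothetical endpoints; ATD's actual configuration is not public.
KAFKA_BOOTSTRAP = "kafka.internal:9092"
ATLAS_URI = "mongodb+srv://user:password@cluster.example.mongodb.net"

consumer = KafkaConsumer(
    "orders.cdc",                 # hypothetical CDC topic for the Order domain
    bootstrap_servers=KAFKA_BOOTSTRAP,
    group_id="atom-order-stream",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# One database per domain, per the architecture described above.
orders = MongoClient(ATLAS_URI)["orders"]["orders"]

def transform(event: dict) -> dict:
    """Keep only the fields downstream services need (invented fields)."""
    return {
        "_id": event["order_id"],  # natural key keeps the load idempotent
        "status": event["status"],
        "updated_at": event["ts"],
    }

for message in consumer:
    doc = transform(message.value)
    # Upsert so a replayed or duplicated CDC event never creates a
    # second copy of the same record.
    orders.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

Keying the upsert on a natural identifier is what makes a stream like this safe to replay after the kind of single-record failure the data lineage requirement anticipates.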
ATOM deploys its data streams on Google Kubernetes Engine (GKE). The ATOM Data Stream Grafana dashboard, shown in Figure 3, gives users an overview of ATD’s Data Stream cluster.
MongoDB Atlas is a fully managed service on Google Cloud that customers can now procure through Google Cloud Marketplace. Atlas is available in all Google Cloud regions (currently 24) to support truly global applications.
“When we first started, we were running self-managed MongoDB on GKE, then we migrated to MongoDB Atlas,” said Chatterjee. “The whole migration process was so simple, and we were able to do it quickly for two main reasons: the ability of Atlas to easily back up and restore data, and the ability of our event-driven, distributed architecture to help replace pieces of the solution in flight.” MongoDB Atlas delivers the event-driven, distributed, and performant solution the team was seeking: one that indeed “does not bow down to data size” and makes ATD’s APIs perform better with millisecond response times.
ATD runs 12 MongoDB Atlas clusters supporting development, quality assurance testing, performance testing, and production. “ATOM comprises more than 70 deployed data streams retrieving real-time data from more than 30 Kafka topics and streaming the data to 12 MongoDB Atlas databases within 11 data domains. We are streaming approximately 11 million real-time transactions per day,” said Chatterjee. “We can now quickly scale up for bulk one-time loads and process more than 50 million transactions in less than four hours. We can load more than 50 million records within an organization's weekend maintenance window. On top of MongoDB Atlas, we currently have more than 50 data access services retrieving information from more than 50 collections and transmitting data to our front end at a rate of 21,000 transactions per second.”
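The data access services on top of Atlas can themselves be quite small. The sketch below is hypothetical, not ATD’s code: a read-only Flask endpoint over the invented orders collection from the earlier sketch, assuming the flask and pymongo libraries.

```python
from flask import Flask, abort, jsonify   # pip install flask
from pymongo import MongoClient           # pip install pymongo

# Hypothetical connection string and collection, for illustration only.
orders = MongoClient(
    "mongodb+srv://user:password@cluster.example.mongodb.net"
)["orders"]["orders"]

app = Flask(__name__)

@app.route("/orders/<order_id>")
def get_order(order_id: str):
    # A single indexed _id lookup: the millisecond read path that keeps
    # the transactional system out of the serving loop entirely.
    doc = orders.find_one({"_id": order_id})
    if doc is None:
        abort(404)
    return jsonify(doc)

if __name__ == "__main__":
    app.run(port=8080)
```

Because each domain service reads from its own Atlas database, a slow or failing service in one domain cannot drag down reads in another, which is the no-single-point-of-failure property Chatterjee describes.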