MongoDB & IIoT: Turning Data into Business Intelligence

Joy Ike
October 27, 2022 | Updated: September 9, 2024
#IoT

>> Announcement: Some features mentioned below will be deprecated on Sep. 30, 2025. Learn more.

Manufacturing companies leverage business intelligence (BI) to sift through and analyze manufacturing and supply chain data in order to become more efficient and productive organizations. Often, the real hurdle with analytics is ensuring reliable access to relevant data sets. This article describes how to prepare data to yield strategic and operational insights through a combination of data tiering, federation, querying, and visualization.

Consider the scenario of a car manufacturer looking to implement a predictive maintenance program to reduce maintenance costs for its car assembly machines. Establishing an optimal data storage infrastructure is critical: it allows the manufacturer to correlate live IoT sensor data with historical maintenance records and thereby gain insight into maintenance trends.

As shown in Figure 1, such a challenge falls under step 3 of our IIoT end-to-end data integration framework: Compute.

Graphic of the four-step process for end-to-end integration, which includes connect, collect, compute, and create. This graphic highlights step 3, compute. — Figure 1: Step 3 in end-to-end data integration framework for IIoT.

Read the first, second, and third articles in this series on end-to-end data integration in the context of IIoT.

Visualization of the MongoDB Atlas Data Platform. The graphic lists data sources (Sensor Data, Inventory Data, Product Data, and other) and lists some of the main features that are a part of Atlas (Online Archive, Data Lake, and Object Store). — Figure 2: Architecture overview of data visualization and analytics enabled by MongoDB.

The proposed methodology leverages the data tiering capabilities of MongoDB, which cover the full data lifecycle, to create a single API access point for BI/analytics. Figure 2 summarizes the MongoDB features and third-party integrations that help turn the volumes of data generated over time into data-driven business insights.

The challenge of data tiering

The car manufacturer in our example would most likely need to differentiate between the different types of data needed for its predictive maintenance model. Here we make a distinction between operational and analytical workloads.

Operational workload: Refers to latency-sensitive data that affects functioning of equipment or powers critical applications/processes.
Analytical workload: Refers to live and historical data that does not power mission-critical applications but is readily stored and queried for the purpose of reporting, analytics, or training of AI/ML models.

Figure 3 provides a basic illustration of how MongoDB handles workload isolation leveraging MongoDB replica sets to support real-time BI and analytical workloads without additional ETL jobs.

Illustration of workload isolation in MongoDB. Industrial Applications flow into Operational Workload, which goes into Real Time Analytical Workloads. Operational workloads include the primary and secondary nodes while Real Time Analytical Workloads utilize the secondary nodes. — Figure 3: Illustration of workload isolation in MongoDB.
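
As a rough illustration of the isolation pattern in Figure 3, an analytics client can ask the driver to read from secondaries through standard connection string options, leaving the primary to the operational workload. The sketch below only assembles such a connection string. The host, credentials, and database name are hypothetical, and the optional nodeType:ANALYTICS tag assumes the Atlas cluster has dedicated analytics nodes.

```python
from typing import Optional


def analytics_uri(base_uri: str, database: str, tag: Optional[str] = None) -> str:
    """Build a connection string whose reads are routed to secondary nodes."""
    options = [("readPreference", "secondary")]
    if tag is not None:
        # Restrict reads to secondaries carrying this replica set tag.
        options.append(("readPreferenceTags", tag))
    query = "&".join(f"{key}={value}" for key, value in options)
    return f"{base_uri.rstrip('/')}/{database}?{query}"


# Hypothetical cluster address and credentials, for illustration only.
BASE = "mongodb+srv://bi_user:secret@cluster0.example.mongodb.net"

bi_uri = analytics_uri(BASE, "factory")
tagged_uri = analytics_uri(BASE, "factory", tag="nodeType:ANALYTICS")
print(bi_uri)
print(tagged_uri)
```

Operational applications would keep using the default connection string, so their latency-sensitive reads and writes continue to go to the primary.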

More advanced architecture patterns for workload isolation or data tiering can be achieved through sharding. Although these approaches are suitable for many scenarios, they still effectively treat all data as hot or warm because storage and compute remain tightly coupled.

For maximum cost efficiency at the expense of latency, we must consider newer cloud storage options, such as Amazon S3 or other blob stores, which decouple storage and compute and are well suited to storing so-called cold data. The challenge, however, is how to extract the data from hot stores (such as MongoDB) and move it into cold storage (such as S3) while maintaining the ability to query all of the data through a single API.

MongoDB provides several options to facilitate fully automated data tiering, including:

Online Archive: Rule-based data archiving

Online Archive in MongoDB Atlas provides an automated rule-based mechanism for moving data out of live/hot clusters to more cost-effective/cold storage (for example, Amazon S3 buckets). This feature removes the burden of building and maintaining potentially complex ETL jobs and data purging functionality while allowing users to configure data offloading within a few simple configuration steps.

Online Archive moves data based on criteria specified in archival rules (as shown in Figure 4). In our example of an auto manufacturing company, sensor data is an excellent use case for this type of data tiering. Sensor data is “hot” when it's created and cools down over time as the need for real-time queries decreases. Our car manufacturer can easily configure an archival rule that combines the timestamp field with the number of days the data should remain in the MongoDB cluster.

Gif of how Online Archive works, showing the transfer of data between Atlas Cluster, Online Archive, and Applications — Figure 4: Animation showing how Online Archive works.
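
The logic of a date-based archival rule can be sketched in a few lines. This is a local simulation of the idea, not the Online Archive implementation: the rule dictionary, its key names (dateField, expireAfterDays), the 30-day window, and the sample readings are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rule: archive documents whose "timestamp" is older than 30 days.
ARCHIVE_RULE = {"dateField": "timestamp", "expireAfterDays": 30}


def is_archivable(document: dict, rule: dict, now: datetime) -> bool:
    """Return True when the document's date field falls outside the retention window."""
    cutoff = now - timedelta(days=rule["expireAfterDays"])
    return document[rule["dateField"]] < cutoff


now = datetime(2022, 10, 27, tzinfo=timezone.utc)
readings = [
    {"machineId": "press-01", "timestamp": now - timedelta(days=2), "temperature": 71.5},
    {"machineId": "press-01", "timestamp": now - timedelta(days=45), "temperature": 69.8},
]

# Split the sample into data that stays in the cluster and data that would be archived.
hot = [r for r in readings if not is_archivable(r, ARCHIVE_RULE, now)]
cold = [r for r in readings if is_archivable(r, ARCHIVE_RULE, now)]
print(len(hot), len(cold))
```

In Atlas the equivalent rule is configured once and evaluated by the service, so no application code of this kind has to be written or scheduled.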

A broad set of MongoDB Atlas customers across industries already uses Online Archive to save storage costs while maintaining query ability across hot and cold data.

With Online Archive, we were able to save an astounding 60% in data storage costs and 70% in cloud backup costs — reducing our overall database spend by 35%.
Martin Löper, Cloud Solutions Architect, Nesto Software

Although offloading data already provides major cost savings, there is also potential for more efficient data processing on the consumer side by optimizing the data structures and file formats toward more column-oriented analytical queries. For this purpose, MongoDB has recently released a Data Lake feature set (currently in Preview) that allows users to take advantage of new features such as columnar indexing and an optimized analytical file format.

Data Lake: Columnar indexing of database snapshots

Data Lake is MongoDB’s offering of a fully managed analytical storage solution that provides the economics of cloud object storage and is optimized for high-performing analytical queries. It works by reformatting data from a backup snapshot of the Atlas cluster and creating partitioned indexes (illustrated in Figure 5).

Illustration showing how data lake works. — Figure 5: Diagram showing how Data Lake works.

Fully integrated as part of MongoDB Atlas, Atlas Data Lake is provisioned alongside Atlas clusters with no infrastructure to set up or manage, and no storage capacity to predict, making the user experience, administration, and support easy. Returning to our example of predictive maintenance model development, performing columnar indexing on the collected data will result in high gains for analytical query performance.

Data Federation: Data virtualization made simple

Rarely do business analysts have all the required data in the same place. Often, it’s distributed among different domains and data stores as well as in different formats, like JSON, tabular, CSV, Parquet, Avro, and others. This leads to quite a complex landscape with different API languages, which makes it hard to get easy access to data across all these sources. That's where MongoDB's Atlas Data Federation comes in.

Data Federation bridges these data silos by consolidating all the discussed data sources behind a single API without the need for data duplication (Figure 6). Users can group different data sources into virtual databases and collections and query the data with MQL or SQL across the various sources, just like talking to a single DBMS. This approach reduces the effort, time, and complexity of pipelines and ETL tools when working with data in different formats. It also allows users to seamlessly query, transform, and aggregate data from one or more data stores (i.e., Atlas cluster, Atlas Data Lake, Amazon S3 buckets, Online Archive, and HTTP endpoints) to create a single virtual database using the full power of the aggregation pipeline (Figure 7).

Diagram of how Data Federation works in MongoDB Atlas. Atlas clusters, Online Archive, Atlas Data Lake, AWS S3, and HTTP endpoints connect to Atlas Data Federation. The inputs go into the federated database instance, which includes the virtual database and virtual collections. Data Federation also includes aggregation pipelines, federated query, and $out. — Figure 6: Diagram showing how Data Federation works in MongoDB Atlas.

Figure 7: Creating a virtual database in the MongoDB Atlas GUI.

Please refer to the documentation for a more detailed description of the process of creating a Federated Database Instance in MongoDB Atlas.
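
To make the mapping between data stores and virtual collections more concrete, the sketch below builds a storage configuration as a plain Python dictionary and checks that every data source points at a defined store. The overall shape (stores, databases, collections, dataSources) is modeled on the federated database instance configuration, but the exact field names should be verified against the documentation, and the cluster, bucket, and collection names are hypothetical.

```python
def build_storage_config() -> dict:
    """Map one virtual collection onto a live cluster and an S3 bucket."""
    stores = [
        # Hypothetical hot store: a live Atlas cluster.
        {"name": "hotCluster", "provider": "atlas", "clusterName": "FactoryCluster"},
        # Hypothetical cold store: an S3 bucket holding archived files.
        {"name": "coldBucket", "provider": "s3", "bucket": "factory-archive", "region": "us-east-1"},
    ]
    databases = [
        {
            "name": "plant",
            "collections": [
                {
                    "name": "sensorReadings",
                    "dataSources": [
                        {"storeName": "hotCluster", "database": "iot", "collection": "readings"},
                        {"storeName": "coldBucket", "path": "/readings/*"},
                    ],
                }
            ],
        }
    ]
    return {"stores": stores, "databases": databases}


def undefined_stores(config: dict) -> set:
    """Return store names referenced by data sources but never declared."""
    defined = {store["name"] for store in config["stores"]}
    referenced = {
        source["storeName"]
        for database in config["databases"]
        for collection in database["collections"]
        for source in collection["dataSources"]
    }
    return referenced - defined


config = build_storage_config()
print(undefined_stores(config))
```

A query against the virtual plant.sensorReadings collection would then span both the live cluster and the archived files without the client needing to know where each document lives.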

Data Federation endpoints are not just read-only APIs. Results of querying a federated database instance can be stored back in MongoDB clusters or as files in S3 buckets to power other real-time enterprise or end-user applications, or for performing other analytical tasks and visualizations. In the case of our car manufacturer, real-time sensor data and maintenance history can be queried together and made available to an analytical engine training ML models for remaining useful life prediction.
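
A federated query of this kind might look like the following aggregation pipeline, expressed here as Python data. It is a sketch under stated assumptions: the collection and field names (maintenanceHistory, machineId, temperature, vibration) and the S3 bucket are hypothetical, and the $out options follow the form used for writing to S3 from a federated database instance, which should be checked against the documentation before use.

```python
from datetime import datetime, timezone


def training_features_pipeline(since: datetime) -> list:
    """Join recent sensor statistics with maintenance history and persist the result."""
    return [
        # Keep only readings newer than the cutoff.
        {"$match": {"timestamp": {"$gte": since}}},
        # Produce one summary document per machine.
        {"$group": {
            "_id": "$machineId",
            "avgTemperature": {"$avg": "$temperature"},
            "maxVibration": {"$max": "$vibration"},
        }},
        # Attach the maintenance records of each machine.
        {"$lookup": {
            "from": "maintenanceHistory",
            "localField": "_id",
            "foreignField": "machineId",
            "as": "maintenance",
        }},
        # Write the joined result to a (hypothetical) S3 bucket as Parquet.
        {"$out": {"s3": {
            "bucket": "factory-ml",
            "region": "us-east-1",
            "filename": "training/features",
            "format": {"name": "parquet"},
        }}},
    ]


pipeline = training_features_pipeline(datetime(2022, 10, 1, tzinfo=timezone.utc))
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

With a driver such as PyMongo, a pipeline like this would be passed to the aggregate method of the virtual collection over a connection to the federated database instance.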

The fastest way to start building compelling visualizations and gaining insight into data across MongoDB clusters and file-based data sources is to connect federated instances to Charts, which comes fully integrated in the Atlas product suite.

Data visualization with Charts

Charts provides a quick, simple, yet powerful way to visualize data with multiple widgets, dynamic filters, and automatic data refresh, as in traditional BI tools. Atlas users can connect dashboards created in Atlas Charts with federated databases and perform correlation analytics in a no-code environment.

Charts is fully integrated with the MongoDB Atlas product suite, which means that data sources in Atlas are immediately accessible from the interface, allowing users to add federated databases as a source for a variety of dashboard visualizations. From displaying device sensor data to calculated values for more sophisticated insights, Charts provides widgets and custom field calculations to achieve effective and insightful visualizations.

Figures 8 and 9 show two examples of dashboards created in Charts showing time series sensor data from a smart factory and Overall Equipment Effectiveness (OEE) along with other manufacturing performance metrics information. Through the use of these powerful visualizations, the car manufacturer can understand the effect of optimal maintenance strategies on overall factory performance.

Sample shop floor monitoring dashboard created in Atlas Charts. Data is visualized in separate charts for humidity over time, room temperature over time, air quality, air pressure, gas resistance, and LDR over time. — Figure 8: Sample shop floor monitoring dashboard created in Atlas Charts.

Sample OEE dashboard in Atlas Charts. The dashboard utilizes different visualization tools, such as line graphs, bar graphs, and half-circle charts. — Figure 9: Sample OEE dashboard in Atlas Charts
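
For readers unfamiliar with the metric, OEE is commonly defined as the product of availability, performance, and quality. The worked example below uses that common definition with made-up shift numbers; it is not taken from the dashboards above, and the function and parameter names are illustrative.

```python
def oee(planned_minutes: float, run_minutes: float, ideal_cycle_minutes: float,
        total_units: int, good_units: int) -> dict:
    """Compute OEE and its three factors for one production period."""
    availability = run_minutes / planned_minutes                     # share of planned time actually running
    performance = (ideal_cycle_minutes * total_units) / run_minutes  # actual speed relative to ideal speed
    quality = good_units / total_units                               # share of units without defects
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Made-up shift: 500 planned minutes, 400 minutes running,
# 1.0 minute ideal cycle time, 360 units produced, 342 of them good.
shift = oee(500, 400, 1.0, 360, 342)
print({name: round(value, 3) for name, value in shift.items()})
```

A calculated field in a dashboard would apply the same arithmetic to aggregated counts per machine or per shift.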

To harness existing knowledge and skills around familiar and popular BI tools such as Power BI and Tableau, MongoDB has developed the Atlas SQL API, which gives users the option to connect SQL-based business intelligence and analytics tools to Atlas through a variety of drivers and connectors, including:

Tableau Connector
Power BI Connector
JDBC Driver
ODBC Driver

These Atlas SQL connectors and drivers leverage Data Federation functionality, enabling users to query data across Atlas clusters and cloud storage (such as S3 buckets) while continuing to work in the SQL-based BI tools they already know.
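
The kind of query a BI tool might issue through such a connector can be sketched with an in-memory SQLite database standing in for the federated instance. This is only a stand-in to keep the example runnable: the table and column names are hypothetical, the sample rows are made up, and the SQL is kept deliberately simple on the assumption that Atlas SQL accepts the same statement.

```python
import sqlite3

# Average temperature and number of maintenance tickets per machine.
QUERY = """
SELECT r.machineId AS machineId,
       AVG(r.temperature) AS avgTemperature,
       COUNT(DISTINCT m.ticketId) AS maintenanceEvents
FROM sensorReadings AS r
LEFT JOIN maintenanceHistory AS m ON m.machineId = r.machineId
GROUP BY r.machineId
ORDER BY r.machineId
"""

# SQLite stand-in for the virtual collections exposed through SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensorReadings (machineId TEXT, temperature REAL);
CREATE TABLE maintenanceHistory (machineId TEXT, ticketId TEXT);
INSERT INTO sensorReadings VALUES ('press-01', 70.0), ('press-01', 74.0), ('weld-02', 65.0);
INSERT INTO maintenanceHistory VALUES ('press-01', 'T-100'), ('press-01', 'T-101');
""")
rows = conn.execute(QUERY).fetchall()
conn.close()
print(rows)
```

Against Atlas, the same statement would be sent through one of the connectors or drivers listed above rather than through sqlite3.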

Getting started with the Atlas SQL API is easy and comes at no cost: follow the detailed tutorial and the documentation, and register for a free Atlas user account to try it out.

Thank you to Karolina Ruiz Rogelj for her contributions to this post.

Watch our recorded webinar to see a live demonstration of how Atlas Federated Instances are created and used as a data source for MongoDB Charts and Tableau.

