IoT and MongoDB: Powering Time Series Analysis of Household Power Consumption

Nenad Milosavljevic • 6 min read • Published Aug 28, 2024 • Updated Aug 28, 2024

IoT (Internet of Things) systems are increasingly becoming a part of our daily lives, offering smart solutions for homes and businesses.
This article will explore a practical case study on household power consumption, showcasing how MongoDB's time series collections can be leveraged to store, manage, and analyze data generated by IoT devices efficiently.

Time series collections

Time series collections in MongoDB efficiently store time series data: sequences of data points that are analyzed to observe how values change over time.
Time series collections provide the following benefits:
  • Reduced complexity for working with time series data
  • Improved query efficiency
  • Reduced disk usage
  • Reduced I/O for read operations
  • Increased WiredTiger cache usage
Generally, time series data is composed of the following elements:
  • The timestamp of each data point
  • Metadata (also known as the source), which is a label or tag that uniquely identifies a series and rarely changes
  • Measurements (also known as metrics or values), representing the data points tracked at increments in time — generally key-value pairs that change over time
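
To make these three elements concrete, here is a minimal sketch of how a single reading might be shaped into a document. The `toTimeSeriesDocument` helper, the `sensorId` and `location` metadata fields, and the sample values are hypothetical and chosen only for illustration; the household dataset used later in this article comes from a single source and does not define a metadata field.

```javascript
// Hypothetical helper: shapes one raw reading into the three parts of a
// time series document (timestamp, metadata, measurements).
function toTimeSeriesDocument(reading) {
  return {
    // The timestamp of the data point
    timestamp: new Date(reading.time),
    // Metadata: identifies the series and rarely changes (assumed field names)
    metadata: { sensorId: reading.sensorId, location: reading.location },
    // Measurements: values that change from one data point to the next
    voltage: reading.voltage,
    global_intensity: reading.current,
  };
}

// Illustrative input, not taken from the dataset
const doc = toTimeSeriesDocument({
  time: '2007-01-17T08:30:00Z',
  sensorId: 'meter-1',
  location: 'kitchen',
  voltage: 240.1,
  current: 4.2,
});

console.log(doc);
```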

Case study: household electric power consumption

This case study analyzes a dataset of over two million household electric power consumption readings, sampled once per minute over almost four years.
The dataset includes the following information:
  • date: Date in format dd/mm/yyyy
  • time: Time in format hh:mm:ss
  • global_active_power: Household global minute-averaged active power (in kilowatt)
  • global_reactive_power: Household global minute-averaged reactive power (in kilowatt)
  • voltage: Minute-averaged voltage (in volt)
  • global_intensity: Household global minute-averaged current intensity (in ampere)
  • sub_metering_1: Energy sub-metering No. 1 (in watt-hour of active energy); corresponds to the kitchen, containing mainly a dishwasher, an oven, and a microwave (hot plates are not electric but gas-powered)
  • sub_metering_2: Energy sub-metering No. 2 (in watt-hour of active energy); corresponds to the laundry room, containing a washing machine, a tumble drier, a refrigerator, and a light
  • sub_metering_3: Energy sub-metering No. 3 (in watt-hour of active energy); corresponds to an electric water heater and an air conditioner
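
Note that the fields use different units: active power is a minute-averaged value in kilowatts, while the sub-metering fields are energy in watt-hours. The sketch below shows one way to put them on the same scale, using plain unit arithmetic (kilowatts × 1000 ÷ 60 gives watt-hours per minute). The helper names and sample values are assumptions made for illustration.

```javascript
// Converts minute-averaged active power (kW) into energy used in that minute (Wh).
function activeEnergyWh(globalActivePowerKw) {
  return (globalActivePowerKw * 1000) / 60;
}

// Energy (Wh) used in one minute by equipment outside the three sub-meters.
function unmeteredEnergyWh(reading) {
  return (
    activeEnergyWh(reading.global_active_power) -
    reading.sub_metering_1 -
    reading.sub_metering_2 -
    reading.sub_metering_3
  );
}

// Illustrative values, not taken from the dataset
const sample = {
  global_active_power: 3.0,
  sub_metering_1: 0,
  sub_metering_2: 1,
  sub_metering_3: 17,
};

console.log(activeEnergyWh(sample.global_active_power)); // 50
console.log(unmeteredEnergyWh(sample)); // 32
```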

Schema modeling

To define and model our time series collection, we will use the Mongoose library. Mongoose, an Object Data Modeling (ODM) library for MongoDB, is widely used in the Node.js ecosystem for its ability to provide a straightforward way to model our application data.
The schema will include:
  • timestamp: A combination of the “date” and “time” fields from the dataset.
  • global_active_power: A numerical representation from the dataset.
  • global_reactive_power: A numerical representation from the dataset.
  • voltage: A numerical representation from the dataset.
  • global_intensity: A numerical representation from the dataset.
  • sub_metering_1: A numerical representation from the dataset.
  • sub_metering_2: A numerical representation from the dataset.
  • sub_metering_3: A numerical representation from the dataset.
To configure the collection as a time series collection, an additional “timeseries” configuration with “timeField” and “granularity” properties is necessary. The “timeField” will use our schema’s “timestamp” property, and “granularity” will be set to “minutes” to match the dataset's sampling rate.
Additionally, an index on the “timestamp” field will be created to enhance query performance — note that you can query a time series collection the same way you query a standard MongoDB collection.
The resulting schema is structured as follows:
const { Schema, model } = require('mongoose');

const powerConsumptionSchema = new Schema(
  {
    timestamp: { type: Date, index: true },
    global_active_power: { type: Number },
    global_reactive_power: { type: Number },
    voltage: { type: Number },
    global_intensity: { type: Number },
    sub_metering_1: { type: Number },
    sub_metering_2: { type: Number },
    sub_metering_3: { type: Number },
  },
  {
    timeseries: {
      timeField: 'timestamp',
      granularity: 'minutes',
    },
  }
);

const PowerConsumptions = model('PowerConsumptions', powerConsumptionSchema);

module.exports = PowerConsumptions;
For further details on creating time series collections, refer to MongoDB's official time series documentation.
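
For readers not using Mongoose, the same options can be passed to the MongoDB Node.js driver's `db.createCollection()`. The following is a sketch rather than part of the original project: the `createPowerCollection` helper and the default collection name `powerconsumptions` are assumptions, and `db` is expected to be a connected `Db` instance from the `mongodb` package.

```javascript
// Options equivalent to the `timeseries` block in the Mongoose schema above.
const timeSeriesOptions = {
  timeseries: {
    timeField: 'timestamp', // field holding each data point's date
    granularity: 'minutes', // matches the one-minute sampling rate
  },
};

// Hypothetical helper: `db` is a connected Db instance from the `mongodb` driver.
// The default collection name is an assumption made for this sketch.
async function createPowerCollection(db, name = 'powerconsumptions') {
  return db.createCollection(name, timeSeriesOptions);
}
```

A time series collection must be created with these options up front, which is why they are passed at creation time rather than applied afterwards.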

Inserting data to MongoDB

The dataset is provided as a .txt file, which is not directly usable with MongoDB. To import this data into our MongoDB database, we need to preprocess it so that it aligns with our database schema design.
This can be accomplished by performing the following steps:
  1. Connect to MongoDB.
  2. Load data from the .txt file.
  3. Normalize the data and split the content into lines.
  4. Parse the lines into structured objects.
  5. Transform the data to match our MongoDB schema model.
  6. Filter out invalid data.
  7. Insert the final data into MongoDB in chunks.
Here is the Node.js script that automates these steps:
// Load environment variables from .env file
require('dotenv').config();

// Import required modules
const fs = require('fs');
const mongoose = require('mongoose');
const PowerConsumptions = require('./models/power-consumption');

// Connect to MongoDB and process the data file
const processData = async () => {
  try {
    // Connect to MongoDB using the connection string from environment variables
    await mongoose.connect(process.env.MONGODB_CONNECTION_STRING);

    // Define the file path for the data source
    const filePath = 'Household_Power_Consumption.txt';

    // Read data file
    const rawFileContent = fs.readFileSync(filePath, 'utf8');

    // Normalize line endings and split the content into lines
    const lines = rawFileContent.replace(/\r\n/g, '\n').replace(/\r/g, '\n').trim().split('\n');

    // Extract column headers
    const headers = lines[0].split(';').map((header) => header.trim());

    // Parse the lines into structured objects
    const parsedRecords = lines.slice(1).map((line) => {
      const values = line.split(';').map((value) => value.trim());
      return headers.reduce((object, header, index) => {
        object[header] = values[index];
        return object;
      }, {});
    });

    // Transform and prepare data for insertion
    const transformedRecords = parsedRecords.map((item) => {
      // Fall back to an empty string so a malformed line yields an invalid
      // date (filtered out below) instead of throwing
      const [day, month, year] = (item.Date || '').split('/').map((num) => parseInt(num, 10));
      const [hour, minute, second] = (item.Time || '').split(':').map((num) => parseInt(num, 10));
      // Note: this constructor interprets the values in the local time zone
      const dateObject = new Date(year, month - 1, day, hour, minute, second);

      return {
        // Keep the Date object: toISOString() throws a RangeError on an invalid date
        timestamp: dateObject,
        global_active_power: parseFloat(item.Global_active_power),
        global_reactive_power: parseFloat(item.Global_reactive_power),
        voltage: parseFloat(item.Voltage),
        global_intensity: parseFloat(item.Global_intensity),
        sub_metering_1: parseFloat(item.Sub_metering_1),
        sub_metering_2: parseFloat(item.Sub_metering_2),
        sub_metering_3: parseFloat(item.Sub_metering_3),
      };
    });

    // Filter out invalid data (an invalid Date has a NaN time value)
    const finalData = transformedRecords.filter(
      (item) =>
        !isNaN(item.timestamp.getTime()) &&
        !isNaN(item.global_active_power) &&
        !isNaN(item.global_reactive_power) &&
        !isNaN(item.voltage) &&
        !isNaN(item.global_intensity) &&
        !isNaN(item.sub_metering_1) &&
        !isNaN(item.sub_metering_2) &&
        !isNaN(item.sub_metering_3)
    );

    // Insert final data into the database in chunks of 1000
    const chunkSize = 1000;
    for (let i = 0; i < finalData.length; i += chunkSize) {
      const chunk = finalData.slice(i, i + chunkSize);
      await PowerConsumptions.insertMany(chunk);
    }

    console.log('Data processing and insertion completed.');
  } catch (error) {
    console.error('An error occurred:', error);
  } finally {
    // Close the connection so the Node.js process can exit
    await mongoose.disconnect();
  }
};

// Call the processData function
processData();
Before you start the script, you need to make sure that your environment variables are set up correctly. To do this, create a file named “.env” in the root folder, and add a line for “MONGODB_CONNECTION_STRING”, which is your link to the MongoDB database.
The content of the .env file should look like this:
MONGODB_CONNECTION_STRING = 'mongodb+srv://{{username}}:{{password}}@{{your_cluster_url}}/{{your_database}}?retryWrites=true&w=majority'
For more details on constructing your connection string, refer to the official MongoDB documentation.
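
One detail of the import script is worth noting: it builds timestamps with `new Date(year, month - 1, day, ...)`, which interprets the values in the time zone of the machine running the script, so the stored instants can differ between machines. A possible alternative, assuming you want the dataset's wall-clock times stored as if they were UTC, is sketched below. The `parseTimestampUtc` helper is hypothetical and not part of the original script.

```javascript
// Hypothetical helper: parses the dataset's "dd/mm/yyyy" and "hh:mm:ss" strings
// into a Date, treating the values as UTC. Returns null when either string
// does not consist of three numeric parts.
function parseTimestampUtc(dateText, timeText) {
  const dateParts = String(dateText).split('/').map((part) => parseInt(part, 10));
  const timeParts = String(timeText).split(':').map((part) => parseInt(part, 10));

  if (dateParts.length !== 3 || timeParts.length !== 3) return null;
  if ([...dateParts, ...timeParts].some(Number.isNaN)) return null;

  const [day, month, year] = dateParts;
  const [hour, minute, second] = timeParts;
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

console.log(parseTimestampUtc('17/01/2007', '00:00:00').toISOString());
// 2007-01-17T00:00:00.000Z
console.log(parseTimestampUtc('?', '00:00:00')); // null
```

Storing the values as UTC keeps them aligned with date filters that are written with UTC boundaries, regardless of where the import runs.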

Visualization with MongoDB Atlas Charts

Once the data has been inserted into our MongoDB time series collection, MongoDB Atlas Charts can be used to effortlessly connect to and visualize the data.
To connect to and use MongoDB Atlas Charts, we should:
  1. Establish a connection to the time series collection as a data source.
  2. Associate the desired fields with the appropriate X and Y axes.
  3. Implement filters as necessary to refine the data displayed.
  4. Explore the visualizations provided by Atlas Charts to gain insights.
[Figure: Atlas Charts visualization for household power consumption]
In the above example, we visualized the power consumption from various sources within a single day. The visualization revealed distinct usage patterns: Kitchen equipment was primarily used in the morning and evening, laundry room equipment was active around noon, and the water heater and air conditioner showed continuous use from morning to evening.
For the displayed visualization, we used a query to filter the data for a specific date:
{ timestamp: { $gte: ISODate('2007-01-17T00:00:00.000-00:00'), $lt: ISODate('2007-01-18T00:00:00.000-00:00') } }
If you want to change what is shown in the charts, you can apply different filters or aggregation pipelines to the data, tailoring the results according to your needs.
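
As one example of such a pipeline, the sketch below averages active power per hour for a single day. The `buildHourlyAveragePipeline` helper and the output field names are hypothetical, and the `$dateTrunc` operator it relies on requires MongoDB 5.0 or later.

```javascript
// Hypothetical helper: builds an aggregation pipeline that summarizes readings
// per hour for one UTC day. `day` is a string such as '2007-01-17'.
function buildHourlyAveragePipeline(day) {
  const start = new Date(`${day}T00:00:00.000Z`);
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000);

  return [
    // Keep only the readings from the requested day
    { $match: { timestamp: { $gte: start, $lt: end } } },
    // Group readings by the hour they fall in ($dateTrunc needs MongoDB 5.0+)
    {
      $group: {
        _id: { $dateTrunc: { date: '$timestamp', unit: 'hour' } },
        avg_active_power: { $avg: '$global_active_power' },
        kitchen_wh: { $sum: '$sub_metering_1' },
      },
    },
    // Return the hours in chronological order
    { $sort: { _id: 1 } },
  ];
}

const pipeline = buildHourlyAveragePipeline('2007-01-17');
console.log(JSON.stringify(pipeline, null, 2));
```

With the model defined earlier, the pipeline could be run from Node.js with `await PowerConsumptions.aggregate(pipeline)`.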

Conclusion

This article demonstrates the capabilities of MongoDB when integrated with IoT systems. By leveraging MongoDB's time series collections, we can efficiently store, manage, and analyze the large volumes of time series data generated by IoT devices.
The case study on household power consumption not only showcases the practical applications of IoT in our daily lives but also highlights how MongoDB can help us get a deeper understanding of IoT data sets.
Through visualization with MongoDB Atlas Charts, we identified clear daily patterns in power consumption. Insights like these support informed decisions and can point the way to improvements in energy efficiency and cost savings.
As we have explored the capabilities of MongoDB in handling IoT data and visualizing it with Atlas Charts, I hope it gets you excited to work more on your own data projects. I invite you to join the MongoDB Community Forums to share your experiences, ask questions, and collaborate with fellow enthusiasts. Whether you are seeking advice, sharing your latest project, or exploring innovative uses of MongoDB, the community is a great place to continue the conversation.