
How to Monitor MongoDB


MongoDB provides monitoring commands and tools to enhance database performance and check the health of your database instances. Read on to understand the metrics and tools you can use to monitor your clusters.


Why monitor MongoDB?

A key aspect of database administration and capacity planning for your application is monitoring your cluster's health and performance. An unhealthy database may experience slow response times or become overwhelmed and impact the uptime of your application. While MongoDB Atlas, our fully managed cloud database, handles a vast majority of administration efforts and has built-in fault tolerance and scaling abilities, it’s still crucial that users know how to best monitor their clusters.

Monitoring MongoDB databases allows you to improve the performance of your application stack and optimize for costs by enabling you to:

  • Understand the current capacity of your database.
  • Observe how heavily resources are utilized.
  • Spot abnormal behavior and performance issues.
  • Detect and react to real-time issues.
  • Comply with your SLA and data protection/governance requirements.
Top 7 areas to monitor in MongoDB

These are seven key monitoring metrics and capabilities to leverage in MongoDB.

1. Scan and order
What is scan and order?

The scan and order metric is the average rate of in-memory sort operations performed by the database per second.

Why is it important?

In-memory sorts can be very expensive as they require large result sets to be buffered. They can sometimes be avoided by using compound indexes that presort.

What should I look out for?

The ideal state to look for with scan and order is a value of 0, meaning that the database didn’t perform any in-memory sort operations.

Generally, any spike in scan and order is a red flag. This indicates that the database performed many in-memory sort operations, which adds memory and computational overhead to a query. This is a blocking stage for aggregation—further processing cannot happen until the results have been sorted.

Remember, not all queries are equal: one large scan and order can be worse than many small ones.
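In-memory sorts can often be eliminated by creating a compound index whose key order matches the query's filter and sort. As a sketch run through mongosh (the `orders` collection and the `status` and `created` fields are hypothetical; adjust to your schema):

```shell
# Create a compound index matching the query's filter and sort order,
# so the sort is satisfied by the index instead of in memory.
mongosh "mongodb://localhost:27017/mydb" --eval '
  db.orders.createIndex({ status: 1, created: -1 });
  // With the index above, this query walks documents in index order
  // and avoids a blocking in-memory SORT stage.
  db.orders.find({ status: "shipped" }).sort({ created: -1 }).limit(10);
'
```

With the index in place, the query's `explain()` plan should no longer contain a SORT stage.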

2. Query targeting
What is query targeting?

The query targeting metric is the ratio of the number of index items scanned to the number of documents returned by queries since the previous data point for the selected sample period.

Why is it important?

This metric is the best measure of how efficiently the database is running.

What should I look out for?

Ideally, query targeting should stay as close to 1 as possible. A value of 1 means the database scanned just one index key for each document returned during the sample period. That said, a query targeting value of up to 20 is generally acceptable.

Spikes in query targeting typically arise when there are no appropriate indexes to support queries. For example, a value of 100 means that, on average over the sample period, queries scanned 100 index keys for each document returned.
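You can check a single query's targeting yourself with `explain("executionStats")`, which reports `totalKeysExamined`, `totalDocsExamined`, and `nReturned`. A sketch via mongosh (the collection and filter are hypothetical):

```shell
# Compare index keys scanned to documents returned for one query.
mongosh "mongodb://localhost:27017/mydb" --quiet --eval '
  const stats = db.orders.find({ status: "shipped" })
                  .explain("executionStats").executionStats;
  // A ratio near 1 is efficient; a large ratio suggests a missing index.
  print("keys examined: " + stats.totalKeysExamined);
  print("docs returned: " + stats.nReturned);
'
```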

You can set an alert for this metric under Project Alerts in your settings.

3. Normalized System CPU
What is normalized system CPU?

The normalized system CPU metric shows the CPU usage of all processes on the node, scaled to a range of 0-100% by dividing by the number of CPU cores.

Why is it important?

Operating on the improper tier could result in higher costs if the tier is overprovisioned or potential downtime due to a lack of resources if the tier is underprovisioned. Selecting the right tier will ensure optimal performance at the lowest cost.

What should I look out for?

A healthy range for the Normalized System CPU is between 40% and 70%.

Under 40% indicates potential overprovisioning, while over 70% indicates potential underprovisioning.

4. Performance Advisor
What is the Performance Advisor?

Performance Advisor is a tool that provides targeted insights and recommendations based on the analysis of query patterns and resource usage across the entire database cluster.

Why is it important?

By improving resource efficiency and enabling proactive monitoring, Performance Advisor supports better application performance and long-term scalability, leading to a more cost-effective system.

What should I look out for?

Performance Advisor ranks suggested indexes by impact (high or medium), based on the total wasted bytes read.

Each suggestion contains the following metrics, which apply specifically to the queries that would be improved by the index:

  • Execution Count
  • Average Execution Time
  • Average Query Targeting
  • Average Docs Scanned
  • Average Docs Returned

Performance Advisor also shows each executed sample query that matches the query shape, with specific metrics for that query.

Please note: always verify index recommendations before creating them. Additional indexes incur write overhead and consume storage space. Hide indexes before dropping them to confirm they are no longer needed.
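Hiding an index lets you measure the impact of removing it without actually dropping it: the query planner ignores a hidden index, but it is still maintained and can be unhidden instantly. A sketch via mongosh (the collection and index name are hypothetical):

```shell
# Hide an index from the query planner before deciding to drop it.
mongosh "mongodb://localhost:27017/mydb" --eval '
  db.orders.hideIndex("status_1_created_-1");      // planner ignores it
  // ...observe query performance for a while, then either:
  // db.orders.unhideIndex("status_1_created_-1"); // instant rollback
  // db.orders.dropIndex("status_1_created_-1");   // drop once confirmed
'
```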

5. Namespace Insights
What is Namespace Insights?

Namespace Insights tracks collection-level query latency in MongoDB Atlas, offering visibility into latency metrics and statistics for specific hosts and operation types (all operation types, reads, writes, and commands). Users can manage pinned namespaces and select up to five to display in the query latency charts.

The available metrics include:

  • Total latency
  • Average latency
  • P50 latency (50th percentile in the latency histogram)
  • P95 latency (95th percentile in the latency histogram)
  • P99 latency (99th percentile in the latency histogram)
  • Operation count
Why is it important?

Namespace Insights is crucial for identifying performance bottlenecks at the collection level. By tracking query latency metrics, database administrators can optimize query performance, enhance resource allocation, and ensure efficient data access, ultimately improving overall application performance.

What should I look out for?

In Namespace Insights, watch for query latency above 100 milliseconds, and for pinned namespaces whose P95 latency exceeds 200 milliseconds. These indicators can guide decisions on query optimization and resource allocation.

6. Query Profiler
What is the Query Profiler?

Query Profiler identifies slow queries based on log data. Atlas displays this data in the Profiler section of an instance. Query Profiler captures up to the most recent 10,000 operations, or 10MB of logs.

There are seven views of Query Profiler:

  • Operation Execution Time (most common)
  • Keys Examined
  • Docs Returned
  • Examined:Returned Ratio
  • Docs Examined
  • Num Yields
  • Response Length
Why is it important?

Query Profiler provides detailed insights into query performance, helping identify inefficiencies and bottlenecks. By analyzing execution times and resource usage, it empowers database administrators to optimize queries and improve overall system performance.

What should I look out for?

Issues in Query Profiler can be indicated by high execution times, excessive document examination, and poor index usage. High resource consumption and frequent query plan changes can also signal inefficiencies that require attention for optimal performance.


The Query Profiler dashboard provides a high-level view that makes it easy to quickly identify outliers and general trends. The table offers operation statistics by namespace (database and collection) and operation type.

7. Cost Explorer
What is Cost Explorer?

Cost Explorer enables users to track and analyze their MongoDB Atlas spending. It provides insights into resource usage and associated costs, displaying metrics such as total spend, resource consumption by cluster, and cost trends over time.

Why is it important?

Cost Explorer is essential for managing MongoDB Atlas expenses effectively. By understanding where costs are incurred, users can optimize resource usage, identify potential savings, and ensure that database operations remain within budget, supporting financial planning.

What should I look out for?

When using Cost Explorer, pay attention to spikes in spending, especially during high resource usage periods. Monitoring cost trends can help identify inefficient resource allocations or underutilized clusters, revealing opportunities for optimization and better alignment with usage patterns.

Additional important metrics

Below, we’ve identified a number of important metrics used to measure performance, grouped into four categories: instance status and health, cluster operation and connection metrics, instance hardware metrics, and replication metrics.

Instance status and health

The status of a MongoDB server process can be an immediate indication of whether we need to drill down into its activity or health. A process that is unresponsive or does not respond to commands should be investigated immediately.

Cluster operation and connection metrics

When your application is struggling or underperforming, you need to rule out the database layer as the bottleneck. The application issues connections and operations against the database, so pay close attention to its behavior.

MongoDB provides various metrics and mechanisms to identify its connection and operation patterns. On top of the active and proactive monitoring tools, Atlas provides a full alerting system and log gathering.

  • Monitor with MongoDB Atlas: Atlas provides built-in features like Performance Advisor, Real-Time Performance Panel, Namespace Insights, and Query Profiler to track operations and highlight slow or resource-heavy operations. Additionally, the Metrics tab provides many graphs that plot operations and connection counts. See below for more details:
Opcounters
  • Definition: The number of operations performed by the database, including inserts, updates, deletes, and queries.
  • Importance: Provides insight into the overall workload and performance of the database, helping to identify bottlenecks or performance issues.
  • Good: Steady or increasing counts.
  • Bad: Sudden drops or stagnation may suggest issues with the database or application, such as connection problems or resource constraints.

Operation execution time
  • Definition: The average time taken to execute database operations, measured in milliseconds.
  • Importance: A critical indicator of performance; longer execution times can lead to slower application response.
  • Good: Low and stable execution times.
  • Bad: Increasing execution times may signal performance degradation, potentially due to resource contention or inefficient queries.

Query executors
  • Definition: The number of active query executors currently processing queries in the database.
  • Importance: Helps assess the database's ability to handle concurrent queries and overall query performance.
  • Good: A balanced number of executors.
  • Bad: High numbers may indicate contention or resource exhaustion, while low numbers could suggest underutilization or query bottlenecks.

Query targeting
  • Definition: The ratio of index items scanned to documents returned by queries since the previous data point for the selected sample period.
  • Importance: The best measure of how efficiently the database is running.
  • Good: For frequently run queries, aim for as low a value as possible. A query targeting ratio of 1 is the most efficient, returning one document for each index item scanned.
  • Bad: Spikes arise when there are no appropriate indexes to support queries.

Connections
  • Definition: The total number of active connections to the database at any given time.
  • Importance: Helps you understand the load on the database and ensure it can handle the required number of concurrent users.
  • Good: Connection counts within expected limits.
  • Bad: High connection counts may lead to resource exhaustion, while very low counts could suggest underutilization or application issues.

Queues
  • Definition: The number of operations waiting to be processed by the database, indicating the level of demand versus capacity.
  • Importance: Helps identify potential bottlenecks and ensure the database can handle incoming requests efficiently.
  • Good: Low or no queues indicate that the database is processing requests promptly.
  • Bad: High queue lengths suggest that the database is overwhelmed, leading to increased latency and potential timeouts.

Scan and order
  • Definition: The average rate of in-memory sort operations performed by the database per second.
  • Importance: In-memory sorts can be very expensive as they require large result sets to be buffered.
  • Good: 0, meaning the database didn't perform any in-memory sort operations.
  • Bad: A large value indicates the database performed many in-memory sort operations.

  • Monitor with self-managed MongoDB instances:
    • You can leverage tools like mongostat and mongotop.
    • Once you connect to your instance via Compass, you can use the MongoDB Compass Performance tab, which is similar to the Atlas Real-Time Performance Panel.
Instance hardware metrics

Hardware metrics can be used to identify which resources could be the root cause for performance issues or which need tuning and capacity re-planning.

  • Monitor with MongoDB Atlas:
    • The Atlas metrics tab within a cluster provides plotted graphs for the hardware metrics. These allow you to correlate them with other database metrics. See below for more details:
Normalized system CPU
  • Definition: The CPU usage of all processes on the node, scaled to a range of 0-100% by dividing by the number of CPU cores.
  • Importance: Helps determine if the correct cluster tier is in use. An improper tier can lead to higher costs if overprovisioned or potential downtime if underprovisioned.
  • Good: A healthy range is between 40% and 70%.
  • Bad: Under 40% indicates potential overprovisioning, while over 70% indicates potential underprovisioning.

Normalized process CPU
  • Definition: The percentage of CPU resources utilized by the database process, normalized to account for the number of CPU cores available.
  • Importance: Indicates how efficiently the database is using CPU resources, helping to identify potential performance bottlenecks.
  • Good: Values around 50-70%.
  • Bad: Values consistently above 80% may indicate CPU contention, while very low values could suggest underutilization.

Disk latency
  • Definition: The average time taken for read and write operations on the disk, measured in milliseconds.
  • Importance: A critical measure of disk performance; high latency can significantly impact database performance and user experience.
  • Good: Low latency values (typically under 5 ms).
  • Bad: High latency (over 20 ms) can signal disk bottlenecks or issues with the underlying storage infrastructure.

Disk IOPS
  • Definition: The number of input/output operations per second that the disk can handle.
  • Importance: Clarifies the disk's ability to support database workloads, especially for read/write-heavy applications.
  • Good: High IOPS values (above 500 IOPS).
  • Bad: Low IOPS (below 100 IOPS) may suggest that the disk is a bottleneck, potentially leading to performance degradation.

Disk space free
  • Definition: The amount of available disk space on the storage system used by the database.
  • Importance: Ensures that there is sufficient space for data growth, backups, and operational efficiency.
  • Good: Above 20% of total capacity.
  • Bad: Below 10% of capacity can lead to performance issues, data loss, or an inability to perform necessary operations.

System memory
  • Definition: The total amount of RAM being used by the database process compared to the total available system memory.
  • Importance: Critical for performance; adequate memory can reduce disk I/O and improve query response times.
  • Good: Moderate memory usage, around 60-80%.
  • Bad: High memory usage (over 90%) may indicate memory pressure, while very low usage could suggest underutilization.

Swap usage
  • Definition: The amount of disk space being used as virtual memory when the system runs out of physical RAM.
  • Importance: High swap usage can indicate insufficient memory, leading to performance degradation as the system relies on slower disk storage.
  • Good: Low swap usage, ideally 0-5%.
  • Bad: High swap usage (over 10%) can lead to significant performance issues, suggesting a need for additional memory resources.

  • Monitor with self-managed MongoDB instances:
    • Use your operating system tools (top, iostat, etc.).
Replication metrics

Replication is a key aspect of MongoDB clusters' high availability and durability. The health and performance of replication needs to be carefully monitored in order to maintain a healthy cluster.

  • Monitor with MongoDB Atlas: The Atlas Metrics tab within a cluster provides plotted graphs for the replication metrics, allowing you to correlate them with other database metrics. See below for more details:
Replication lag
  • Definition: The time delay between the primary and secondary nodes in a replica set, measured in seconds.
  • Importance: Indicates how current the secondary nodes are compared to the primary, affecting data consistency and availability.
  • Good: Low lag, typically under 5 seconds.
  • Bad: High lag (over 10 seconds) can lead to stale reads and potential data loss during failover.

Replication oplog window
  • Definition: The time span of data that can be replayed from the oplog on the primary node, measured in seconds.
  • Importance: Ensures that secondaries can catch up with the primary; a short window may lead to data loss if a secondary falls too far behind.
  • Good: A longer oplog window of several hours.
  • Bad: A short window (under 1 hour) can risk data loss if a secondary is unable to catch up.

Replication headroom
  • Definition: The amount of time the oplog can sustain the secondary nodes without new data being written to the primary.
  • Importance: Provides insight into how long secondaries can remain disconnected without falling behind.
  • Good: Ample headroom of several hours.
  • Bad: Limited headroom (under 30 minutes) suggests a risk of data loss if the primary becomes unavailable.

Oplog GB/hour
  • Definition: The amount of data written to the oplog per hour, measured in gigabytes.
  • Importance: Assesses the write load on the primary and the capacity of the oplog to handle data changes.
  • Good: Moderate values of 1-5 GB/hour.
  • Bad: High values of over 10 GB/hour may suggest that the oplog could fill up quickly, risking data loss for secondaries.

Opcounters – repl
  • Definition: Counters that track the number of replication operations (inserts, updates, deletes) performed by the database.
  • Importance: Provides insight into the replication workload and helps identify potential bottlenecks in the replication process.
  • Good: Steady or increasing counts.
  • Bad: Sudden drops or stagnation may suggest issues with replication, such as network problems or resource constraints.

  • Monitor with self-managed MongoDB instances: use MongoDB commands such as rs.status(), rs.printReplicationInfo(), and rs.printSecondaryReplicationInfo().
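For a self-managed replica set, the built-in shell helpers surface the same replication data. A sketch assuming a local mongosh connection:

```shell
# Replica set member states and health, from the current member's view.
mongosh "mongodb://localhost:27017" --eval 'rs.status()'

# Oplog size, usage, and time span (the oplog window) on this member.
mongosh "mongodb://localhost:27017" --eval 'rs.printReplicationInfo()'

# Per-secondary replication lag behind the primary.
mongosh "mongodb://localhost:27017" --eval 'rs.printSecondaryReplicationInfo()'
```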
MongoDB performance monitoring tools

MongoDB provides built-in UI tools in Atlas, as well as in Cloud Manager and Ops Manager, to help you monitor performance. MongoDB also offers standalone tools and commands for looking at raw data.

Below are tools you can run from a host that has access to your deployment and an appropriate role (clusterMonitor) to monitor your environment.

mongostat

mongostat is used to get a quick overview of the status of your MongoDB server instance. It’s best used for watching a single instance for a specific event as it provides a real-time view. You can use this command to monitor basic server statistics such as operation breakdown, MongoDB memory statistics, lock queues, and connections/network.

You can run mongostat from a terminal, pointing it at a host and port and passing an optional polling interval in seconds.
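For example, a typical invocation (the host, port, and credentials are placeholders; the user needs the clusterMonitor role):

```shell
# Poll server statistics every 2 seconds, print 10 rows, then exit.
mongostat --host localhost --port 27017 \
  -u monitorUser -p 'yourPassword' --authenticationDatabase admin \
  --rowcount 10 2
```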


mongotop

mongotop tracks the amount of time a MongoDB instance spends reading and writing data per collection.

You can run mongotop from a terminal, pointing it at a host and port and passing an optional reporting interval in seconds.
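For example, a typical invocation (the host, port, and credentials are placeholders):

```shell
# Report per-collection read/write time every 5 seconds.
mongotop --host localhost --port 27017 \
  -u monitorUser -p 'yourPassword' --authenticationDatabase admin 5
```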


rs.status()

rs.status() returns the replica set status, reported from the point of view of the member where the method is run.
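As a quick health check, you can project just each member's name, state, and health from the result (assuming a local mongosh connection):

```shell
# One line per replica set member: host, state (PRIMARY/SECONDARY/...), health.
mongosh "mongodb://localhost:27017" --quiet --eval '
  rs.status().members.forEach(m =>
    print(m.name + " " + m.stateStr + " health=" + m.health));
'
```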


db.serverStatus()

db.serverStatus() provides a document representing the current instance metrics counters. Run this command at a regular interval to collect statistics about the instance.
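Because the serverStatus counters are cumulative since process start, it's common to sample them at a fixed interval and diff successive samples. For instance, to pull just the opcounters section (assuming a local instance):

```shell
# Cumulative operation counters since the mongod process started.
mongosh "mongodb://localhost:27017" --quiet --eval '
  printjson(db.serverStatus().opcounters)
'
```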


dbStats

The dbStats command returns storage statistics for a given database, such as total collection data size versus storage size, the number of indexes and their size, and collection-related counts (numbers of documents and collections).
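A minimal sketch via mongosh (the database name is a placeholder); `db.stats()` wraps the dbStats command and accepts an optional scale factor:

```shell
# Storage statistics for the named database (sizes in bytes by default).
mongosh "mongodb://localhost:27017/mydb" --quiet --eval '
  printjson(db.stats())
'
```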


collStats

The collStats command collects statistics similar to dbStats at the collection level. Its output includes the count of objects in the collection, the collection's size, the amount of disk space it consumes, and information about its indexes.
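collStats runs through `db.runCommand()`. A sketch with a hypothetical collection name, using a scale of 1024 to report sizes in kilobytes:

```shell
# Collection-level storage statistics, scaled to kilobytes.
mongosh "mongodb://localhost:27017/mydb" --quiet --eval '
  printjson(db.runCommand({ collStats: "orders", scale: 1024 }))
'
```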


Together, tools and commands like mongostat, mongotop, dbStats, collStats, and serverStatus provide real-time monitoring and reporting of the database server, allowing you to track errors and performance and make informed decisions when optimizing a database.

Summary

MongoDB provides a variety of metrics and tools to monitor your database and ensure it's running at optimal performance. From UI tools to advisors to raw-data metrics, you're covered whether you're hosting your database yourself or using MongoDB Atlas.

For more information on monitoring MongoDB databases, see the following resources.

References:

MongoDB Atlas Monitoring
MongoDB Performance
MongoDB Performance Best Practices
MongoDB Professional Services

Appendix

Cloud Manager
Cluster view
Compass Performance Tab
Connections
Cost Explorer
db.serverStatus()
dbStats
Disk IOPS
Disk latency
Disk space free
collStats
Metrics tab
mongostat
mongotop
Namespace Insights
Normalized process CPU
Normalized system CPU
Oplog GB/hour
Opcounters
Opcounters – repl
Operation execution time
Ops Manager
Performance Advisor
Queues
Query executors
Query Profiler
Query targeting
Real-Time Performance Panel
Replication headroom
Replication lag
Replication oplog window
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()
rs.status()
Scan and order
Swap usage
System memory

Follow this tutorial with MongoDB Atlas

Experience the benefits of using MongoDB, the premier NoSQL database, on the cloud.
Get Started Free!