MongoDB and Hadoop over relational database management systems (RDBMS)
Both MongoDB and Hadoop cover use cases that traditional relational databases are poorly suited to.
Data storage
Both MongoDB and Hadoop can store unstructured data, while traditional relational databases require you to define a structure for your data and normalize it before you store it. This flexibility is a great advantage when the shape of your data can change over time.
Thanks to the distributed file system in Hadoop, it is simple to add more nodes to a cluster to increase the available storage capacity. MongoDB uses sharding to distribute data across multiple nodes to help scale horizontally.
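The idea behind sharding can be illustrated with a minimal sketch. It assumes a hash-based placement in which each document is assigned to a shard from a hash of its shard key. The names `shard_for`, `distribute`, and `user_id` are hypothetical, and the modulo scheme is a simplification: MongoDB itself manages ranges of key values and rebalances them across shards, which this sketch does not attempt to model.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard-key value to a shard index using a stable hash.

    Simplified illustration only; not MongoDB's actual placement logic.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(docs, shard_key, num_shards):
    """Group documents by the shard their key hashes to."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(str(doc[shard_key]), num_shards)].append(doc)
    return shards


# Hypothetical documents keyed by a made-up "user_id" field.
docs = [{"user_id": f"user-{i}", "score": i} for i in range(1000)]
shards = distribute(docs, "user_id", 4)

for index in sorted(shards):
    print(f"shard {index}: {len(shards[index])} documents")
```

Because the hash is stable, the same key always lands on the same shard, so a lookup by shard key only needs to contact one node.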
With a traditional RDBMS, scaling usually means adding more disk space to the existing server, which might require downtime and usually leads to higher costs.
Data processing
Traditional RDBMSs offer limited capabilities to process the data stored in the database. Data processing is typically done through another application that queries the database and then processes the results.
MongoDB offers the aggregation pipeline framework, which lets you process and transform data inside the database and retrieve only the relevant information. In addition to aggregation pipelines, you can use Atlas Search to further refine your queries with full-text search capabilities.
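As a rough illustration of an aggregation pipeline, the sketch below defines a two-stage pipeline (`$match`, then `$group` with `$sum`) in the form you could pass to PyMongo's `collection.aggregate()`. The collection contents, the field names (`customer`, `status`, `amount`), and the helper `run_sketch` are assumptions made for this example. So that the example runs without a server, `run_sketch` emulates only these two stages in memory; it is not how MongoDB executes pipelines.

```python
from collections import defaultdict

# Pipeline as it could be passed to collection.aggregate() in PyMongo.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]

# Hypothetical documents standing in for an "orders" collection.
orders = [
    {"customer": "ada", "status": "shipped", "amount": 30},
    {"customer": "ada", "status": "pending", "amount": 15},
    {"customer": "bob", "status": "shipped", "amount": 20},
    {"customer": "ada", "status": "shipped", "amount": 12},
]


def run_sketch(docs, stages):
    """Emulate equality-only $match and a single-$sum $group in memory."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"].lstrip("$")
            # Take the one accumulator field that is not "_id".
            out_name, acc = next(
                (k, v) for k, v in spec.items() if k != "_id")
            sum_field = acc["$sum"].lstrip("$")
            totals = defaultdict(int)
            for d in docs:
                totals[d[key_field]] += d[sum_field]
            docs = [{"_id": k, out_name: v} for k, v in totals.items()]
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


result = run_sketch(orders, pipeline)
print(result)
```

Each stage receives the output of the previous one, so filtering early with `$match` reduces the amount of data the later stages have to handle.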
Hadoop is great at processing large batches of data using the MapReduce paradigm. This model works well when individual pieces of data can be processed independently, one at a time. However, when values from different records need to be correlated, MapReduce might make things slower than necessary.
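The MapReduce data flow can be sketched with the classic word-count example. This is a single-process illustration in plain Python, not Hadoop code: in Hadoop the map and reduce tasks run in parallel across the cluster and the framework performs the grouping step between them. The function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for this example only.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big batches", "data on disk"]

# Each line is mapped independently of every other line.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

The map step never looks at more than one line at a time, which is what makes it easy to spread over many nodes. Any relationship between records has to be expressed through the grouping step, which is one reason correlating values can be comparatively slow in this model.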
Memory handling
MongoDB optimizes its memory usage to enable quick delivery of data. It keeps indexes and some data in memory to ensure predictable latency.
Hadoop, on the other hand, focuses on disk storage. It is better at optimizing disk space but will be slower to deliver query results since it needs to read from the disk.
Traditional relational databases use a mix of disk and memory to deliver results as fast as possible. However, because of the need for joins, a lot of memory is dedicated to joining tables, which might cause latency.