Welcome to the community @anjanesh!
MongoDB stores structured data in documents. Key/value is an extremely simplified view, since values can include more complex types like embedded documents and arrays.
You may choose to store an article or large text blob as a single value, but typically this is not the best approach if you also want to provide a search interface. For example, you would normally want to distinguish title, author, and other metadata from the body of an article. For efficient searching, you also want to consider how to index and prioritise different aspects of your content.
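For illustration, here is a minimal sketch of how an article might be modelled as a document, keeping searchable metadata in distinct fields rather than buried inside one large text value (the collection and field names are just placeholders):

```js
// Hypothetical article document: metadata such as title, author,
// and tags is separated from the body text.
db.articles.insertOne({
  title: "Schema Design Basics",
  author: { name: "Jane Doe", handle: "jdoe" }, // embedded document
  tags: ["mongodb", "schema-design"],           // array value
  published: new Date("2020-06-01"),
  body: "Full text of the article goes here..."
})
```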
For a great introduction to MongoDB data patterns, I suggest reviewing Building with Patterns: A Summary and taking the free online course M320: Data Modeling at MongoDB University. The latest session of M320 just started this week and you have until August 18 to complete the course.
Search speed depends on several factors including how you’ve modelled your data, what sort of searches you are trying to perform, and the resources of your deployment. For example, if you are trying to perform case-insensitive regular expression matches against large text blobs, performance is unlikely to be acceptable because this will be a resource-intensive scan through all of your documents.
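As a sketch of the pattern to avoid (using the same hypothetical articles collection), an unanchored case-insensitive regex like this generally cannot use a standard index and will scan every document:

```js
// Forces a full collection scan: case-insensitive regular expression
// matches generally cannot take advantage of a standard index.
db.articles.find({ body: { $regex: "shakespeare", $options: "i" } })

// You can confirm the COLLSCAN in the query plan:
db.articles.find({ body: /shakespeare/i }).explain("executionStats")
```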
If you have basic text search requirements, MongoDB has a standard Text Search feature which is analogous to a MySQL FULLTEXT index.
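For example (again assuming a hypothetical articles collection), you could create a text index that weights the title more heavily than the body, then query it with $text and sort by relevance:

```js
// A collection can have one text index; weights control the
// relative relevance of each indexed field.
db.articles.createIndex(
  { title: "text", body: "text" },
  { weights: { title: 10, body: 1 } }
)

// Query the text index and sort results by text score.
db.articles.find(
  { $text: { $search: "schema design" } },
  { score: { $meta: "textScore" } }
).sort({ score: { $meta: "textScore" } })
```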
If you have more complex text search requirements, definitely look into using Atlas Search which is available for MongoDB 4.2+ Atlas clusters.
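As a rough sketch of what that looks like (this assumes you have already created an Atlas Search index named "default" on the collection), Atlas Search queries run as an aggregation pipeline stage:

```js
db.articles.aggregate([
  {
    $search: {
      index: "default", // name of the Atlas Search index (assumed here)
      text: { query: "schema design", path: ["title", "body"] }
    }
  },
  { $limit: 5 },
  { $project: { title: 1, score: { $meta: "searchScore" } } }
])
```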
If you need suggestions for improving search performance or your data model, I suggest starting a new topic with an example of your documents, indexes, and typical search queries. Please provide specific details and examples in order to get relevant advice.
All modern versions of MongoDB compress data and indexes by default. Storage compression was optional in MongoDB 3.0, but available if you changed the storage engine to WiredTiger (which has been the default storage engine since 3.2).
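If you want to see the effect of compression on one of your own collections, the collection stats report both the uncompressed data size and the compressed size allocated on disk (the collection name below is a placeholder):

```js
// "size" is the uncompressed data size; "storageSize" is the
// compressed size WiredTiger has allocated on disk.
var stats = db.articles.stats()
print("data: " + stats.size + " bytes, on disk: " + stats.storageSize + " bytes")
```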
The limit of 16MB per document represents a significant amount of text. For example, this is about three times as much as The Complete Works of William Shakespeare in text format (ref: Project Gutenberg). If your document sizes are approaching 16MB I would give careful consideration to whether there is a more efficient schema design for your use case.
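If you're unsure how close your documents are to the limit, the mongo shell can report the BSON size of a sample document (collection name is again a placeholder):

```js
// Object.bsonsize() returns the size of a document in bytes;
// the hard limit is 16 MB (16,777,216 bytes).
var doc = db.articles.findOne()
print(Object.bsonsize(doc) + " of " + (16 * 1024 * 1024) + " bytes")
```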
Atlas Search integrates Apache Lucene, the same search library that Elasticsearch builds on. Atlas Search has been in beta for the last year, but is now officially Generally Available (GA) as of early June.
Regards,
Stennie