Spring Data Unlocked: Performance Optimization Techniques With MongoDB
Ricardo Mello • 5 min read • Published Dec 04, 2024 • Updated Dec 04, 2024
Ah, I know! Just like with the second article, you're here because you enjoyed the earlier topics, right? To recap: In the first article, we covered how to get started with MongoDB and Spring Data, exploring the difference between MongoTemplate and MongoRepository and when to use one or the other. In the second part, we took it a step further by exploring advanced queries and the powerful concepts Spring offers. Finally, in this third part, we'll look at how to optimize our application through strategies like indexes and read preferences.
To follow along, you'll need:
- Get started with MongoDB Atlas for free! If you don’t already have an account, MongoDB offers a free-forever Atlas cluster.
- IDE of your choice
Improving performance in MongoDB is crucial for ensuring fast, efficient data access and a better user experience. With the right optimization strategies, we can reduce latency, increase throughput, and scale applications more smoothly as they grow.
You’ve probably heard countless times that indexes are essential for achieving high performance in your queries, right? However, it’s important to remember that indexes come with certain rules, and even unused indexes can negatively impact your database. In collections with a high frequency of writes compared to reads, maintaining indexes can be challenging, because each write requires updating them.
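One way to spot indexes that aren't pulling their weight is MongoDB's $indexStats aggregation stage, which reports per-index usage counters. A minimal sketch using the driver collection we already obtain from mongoOperations (the mongoOperations and logger fields are assumed from our existing service):

MongoCollection<Document> collection = mongoOperations.getCollection("transactions");

// $indexStats returns one document per index, including how often each has been used;
// indexes with a near-zero access count are candidates for removal
collection.aggregate(List.of(new Document("$indexStats", new Document())))
        .forEach(doc -> logger.info(doc.toJson()));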
You might be wondering: “Okay, but how do I know when to create an index or not?” Well, we generally create indexes when our application frequently runs queries on the same field. In our application, let’s revisit the TransactionRepository class and take a look at the findByTransactionType(String type) method:

public interface TransactionRepository extends MongoRepository<Transaction, String> {

    List<Transaction> findByTransactionType(String type);
    // Other methods ...
}
This method is responsible for retrieving transactions by type. To understand the cost of this query, we can run the explain() command:
db.transactions.find({"transactionType": "Debit"}).explain()
Or, in our case, we’ll run a query directly using Spring. In CustomerService, create the following method:
public String explain() {

    MongoCollection<Document> collection = mongoOperations.getCollection("transactions");

    Document query = new Document("transactionType", "Debit");
    Document explanation = collection.find(query).explain();

    logger.info(explanation.toJson());
    return explanation.toJson();
}
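Before the index exists, the logged plan will show a collection scan. Trimmed and illustrative only (the exact fields vary by server version), the relevant part of the output looks something like this:

{
  "queryPlanner": {
    "winningPlan": {
      "stage": "COLLSCAN",
      "filter": { "transactionType": { "$eq": "Debit" } }
    }
  }
}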
The explain() command provides the query execution plan, detailing how MongoDB processes the query. In our case, since we don’t have an index, it returns a COLLSCAN (collection scan), meaning every document in the collection was examined. To optimize this, let's create an index for this query. Traditionally, we create the index as follows:
db.transactions.createIndex({"transactionType": 1})
However, since we are exploring Spring's options, we’ll create the index in a different way. To do this, simply navigate to the Transaction class (which represents our collection) and annotate the desired field with the @Indexed annotation:

import lombok.Data;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.index.Indexed;
import org.springframework.data.mongodb.core.mapping.Document;

@Data
@Document(collection = "transactions")
public class Transaction {

    @Indexed
    private String transactionType;

    // Other fields
}
For this to work, we need to enable auto-index-creation in the application properties by adding the following configuration:
spring:
  data:
    mongodb:
      auto-index-creation: true
Notice: While this is a convenient way to create indexes during development, it’s important to exercise caution in production environments. Auto-index creation can cause the application startup to pause while the index is being created, which might lead to delays or availability issues if not handled carefully. For production scenarios, consider creating indexes explicitly during a controlled process outside the application lifecycle.
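One controlled option is to create the index programmatically with Spring's IndexOperations, for example from a migration or startup task you run deliberately. A minimal sketch, assuming the same mongoOperations bean used earlier:

import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.index.Index;

// ...

// Explicitly create the ascending index on transactionType during a controlled
// step, instead of relying on auto-index creation at application startup
mongoOperations.indexOps(Transaction.class)
        .ensureIndex(new Index().on("transactionType", Sort.Direction.ASC));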
Continuing our exploration of indexes, we now have compound indexes. As the name suggests, these indexes combine more than one field for query purposes. Let’s assume we have a query that searches by both the transaction type and currency:
public interface TransactionRepository extends MongoRepository<Transaction, String> {

    List<Transaction> findByTransactionTypeAndCurrency(String type, String currency);

    // Other methods
}
To optimize this query, we could create the compound index with the @CompoundIndex annotation at the class level:

@Document(collection = "transactions")
@CompoundIndex(name = "type_currency_idx", def = "{'transactionType': 1, 'currency': 1}")
public class Transaction {
    // fields
}

Keep in mind that field order matters in a compound index: this index can also serve queries that filter on transactionType alone, but not queries that filter only on currency.
The @Indexed annotation includes a unique() attribute, which is set to false by default. This means that any field annotated with @Indexed will not enforce uniqueness unless we say otherwise. Imagine that in our Transaction class, we have a field called authCode, responsible for ensuring that each transaction is unique. We can annotate the class as follows:

public class Transaction {

    @Indexed(unique = true)
    private String authCode;
}
This annotation ensures that the transactions collection will not have any duplicate authCode values. If a duplicate code is provided, MongoDB will throw the following exception:

com.mongodb.MongoWriteException - E11000 duplicate key error collection
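When the write goes through a Spring repository, Spring's exception translation typically surfaces this as a DuplicateKeyException, which you can handle explicitly. A minimal sketch (the transactionRepository field is assumed from our existing service):

import org.springframework.dao.DuplicateKeyException;

// ...

try {
    transactionRepository.save(transaction);
} catch (DuplicateKeyException e) {
    // The authCode already exists; treat it as a business-level conflict
    logger.warn("Duplicate authCode rejected: {}", transaction.getAuthCode());
}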
Let's suppose that we have a function in our application that executes a query to find all transactions older than 40 days with a status of “COMPLETED” or “SUCCESS” and runs a delete command. In this case, we would have two steps:
- Filter all transactions in this state.
- Perform a delete operation on these items (a sketch of this manual approach follows below).
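In Spring, that manual two-step cleanup might look roughly like this (a sketch, assuming status and createdAt fields exist on Transaction):

import static org.springframework.data.mongodb.core.query.Criteria.where;
import static org.springframework.data.mongodb.core.query.Query.query;

// ...

public void purgeOldCompletedTransactions() {
    LocalDateTime cutoff = LocalDateTime.now().minusDays(40);

    // Match COMPLETED/SUCCESS transactions older than 40 days and delete them
    mongoOperations.remove(
            query(where("createdAt").lt(cutoff)
                    .and("status").in("COMPLETED", "SUCCESS")),
            Transaction.class);
}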
This can be expensive for your application. An alternative is to use a TTL index, which automatically expires documents after a certain amount of time. We can tell our application:
"Hey, I want you to delete all records that were created more than 40 days ago and are in the “SUCCESS” or “COMPLETED” status"
public class Transaction {

    // TTL index: expire documents 40 days after createdAt, but only those whose
    // status matches the partial filter ($in inside a partial filter expression
    // requires MongoDB 6.0+)
    @Indexed(
            expireAfter = "40d",
            partialFilter = "{ 'status': { '$in': ['COMPLETED', 'SUCCESS'] } }"
    )
    private LocalDateTime createdAt;

}
In this way, the application will expire (delete) the transactions that meet this filter. Keep in mind that the TTL monitor runs in the background on a periodic cycle (roughly every 60 seconds by default), so expired documents are not removed at the exact moment they qualify.
Using ReadPreference can improve latency and performance by directing reads to secondary replicas that are closer, reducing the load on the primary node and optimizing data distribution in geographically distributed systems. By default, all write operations occur on the primary node, while read operations can be directed to secondary nodes. There are several read preference modes:

- primary: All reads go to the primary node (the default).
- primaryPreferred: Reads go to the primary, falling back to a secondary if the primary is unavailable.
- secondary: All reads go to secondary nodes.
- secondaryPreferred: Reads go to secondaries, falling back to the primary if no secondary is available.
- nearest: Reads go to the member with the lowest network latency, regardless of its role.
Let's go back to our TransactionRepository class and mark the findByTransactionType method to perform the search using the secondaryPreferred mode:

public interface TransactionRepository extends MongoRepository<Transaction, String> {

    @ReadPreference("secondaryPreferred")
    List<Transaction> findByTransactionType(String type);
}

Keep in mind that reads from secondaries can return slightly stale data because replication is asynchronous, so use this mode only where eventual consistency is acceptable.
Another alternative is to apply the mode to the entire application through the connection settings. In MongoConfig, you can configure the read preference mode globally. Here's how you can do it:

@Configuration
public class MongoConfig {

    @Bean
    public MongoClient mongoClient() {
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(new ConnectionString(connectionString))
                .readPreference(ReadPreference.nearest())
                .build();
        return MongoClients.create(settings);
    }
}

Here, ReadPreference.nearest() routes reads to the member with the lowest network latency; you can swap in any of the modes listed above.
In this third and final part of the Spring Data Unlocked series, we learned how to create indexes directly through the application, adjust read preferences with ReadPreference, and optimize the performance of our queries. The complete code for the application is available in the mongodb-developer repository.