MongoDB Indexing Best Practices
In MongoDB administration, indexing is a critical determinant of query performance. A proper indexing strategy can reduce query response times from seconds to milliseconds, while missing or poorly designed indexes can turn the database into a system-wide performance bottleneck, especially as data volumes grow into the millions of documents and beyond.
1. Environment Setup: A Containerized MongoDB Development Environment
To conduct performance testing, a stable and isolated MongoDB environment is required. The following configuration uses Docker Compose to launch a MongoDB instance and a Mongo Express service for web-based administration.
First, create a `.env` file in the project's root directory to store sensitive configuration values:
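A minimal sketch of such a file, using the environment variables read by the official `mongo` and `mongo-express` images (the credential values are placeholders to replace):

```bash
# .env — placeholder credentials; replace with strong values of your own
MONGO_INITDB_ROOT_USERNAME=root
MONGO_INITDB_ROOT_PASSWORD=your-strong-root-password
ME_CONFIG_BASICAUTH_USERNAME=admin
ME_CONFIG_BASICAUTH_PASSWORD=your-strong-admin-password
```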
Next, create the `docker-compose.yml` file:
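A minimal sketch of such a file; the image tags, service names, and port mappings here are assumptions you can adjust:

```yaml
# docker-compose.yml — a minimal sketch; adjust images and ports as needed
services:
  mongo:
    image: mongo:7
    restart: always
    ports:
      - "27017:27017"
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_INITDB_ROOT_USERNAME}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_INITDB_ROOT_PASSWORD}
    volumes:
      - mongo-data:/data/db

  mongo-express:
    image: mongo-express
    restart: always
    ports:
      - "8081:8081"
    environment:
      ME_CONFIG_MONGODB_URL: mongodb://${MONGO_INITDB_ROOT_USERNAME}:${MONGO_INITDB_ROOT_PASSWORD}@mongo:27017/
      ME_CONFIG_BASICAUTH_USERNAME: ${ME_CONFIG_BASICAUTH_USERNAME}
      ME_CONFIG_BASICAUTH_PASSWORD: ${ME_CONFIG_BASICAUTH_PASSWORD}
    depends_on:
      - mongo

volumes:
  mongo-data:
```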
Start the Services: Execute the following command in your terminal.
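With the Compose v2 CLI, that is:

```bash
docker compose up -d
```

The `-d` flag runs the containers in the background; you can check their status with `docker compose ps`.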
Connect to the MongoDB Shell (`mongosh`): Use the command below to connect to the running MongoDB container, replacing `your-strong-root-password` with the password you set in the `.env` file.
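Assuming the `mongo` service name from the compose file sketch above, something like:

```bash
# Exec into the running "mongo" service and authenticate as root
docker compose exec mongo mongosh -u root -p 'your-strong-root-password' --authenticationDatabase admin
```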
Once connected, you can proceed with data operations and performance testing.
2. Simulating a Real-World Workload: Generating a Million-Document Dataset
To effectively evaluate index performance, a large-scale dataset is necessary. The following script simulates an e-commerce or content platform scenario by generating one million documents containing fields for status, category, creation time, and score.
Execute the following script in `mongosh`:
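A sketch of such a generator, run directly in `mongosh`. The `events` collection name is taken from the management examples in section 4.3; the exact value distributions, the batch size, and the extra `name` field (used later by the unique-index example) are illustrative assumptions:

```javascript
// Illustrative generator: 1,000,000 synthetic documents inserted in batches of 10,000
const statuses = ["active", "inactive", "archived"];
const categories = ["A", "B", "C", "D"];
const now = Date.now();
const oneYearMs = 365 * 24 * 60 * 60 * 1000;

let batch = [];
for (let i = 1; i <= 1000000; i++) {
  batch.push({
    name: "event_" + i,                                                  // unique per document
    status: statuses[Math.floor(Math.random() * statuses.length)],
    category: categories[Math.floor(Math.random() * categories.length)],
    createdAt: new Date(now - Math.floor(Math.random() * oneYearMs)),    // within the last year
    score: Math.round(Math.random() * 100)
  });
  if (batch.length === 10000) {
    db.events.insertMany(batch, { ordered: false });  // unordered: server continues past individual failures
    batch = [];
  }
}
print("Documents inserted:", db.events.countDocuments());
```

Batching through `insertMany` avoids one network round trip per document, which matters at this scale.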
Note: When performing large-scale data imports, it’s crucial to load the data first and create indexes afterward. If indexes exist during the import process, each insert operation will trigger index updates, significantly degrading write performance.
3. From Collection Scan to Index Scan: Performance Analysis and Optimization
This section analyzes the performance of a typical multi-condition query with and without an index. The query requirement is: Find the top 10 documents with a status of ‘active’, a category of either ‘A’ or ‘B’, created within the last 90 days, and sorted by score in descending order.
3.1. Performance Baseline: The `COLLSCAN` Bottleneck
Without any indexes, execute the query and use `explain("executionStats")` to analyze its execution plan.
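The query described above would look like this, with the `events` collection and field names following the data-generation script:

```javascript
// Top 10 active documents in category A or B from the last 90 days, by score
db.events.find({
  status: "active",
  category: { $in: ["A", "B"] },
  createdAt: { $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) }  // last 90 days
}).sort({ score: -1 }).limit(10).explain("executionStats");
```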
Analysis of Key Execution Plan Metrics:
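An abridged `executionStats` section from such a run might look like the following (the document count matches the dataset, and the 850 ms figure is the one quoted in section 3.3; the plan tree is simplified):

```javascript
{
  executionStats: {
    nReturned: 10,
    executionTimeMillis: 850,       // hundreds of ms to seconds on this dataset
    totalKeysExamined: 0,           // no index was used
    totalDocsExamined: 1000000,     // every document in the collection was scanned
    executionStages: {
      stage: "SORT",                // in-memory sort of all matching documents
      inputStage: { stage: "COLLSCAN", /* ... */ }
    }
  }
}
```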
- `stage: "COLLSCAN"`: This is the root cause of the poor performance. MongoDB must iterate through every document in the collection to check whether it matches the query criteria.
- `totalDocsExamined: 1000000`: This value, equal to the total number of documents in the collection, is direct evidence of a `COLLSCAN`.
- `executionTimeMillis`: With a million-document dataset, the execution time is typically in the range of hundreds of milliseconds to several seconds, an unacceptable latency for high-concurrency applications.
3.2. The ESR Rule: The Core Principle of Compound Index Design
To optimize this query, a compound index is required. The order of fields in a compound index is critical and should follow the ESR (Equality, Sort, Range) Rule:
- E (Equality): The prefix of the index should consist of fields used for equality matches (e.g., `status: "active"`) and multi-value equality matches (e.g., `category: { $in: [...] }`). Fields with higher cardinality (selectivity) should be placed first.
- S (Sort): Next, place the fields used for sorting (e.g., `sort({ score: -1 })`). The sort direction in the index must match the sort direction in the query, or be its exact reverse, since an index can be traversed in either direction.
- R (Range): Finally, place fields used for range queries (e.g., `createdAt: { $gte: ... }`).
Once the query optimizer uses a range predicate on an index field, it cannot use any subsequent index fields to satisfy a sort operation. Therefore, the sort field must precede the range field. According to the ESR rule, the optimal index for this query is `{ status: 1, category: 1, score: -1, createdAt: 1 }`.
Create the Index:
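Following the ESR analysis above:

```javascript
// Equality (status, category) -> Sort (score, descending) -> Range (createdAt)
db.events.createIndex({ status: 1, category: 1, score: -1, createdAt: 1 });
```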
3.3. Performance Optimization: The Significant Boost from `IXSCAN`
After creating the index, re-run the same query with `explain()`:
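For example, the identical query from section 3.1, analyzed again:

```javascript
db.events.find({
  status: "active",
  category: { $in: ["A", "B"] },
  createdAt: { $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) }
}).sort({ score: -1 }).limit(10).explain("executionStats");
```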
Key Metrics of the Optimized Execution Plan:
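An abridged view of what those stats might look like (the 2 ms figure matches the one quoted below; the other values and the simplified plan tree are illustrative, and on modern versions you may see `SORT_MERGE` instead, as discussed in section 3.4):

```javascript
{
  executionStats: {
    nReturned: 10,
    executionTimeMillis: 2,     // down from roughly 850 ms
    totalKeysExamined: 25,      // only keys within the matching index ranges
    totalDocsExamined: 10,      // only the 10 returned documents were fetched
    executionStages: {
      stage: "LIMIT",
      inputStage: { stage: "FETCH", inputStage: { stage: "IXSCAN", /* ... */ } }
    }
  }
}
```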
- `stage: "IXSCAN"`: The query plan successfully switched to an index scan, indicating that MongoDB effectively utilized the newly created index.
- `totalKeysExamined` and `totalDocsExamined` dropped dramatically: MongoDB used the index to quickly locate entries matching the `status` and `category` criteria, then leveraged the pre-sorted `score` field in the index to efficiently find the top 10 documents matching the `createdAt` condition.
- `executionTimeMillis`: The execution time plummeted from 850 ms to 2 ms, a performance improvement of over 400x.
3.4. Peak Performance: Covered Queries
Although an `IXSCAN` is highly efficient, MongoDB still needs to fetch the full document from disk using the pointer found in the index (a process involving a `FETCH` stage). If all fields required by a query (both in the filter and the projection) are present in the index, MongoDB can return the results directly from the index in memory, without reading the documents from disk. This is known as a covered query, the ideal state for index performance.
To achieve a covered query, create an index that includes all necessary fields and use a projection that returns only those fields (the `_id` field must be explicitly excluded unless it is itself part of the index).
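One way to reconstruct this, assuming the key pattern implied by the index name referenced in section 4.3 (`status_1_category_1_score_-1_createdAt_1_name_1`); the projected fields are an assumption:

```javascript
// Extend the index so it also contains the projected "name" field
db.events.createIndex({ status: 1, category: 1, score: -1, createdAt: 1, name: 1 });

db.events.find(
  {
    status: "active",
    category: { $in: ["A", "B"] },
    createdAt: { $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) }
  },
  { _id: 0, name: 1, score: 1 }  // only indexed fields; _id must be excluded
).sort({ score: -1 }).limit(10).explain("executionStats");
```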
When analyzing the `explain()` output, the key indicators of a covered query are:
- `executionStats.totalDocsExamined` is `0`.
- There is no `FETCH` stage in the execution plan.
Note: Interpreting Modern Query Plans
In modern versions of MongoDB, for queries involving the `$in` operator, the optimizer may choose a more advanced `SORT_MERGE` plan. This plan decomposes the `$in` query into multiple `IXSCAN`s, one per `$in` value, and then efficiently merges their sorted results.
In this case, the execution plan might look like `LIMIT` -> `PROJECTION_DEFAULT` -> `SORT_MERGE` -> (multiple `IXSCAN`s). This does not indicate a failure; on the contrary, it shows the optimizer found a more efficient execution path. As long as `totalDocsExamined` remains `0` and there is no `FETCH` stage, it is still a highly efficient and successful covered query. This demonstrates that the performance benefits of covered queries persist even in complex query plans.
4. Professional Diagnostic and Management Toolkit
In complex production environments, it is vital to proactively discover, diagnose, and manage indexes.
4.1. Database Profiler: The Performance Probe
The profiler is a built-in tool for capturing slow queries by logging database operations that exceed a specified threshold.
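A typical session, with a 100 ms threshold chosen for illustration:

```javascript
// Level 1 = profile only operations slower than the slowms threshold
db.setProfilingLevel(1, { slowms: 100 });

// Inspect the most recent slow operations captured in the current database
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty();

// Check the current profiler configuration
db.getProfilingStatus();
```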
By analyzing the documents in the `system.profile` collection, you can accurately identify performance bottlenecks and their query patterns in a production environment, providing a factual basis for index optimization.
4.2. `$indexStats`: The Index Health Checker
This aggregation stage returns usage statistics for each index in a collection.
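For example, projecting just the fields relevant to usage auditing:

```javascript
// Per-index usage counters since the last server restart
db.events.aggregate([
  { $indexStats: {} },
  { $project: { name: 1, "accesses.ops": 1, "accesses.since": 1 } }
]);
```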
Run this command periodically. If an index's `accesses.ops` counter consistently remains `0` (note that the counter resets when the server restarts), the index is not being used by any queries and can be considered for removal to reduce write overhead and storage space.
4.3. Common Index Management Commands
Mastering core index management commands is fundamental to routine maintenance.
- Create Index (`createIndex`): Creates a new index on a collection.

  ```javascript
  // Syntax: db.collection.createIndex( <keys>, <options> )
  // Create a unique index in the background
  db.events.createIndex({ name: 1 }, { unique: true, background: true });
  ```

  The `background: true` option tells MongoDB versions before 4.2 to build the index in the background rather than blocking other database operations. Since MongoDB 4.2 the option is deprecated and ignored, because all index builds use an optimized process that locks the collection only briefly at the start and end of the build.

- View Indexes (`getIndexes`): Lists all indexes on a collection along with their specifications.

  ```javascript
  db.events.getIndexes();
  ```

  This is the first step to review existing index configurations and to obtain index names for drop or hide operations.

- Drop Index (`dropIndex`): Removes an index that is no longer needed. You can specify the index to drop by its name or its key pattern.

  ```javascript
  // Drop by name (recommended)
  db.events.dropIndex("status_1_category_1_score_-1_createdAt_1");

  // Drop by key pattern
  db.events.dropIndex({ status: 1, category: 1, score: -1, createdAt: 1 });
  ```

- Hide and Unhide Index (`hideIndex` / `unhideIndex`): If you are uncertain about dropping an index, you can first hide it. A hidden index is invisible to the query optimizer but is still maintained on writes, allowing you to safely evaluate the impact of its removal and restore it instantly if needed.

  ```javascript
  // Hide an index
  db.events.hideIndex("status_1_category_1_score_-1_createdAt_1_name_1");

  // ... perform performance testing during this period ...

  // Unhide the index
  db.events.unhideIndex("status_1_category_1_score_-1_createdAt_1_name_1");
  ```
4.4. A Systematic Optimization Workflow
- Capture: Use the Profiler to discover slow queries.
- Analyze: Use `explain()` to reproduce the issue and confirm the presence of `COLLSCAN` or inefficient `IXSCAN` stages.
- Design: Design or adjust indexes according to the ESR rule.
- Validate: Use `explain()` again to verify that the query plan has been optimized and confirm that metrics like `executionTimeMillis` and `totalKeysExamined` have improved significantly.
- Monitor: Continuously monitor index usage with `$indexStats` and periodically clean up unused indexes using `dropIndex` or `hideIndex`.
5. Overview of Other Index Types
Besides compound indexes, MongoDB offers several specialized index types to meet different requirements.
- Text Indexes (`"text"`): Designed for full-text search, supporting language-specific stemming and stop words.
- Geospatial Indexes (`"2dsphere"`): Used for querying geospatial data, supporting inclusion, intersection, and proximity queries on points, lines, and polygons.
- Partial Indexes: Index only the documents in a collection that match a specified filter expression. For example, you could index only documents where `status` is `"archived"`, significantly reducing index size and maintenance overhead (see the sketch after this list).
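Minimal sketches of each type; the `articles` and `places` collections and their field names are illustrative, while `events` is the collection from the earlier sections:

```javascript
// Text index for full-text search over two string fields
db.articles.createIndex({ title: "text", body: "text" });

// 2dsphere index for geospatial queries on GeoJSON data
db.places.createIndex({ location: "2dsphere" });

// Partial index: only documents with status "archived" are indexed
db.events.createIndex(
  { createdAt: 1 },
  { partialFilterExpression: { status: "archived" } }
);
```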
6. Indexing Best Practices Checklist
- Design Indexes for Queries: Indexes should be created to support the most frequent and performance-critical queries of your application, rather than blindly creating single-field indexes for every field in a collection.
- Strictly Adhere to the ESR Rule: For compound indexes, the Equality -> Sort -> Range field order is key to ensuring optimal performance. Place the most selective equality filter fields at the beginning of the index.
- Use `explain()` as a Core Tool: Before and after creating or modifying an index, always use `explain()` to verify that the query plan meets expectations and that the query can efficiently utilize the index.
- Prioritize Covered Queries: Whenever possible, design indexes to include all fields required by a query's filter and projection to avoid disk I/O and achieve the best possible query performance.
- Understand the Cost of Indexes: Every index consumes storage space and memory and adds overhead to write operations (`insert`, `update`, `delete`). Only create necessary indexes and periodically clean up unused ones identified with `$indexStats`.
- Monitor and Iterate Continuously: Database performance is dynamic. Use tools like the Profiler to monitor performance continuously and adjust your indexing strategy as business requirements and query patterns evolve.