MongoDB Indexing Best Practices

In MongoDB database administration, indexing is one of the most decisive factors in query performance. A well-designed indexing strategy can reduce query response times from seconds to milliseconds, while missing or poorly designed indexes can turn the database into a system-wide performance bottleneck, especially as data volumes grow into the millions of documents and beyond.

1. Environment Setup: A Containerized MongoDB Development Environment

To conduct performance testing, a stable and isolated MongoDB environment is required. The following configuration uses Docker Compose to launch a MongoDB instance and a Mongo Express service for web-based administration.

First, create a .env file in the project’s root directory to store sensitive configurations:

# MongoDB Configuration
MONGO_ROOT_USERNAME=root
MONGO_ROOT_PASSWORD=your-strong-root-password
MONGO_PORT=27017

# Mongo Express Configuration
MONGO_EXPRESS_PORT=8081
MONGO_EXPRESS_BASICAUTH_ENABLED=true
MONGO_EXPRESS_USERNAME=mongoexpressuser
MONGO_EXPRESS_PASSWORD=your-express-password

Next, create the docker-compose.yml file:


services:
  mongo:
    image: mongo:8.0.13  # Using a recent stable version
    container_name: mongodb_indexing_practice
    restart: unless-stopped
    environment:
      MONGO_INITDB_ROOT_USERNAME: ${MONGO_ROOT_USERNAME:-root}
      MONGO_INITDB_ROOT_PASSWORD: ${MONGO_ROOT_PASSWORD:-example}
    ports:
      - "${MONGO_PORT:-27017}:27017"
    volumes:
      - mongodb_data:/data/db
    networks:
      - mongodb-network
    healthcheck:
      test: ["CMD", "mongosh", "--quiet", "--eval", "db.adminCommand('ping')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  mongo-express:
    image: mongo-express
    container_name: mongo-express
    restart: unless-stopped
    ports:
      - "${MONGO_EXPRESS_PORT:-8081}:8081"
    environment:
      ME_CONFIG_MONGODB_URL: mongodb://${MONGO_ROOT_USERNAME:-root}:${MONGO_ROOT_PASSWORD:-example}@mongo:27017/
      ME_CONFIG_BASICAUTH_ENABLED: ${MONGO_EXPRESS_BASICAUTH_ENABLED:-true}
      ME_CONFIG_BASICAUTH_USERNAME: ${MONGO_EXPRESS_USERNAME:-mongoexpressuser}
      ME_CONFIG_BASICAUTH_PASSWORD: ${MONGO_EXPRESS_PASSWORD:-mongoexpresspass}
    networks:
      - mongodb-network
    depends_on:
      mongo:
        condition: service_healthy

volumes:
  mongodb_data:
    driver: local

networks:
  mongodb-network:
    driver: bridge

Start the Services: Execute the following command in your terminal to start all services.

docker compose up -d

Connect to the MongoDB Shell (mongosh): Use the command below to connect to the running MongoDB container. Replace your-strong-root-password with the password you set in the .env file.

docker exec -it mongodb_indexing_practice mongosh -u root -p 'your-strong-root-password'

Once connected, you can proceed with data operations and performance testing.
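
As a quick sanity check, you can verify the session from inside mongosh (a minimal sketch; the exact version string depends on the image you pulled):

db.adminCommand({ ping: 1 }); // expect { ok: 1 }
db.version(); // e.g. "8.0.13"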


2. Simulating a Real-World Workload: Generating a Million-Document Dataset

To effectively evaluate index performance, a large-scale dataset is necessary. The following script simulates an e-commerce or content platform scenario by generating one million documents containing fields for status, category, creation time, and score.

Execute the following script in mongosh:

use benchdb;
db.events.drop();

const statuses = ["active", "inactive", "pending"];
const categories = ["A", "B", "C", "D", "E", "F", "G", "H"];
const total = 1_000_000;
const batchSize = 5000;
console.log(`Generating ${total} test documents...`);

for (let i = 0; i < total; i += batchSize) {
    let bulkOps = [];
    for (let j = 0; j < batchSize; j++) {
        const docIndex = i + j;
        if (docIndex >= total) break;

        bulkOps.push({
            insertOne: {
                document: {
                    status: statuses[docIndex % statuses.length],
                    category: categories[docIndex % categories.length],
                    createdAt: new Date(Date.now() - Math.floor(Math.random() * 365 * 86400000)), // Random time within the last year
                    score: NumberInt(Math.floor(Math.random() * 100000)),
                    name: `Product-${docIndex}`,
                    payload: { data: "..." } // Simulate other business data
                }
            }
        });
    }
    if (bulkOps.length > 0) {
        db.events.bulkWrite(bulkOps, { ordered: false });
    }
    if ((i + batchSize) % 100000 === 0) {
        console.log(`Inserted ${Math.min(i + batchSize, total)} documents...`);
    }
}

console.log(`Data generation complete! Total documents in collection: ${db.events.countDocuments()}`);

Note: When performing large-scale data imports, it’s crucial to load the data first and create indexes afterward. If indexes exist during the import process, each insert operation will trigger index updates, significantly degrading write performance.
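
If the target collection already has secondary indexes, a common complementary pattern, sketched below, is to drop them before the import and rebuild them once the load completes (the compound index shown is the one derived later in Section 3.2):

// Before the import: drop all secondary indexes (the mandatory _id index is kept)
db.events.dropIndexes();

// ... run the bulk insert script above ...

// After the import: rebuild the required indexes in a single pass over the data
db.events.createIndex({ status: 1, category: 1, score: -1, createdAt: 1 });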


3. From Collection Scan to Index Scan: Performance Analysis and Optimization

This section analyzes the performance of a typical multi-condition query with and without an index. The query requirement is: Find the top 10 documents with a status of ‘active’, a category of either ‘A’ or ‘B’, created within the last 90 days, and sorted by score in descending order.

3.1. Performance Baseline: The COLLSCAN Bottleneck

Without any indexes, execute the query and use explain("executionStats") to analyze its execution plan.

const ninetyDaysAgo = new Date(Date.now() - 90 * 86400000);

db.events.find({
    status: "active",
    category: { $in: ["A", "B"] },
    createdAt: { $gte: ninetyDaysAgo }
}).sort({ score: -1 }).limit(10).explain("executionStats");

Analysis of Key Execution Plan Metrics:

"executionStats": {
    "executionSuccess": true,
    "nReturned": 10,
    "executionTimeMillis": 850, // Execution time may vary based on hardware
    "totalKeysExamined": 0,
    "totalDocsExamined": 1000000, // Scanned all documents in the collection
    "stage": "COLLSCAN" // The execution stage is a collection scan
}
  • stage: "COLLSCAN": This is the root cause of poor performance. MongoDB must iterate through every document in the collection to check if it matches the query criteria.
  • totalDocsExamined: 1000000: This value, equal to the total number of documents in the collection, is direct evidence of a COLLSCAN.
  • executionTimeMillis: With a million-document dataset, the execution time is typically in the range of hundreds of milliseconds to several seconds, which is an unacceptable latency for high-concurrency applications.

3.2. The ESR Rule: The Core Principle of Compound Index Design

To optimize this query, a compound index is required. The order of fields in a compound index is critical and should follow the ESR (Equality, Sort, Range) Rule:

  1. E (Equality): The prefix of the index should consist of fields used for equality matches (e.g., status: "active") and multi-value equality matches (e.g., category: { $in: [...] }). Fields with higher cardinality (selectivity) should be placed first.
  2. S (Sort): Next, place the fields used for sorting (e.g., sort({ score: -1 })). The sort direction in the index must exactly match the sort direction in the query (both ascending or both descending).
  3. R (Range): Finally, place fields used for range queries (e.g., createdAt: { $gte: ... }).

Once the query optimizer uses a range predicate on an index field, it cannot use any subsequent index fields to satisfy a sort operation. Therefore, the sort field must precede the range field. According to the ESR rule, the optimal index for this query is { status: 1, category: 1, score: -1, createdAt: 1 }.

Create the Index:

db.events.createIndex({ status: 1, category: 1, score: -1, createdAt: 1 });

3.3. Performance Optimization: The Significant Boost from IXSCAN

After creating the index, re-run the same query with explain():

db.events.find({
    status: "active",
    category: { $in: ["A", "B"] },
    createdAt: { $gte: ninetyDaysAgo }
}).sort({ score: -1 }).limit(10).explain("executionStats");

Key Metrics of the Optimized Execution Plan:

"executionStats": {
    "executionSuccess": true,
    "nReturned": 10,
    "executionTimeMillis": 2, // Execution time drastically reduced
    "totalKeysExamined": 2500, // Only a small number of index keys were examined
    "totalDocsExamined": 2500, // Only a small number of documents were loaded
    "stage": "IXSCAN" // The execution stage is an index scan
}
  • stage: "IXSCAN": The query plan successfully switched to an index scan, indicating that MongoDB effectively utilized the newly created index.
  • totalKeysExamined and totalDocsExamined dropped dramatically. MongoDB used the index to quickly locate entries matching the status and category criteria and then leveraged the pre-sorted score field in the index to efficiently filter for the top 10 documents matching the createdAt condition.
  • executionTimeMillis: The execution time plummeted from 850ms to 2ms, a performance improvement of over 400x.

3.4. Peak Performance: Covered Queries

Although IXSCAN is highly efficient, MongoDB still needs to fetch the full document from disk using the pointer found in the index (a process involving a FETCH stage). If all fields required by a query (both in the filter and the projection) are present in the index, MongoDB can return the results directly from the index in memory, without reading the documents from disk. This is known as a covered query, the ideal state for index performance.

To achieve a covered query, create an index that includes all necessary fields and use a projection to specify that only those fields should be returned.

// Create a covering index that includes the 'name' field
db.events.createIndex({ status: 1, category: 1, score: -1, createdAt: 1, name: 1 });

// Execute a query that only requests fields within the index
db.events.find(
    {
        status: "active",
        category: { $in: ["A", "B"] },
        createdAt: { $gte: ninetyDaysAgo }
    },
    { _id: 0, status: 1, category: 1, score: 1, createdAt: 1, name: 1 } // Projection
).sort({ score: -1 }).limit(10).explain("executionStats");

When analyzing the explain output, the key indicators of a covered query are:

  1. executionStats.totalDocsExamined is 0.
  2. There is no FETCH stage in the execution plan.

Note: Interpreting Modern Query Plans

In modern versions of MongoDB, for queries involving the $in operator, the optimizer may choose a SORT_MERGE plan. This plan decomposes the $in predicate into one IXSCAN per listed value and then merges the individually sorted result streams into a single ordered output.

In this case, the execution plan might look like LIMIT -> PROJECTION_DEFAULT -> SORT_MERGE -> (multiple IXSCANs). This does not indicate a failure. On the contrary, it shows the optimizer found a more efficient execution path. As long as totalDocsExamined remains 0 and there is no FETCH stage, it is a highly efficient and successful covered query. This demonstrates that the performance benefits of covered queries persist even in complex query plans.
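
To confirm mechanically that no FETCH stage appears anywhere in the tree, whatever shape the plan takes, a small helper can walk the winning plan. This is a sketch: collectStages is our own helper rather than a built-in, and the winningPlan.queryPlan fallback accounts for servers that report plans through the slot-based execution engine:

// Recursively collect the stage names of a winning query plan
function collectStages(stage, names = []) {
    names.push(stage.stage);
    if (stage.inputStage) collectStages(stage.inputStage, names);
    if (stage.inputStages) stage.inputStages.forEach(s => collectStages(s, names));
    return names;
}

const result = db.events.find(
    {
        status: "active",
        category: { $in: ["A", "B"] },
        createdAt: { $gte: ninetyDaysAgo }
    },
    { _id: 0, status: 1, category: 1, score: 1, createdAt: 1, name: 1 }
).sort({ score: -1 }).limit(10).explain("executionStats");

// Some server versions nest the plan under winningPlan.queryPlan
const root = result.queryPlanner.winningPlan.queryPlan ?? result.queryPlanner.winningPlan;
collectStages(root).includes("FETCH"); // expect: false for a covered query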


4. Professional Diagnostic and Management Toolkit

In complex production environments, it is vital to proactively discover, diagnose, and manage indexes.

4.1. Database Profiler: The Performance Probe

The profiler is a built-in tool for capturing slow queries by logging database operations that exceed a specified threshold.

// Enable the profiler to log all operations that take longer than 30ms
db.setProfilingLevel(1, { slowms: 30 });

// ... Run the application to generate a query load ...

// Query the slow query log, sorted by timestamp descending
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty();

// Disable the profiler
db.setProfilingLevel(0);

// Check the current profiler status
db.getProfilingStatus();

By analyzing the documents in the system.profile collection, you can accurately identify performance bottlenecks and their query patterns in a production environment, providing a basis for index optimization.
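
For example, the profiler records a planSummary field that makes collection scans easy to isolate. The following sketch, which assumes the standard system.profile output fields, lists the five most recent operations that fell back to a COLLSCAN:

// Most recent profiled operations that performed a collection scan
db.system.profile.find(
    { planSummary: "COLLSCAN" },
    { op: 1, ns: 1, millis: 1, planSummary: 1, ts: 1 }
).sort({ ts: -1 }).limit(5).pretty();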

4.2. $indexStats: The Index Health Checker

This aggregation stage returns usage statistics for each index in a collection.

db.events.aggregate([{ $indexStats: {} }]).pretty();

Run this command periodically. If an index's accesses.ops counter consistently remains 0, the index is not being used by any queries and is a candidate for removal, which reduces write overhead and reclaims storage space. Note that the counters reset when the index is rebuilt or the mongod process restarts, so observe them over a representative period.
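
For example, the following aggregation sketch surfaces indexes whose accesses.ops counter is still zero; the field names follow the documented $indexStats output:

// List indexes that have served no operations since their counters were last reset
db.events.aggregate([
    { $indexStats: {} },
    { $match: { "accesses.ops": 0 } },
    { $project: { name: 1, "accesses.ops": 1, "accesses.since": 1 } }
]).pretty();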

4.3. Common Index Management Commands

Mastering core index management commands is fundamental to routine maintenance.

  • Create Index (createIndex) Creates a new index on a collection.

    // Syntax: db.collection.createIndex( <keys>, <options> )
    // Create a unique index on the name field
    db.events.createIndex({ name: 1 }, { unique: true });
    

    Note that the legacy background: true option is deprecated and ignored as of MongoDB 4.2. All index builds now use an optimized process that holds an exclusive lock only briefly at the beginning and end of the build, so building an index no longer blocks reads and writes on the collection for its duration.

  • View Indexes (getIndexes) Lists all indexes on a collection along with their specifications.

    db.events.getIndexes();
    

    This is the first step to review existing index configurations and get index names for drop or hide operations.

  • Drop Index (dropIndex) Removes an index that is no longer needed. You can specify the index to drop by its name or its key pattern.

    // Drop by name (recommended)
    db.events.dropIndex("status_1_category_1_score_-1_createdAt_1");
    
    // Drop by key pattern
    db.events.dropIndex({ status: 1, category: 1, score: -1, createdAt: 1 });
    
  • Hide and Unhide Index (hideIndex / unhideIndex) If you are uncertain about dropping an index, you can first hide it. A hidden index is invisible to the query optimizer, allowing you to safely evaluate the impact of its removal.

    // Hide an index
    db.events.hideIndex("status_1_category_1_score_-1_createdAt_1_name_1");
    
    // ... Perform performance testing during this period ...
    
    // Unhide the index
    db.events.unhideIndex("status_1_category_1_score_-1_createdAt_1_name_1");
    

4.4. A Systematic Optimization Workflow

  1. Capture: Use the Profiler to discover slow queries.
  2. Analyze: Use explain() to reproduce the issue and confirm the presence of COLLSCAN or inefficient IXSCAN stages.
  3. Design: Design or adjust indexes according to the ESR rule.
  4. Validate: Use explain() again to verify that the query plan has been optimized and confirm that metrics like executionTimeMillis and totalKeysExamined have significantly improved.
  5. Monitor: Continuously monitor index usage with $indexStats and periodically clean up unused indexes using dropIndex or hideIndex.

5. Overview of Other Index Types

Besides compound indexes, MongoDB offers several specialized index types to meet different requirements.

  • Text Indexes ("text"): Designed for full-text search, supporting language-specific stemming and stop words.
  • Geospatial Indexes ("2dsphere"): Used for querying geospatial data, supporting queries for inclusion, intersection, and proximity on points, lines, and polygons.
  • Partial Indexes: Only index the documents in a collection that meet a specified filter expression. For example, you could index only documents where status is "archived", significantly reducing the index size and maintenance overhead.
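
Hedged creation sketches for each of these types follow; the description and location fields are illustrative and are not part of the dataset generated earlier:

// Text index for full-text search (assumes a string field such as 'description')
db.events.createIndex({ description: "text" });

// 2dsphere index for geospatial queries (assumes a GeoJSON 'location' field)
db.events.createIndex({ location: "2dsphere" });

// Partial index: only documents whose status is "archived" are indexed
db.events.createIndex(
    { createdAt: 1 },
    { partialFilterExpression: { status: "archived" } }
);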

6. Indexing Best Practices Checklist

  1. Design Indexes for Queries: Indexes should be created to support the most frequent and performance-critical queries of your application, rather than blindly creating single-field indexes for every field in a collection.
  2. Strictly Adhere to the ESR Rule: For compound indexes, the Equality -> Sort -> Range field order is key to ensuring optimal performance. Place the most selective equality filter fields at the beginning of the index.
  3. Use explain() as a Core Tool: Before and after creating or modifying an index, always use explain() to verify that the query plan meets expectations and that the query can efficiently utilize the index.
  4. Prioritize Covered Queries: Whenever possible, design indexes to include all fields required by a query’s filter and projection to avoid disk I/O and achieve the best possible query performance.
  5. Understand the Cost of Indexes: Every index consumes storage space and memory and adds overhead to write operations (insert, update, delete). Only create necessary indexes and periodically clean up unused ones with $indexStats.
  6. Monitor and Iterate Continuously: Database performance is dynamic. Use tools like the Profiler to monitor performance continuously and adjust your indexing strategy as business requirements and query patterns evolve.