MongoDB Indexing Best Practices
Indexing in MongoDB is not a checklist item. It is an operating tradeoff.
Every useful index improves some reads and makes some writes more expensive. Every extra index consumes memory, disk, build time, and operational attention. In MongoDB 8.2.6, the mechanics are mature, but the judgment problem has not changed: good indexes come from real query patterns, not from indexing every field that looks important.
This article focuses on production indexing decisions, not synthetic benchmark theater.
Start with query patterns, not schema shape
The most common indexing mistake is designing from the document model instead of the workload.
A collection can contain fields that matter to the business but almost never matter to query planning. An index only earns its keep if it supports one or more of these:
- high-frequency lookups
- latency-sensitive endpoints
- expensive sorts
- selective filters
- uniqueness guarantees
- retention or lifecycle policies
Before creating an index, write down the actual query shape:
- filter fields
- sort fields
- projection fields
- result size
- frequency
- read/write ratio
- whether the query is user-facing or background
For example:
```javascript
db.orders.find(
  {
    tenantId: "t_42",
    status: "paid",
    createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
  },
  {
    _id: 0,
    orderNo: 1,
    total: 1,
    createdAt: 1
  }
).sort({ createdAt: -1 }).limit(50)
```
That tells you much more than “orders has `tenantId`, `status`, and `createdAt`.”
Read explain() before guessing
If you are not looking at explain(), you are mostly guessing.
For performance work, use:
```javascript
db.orders.find({
  tenantId: "t_42",
  status: "paid",
  createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
}).sort({ createdAt: -1 }).limit(50).explain("executionStats")
```
What matters first:
- `winningPlan`
- `totalKeysExamined`
- `totalDocsExamined`
- whether there is a blocking sort
- whether the scan is selective enough
- whether a `FETCH` stage is doing most of the work
A few practical interpretations:
`COLLSCAN`
Usually bad for latency-sensitive paths unless the collection is tiny or the query is rare.
`IXSCAN` with huge `totalDocsExamined`
Not automatically good. You may have an index, but still be reading too many documents.
Blocking sort
If MongoDB must sort a large intermediate result in memory, the index is probably not aligned with the query shape.
`totalDocsExamined: 0`
This often indicates a covered query, which is excellent when it is achieved without contorting the index strategy.
The goal is not “use an index.” The goal is “use the right index efficiently.”
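The fields above can also be pulled out of an explain result programmatically, which is handy when reviewing many queries. A minimal sketch in plain JavaScript, assuming an explain document shaped like real `executionStats` output; the sample numbers below are hypothetical, not real server output:

```javascript
// Summarize the executionStats fields that matter most from an
// explain("executionStats") result.
function summarizeExplain(explainDoc) {
  const stats = explainDoc.executionStats;
  const ratio = stats.nReturned > 0
    ? stats.totalDocsExamined / stats.nReturned
    : Infinity;
  return {
    nReturned: stats.nReturned,
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
    // A high docs-examined-to-returned ratio suggests a poorly aligned index.
    docsPerReturned: ratio
  };
}

// Hypothetical output for a query that scanned 5000 docs to return 50.
const sample = {
  executionStats: { nReturned: 50, totalKeysExamined: 5000, totalDocsExamined: 5000 }
};
console.log(summarizeExplain(sample).docsPerReturned); // 100
```

A ratio near 1 means the index is doing the filtering; a ratio of 100, as here, means the plan reads a hundred documents for every one it returns.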
ESR is a rule of thumb, not a law of physics
For compound indexes, the usual guidance is ESR:
- Equality
- Sort
- Range
That is a good starting point because equality predicates usually narrow the search space first, sort fields can avoid in-memory sort, and range fields often belong later.
Example query:
```javascript
db.orders.find({
  tenantId: "t_42",
  status: "paid",
  createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
}).sort({ createdAt: -1 })
```
A reasonable candidate index is:
```javascript
db.orders.createIndex({ tenantId: 1, status: 1, createdAt: -1 })
```
That lines up well because:
- `tenantId` and `status` are equality filters
- `createdAt` is both the range field and the sort field
- the sort direction matches query usage
But ESR is not something to apply mechanically.
Where ESR tradeoffs show up
- Low-selectivity equality fields
  - If a field has very few possible values, placing it early may not help much.
  - Example: `status` with values like `active`, `inactive`, `deleted`
- Multi-tenant systems
  - `tenantId` often belongs early even when another field is more selective, because it partitions workload cleanly and protects isolation
- Sort-heavy workloads
  - Sometimes preserving sort support is more important than squeezing one more equality field into the prefix
- Range predicates
  - Once range logic becomes central, fields after it may be less useful for index traversal and sort support
The point of ESR is to give you candidate orderings to test with explain(), not to replace testing.
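The mechanical part of ESR, at least, is easy to encode. A sketch in plain JavaScript that proposes a key order from a caller-supplied classification of the query's predicates; it is a starting point to test with explain(), not a planner:

```javascript
// Propose a compound-index key spec following ESR:
// equality fields first, then sort fields, then range fields.
function esrCandidate({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const f of equality) keys[f] = 1;
  for (const [f, dir] of Object.entries(sort)) {
    if (!(f in keys)) keys[f] = dir; // keep the query's sort direction
  }
  for (const f of range) {
    if (!(f in keys)) keys[f] = 1; // a field can be both sort and range
  }
  return keys;
}

// The orders query from earlier: equality on tenantId and status,
// createdAt as both the sort and the range field.
const candidate = esrCandidate({
  equality: ["tenantId", "status"],
  sort: { createdAt: -1 },
  range: ["createdAt"]
});
console.log(candidate); // { tenantId: 1, status: 1, createdAt: -1 }
```

Note what this cannot do: it has no idea whether `status` is selective or whether the tenant scoping matters more than raw selectivity. Those are exactly the tradeoffs listed above.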
Selectivity matters more than people think
Indexes help when they avoid work. If a predicate matches most of the collection, the index may add little value.
Example:
```javascript
db.users.find({ isActive: true })
```
If 97% of users are active, an index on `isActive` alone is rarely compelling. The index is technically usable, but not very selective.
A more useful pattern might be:
```javascript
db.users.createIndex({ tenantId: 1, isActive: 1, lastSeenAt: -1 })
```
if the real query is tenant-scoped and ordered by recent activity.
Questions worth asking:
- Does this filter eliminate enough documents to matter?
- Is this field almost always present with the same value?
- Is the query really driven by another field?
- Is this index only helping a query nobody cares about?
Low-selectivity indexes are a common source of write cost with little read payoff.
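The selectivity question can be made concrete with simple arithmetic. A sketch in plain JavaScript, with hypothetical counts standing in for `countDocuments()` results from the live collection:

```javascript
// Fraction of the collection a predicate matches: closer to 0 is better
// for an index, closer to 1 means the index barely narrows anything.
function selectivity(matchingDocs, totalDocs) {
  return matchingDocs / totalDocs;
}

// 97,000 of 100,000 users match { isActive: true }.
const s = selectivity(97000, 100000);
console.log(s); // 0.97
// A rough heuristic, not a hard rule: a predicate matching most of the
// collection rarely justifies a standalone index on that field.
console.log(s < 0.1 ? "likely worth indexing alone" : "needs a stronger leading field");
```

The exact threshold is a judgment call per workload; the point is to measure before indexing.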
Compound indexes beat piles of single-field indexes
A cluster of single-field indexes does not magically equal one good compound index.
If the application repeatedly runs:
```javascript
db.events.find({
  tenantId: "t_42",
  eventType: "login",
  createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
}).sort({ createdAt: -1 }).limit(100)
```
then this:
```javascript
db.events.createIndex({ tenantId: 1, eventType: 1, createdAt: -1 })
```
is usually more useful than separate indexes on:
- `tenantId`
- `eventType`
- `createdAt`
Why:
- it matches the actual query shape
- it can support sort directly
- it avoids extra planner complexity
- it reduces document fetch churn relative to weaker plans
Single-field indexes still have their place, but production workloads are usually dominated by repeated multi-field access patterns.
Covered queries are great, but don’t build cargo-cult covering indexes
A covered query avoids fetching the underlying documents because the index contains everything the query needs.
Example:
```javascript
db.orders.createIndex({ tenantId: 1, status: 1, createdAt: -1, orderNo: 1, total: 1 })
```
paired with:
```javascript
db.orders.find(
  { tenantId: "t_42", status: "paid" },
  { _id: 0, orderNo: 1, total: 1, createdAt: 1 }
).sort({ createdAt: -1 }).limit(20)
```
If `explain()` shows `totalDocsExamined: 0`, that is excellent.
But covered queries are not free. Adding extra fields to “cover” every response can make indexes:
- larger
- slower to update
- less cache-friendly
- redundant with nearby indexes
Use covering strategically for hot read paths, not as a blanket policy.
Sort and index interaction is where many indexes fail
A query can have a decent filter index and still perform badly because the sort is not supported.
Example:
```javascript
db.tickets.find({
  tenantId: "t_42",
  priority: "high"
}).sort({ updatedAt: -1 })
```
If the index is only:
```javascript
db.tickets.createIndex({ tenantId: 1, priority: 1 })
```
MongoDB may still need an in-memory sort.
A better index for that query shape is:
```javascript
db.tickets.createIndex({ tenantId: 1, priority: 1, updatedAt: -1 })
```
The practical lesson:
- if the endpoint depends on sorted output
- and the result set before limiting can be large
- the sort field usually needs to be part of the compound index
Otherwise you often end up “indexed, but still slow.”
Partial indexes are one of the highest-leverage tools
Partial indexes are underrated because they force you to be explicit about workload.
If only live records matter to the application path, index only live records.
Example:
```javascript
db.users.createIndex(
  { tenantId: 1, email: 1 },
  { partialFilterExpression: { deletedAt: { $exists: false } } }
)
```
This is often better than indexing all documents when soft-deleted documents accumulate forever.
Good partial index use cases:
- soft deletes
- active records only
- only documents with a required field
- only pending jobs
- only publicly visible content
- only unresolved incidents
Benefits:
- smaller index
- lower write amplification
- better cache behavior
- clearer alignment with actual reads
The catch: the query must include conditions compatible with the partial filter, or the planner may not use the index.
Unique indexes should encode invariants, not hope
If a value must be unique, enforce it with a unique index.
Application-level pre-checks are not enough under concurrency.
Examples:
```javascript
db.users.createIndex({ tenantId: 1, email: 1 }, { unique: true })
db.apiKeys.createIndex({ keyHash: 1 }, { unique: true })
```
In multi-tenant systems, this is where people often get the scope wrong.
If uniqueness is per tenant, the unique index must be per tenant.
If uniqueness is global, say so in the index.
Do not use a unique index casually on dirty legacy data without checking collisions first. Index builds are not the time to discover your production invariant was fictional.
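Checking for collisions first can be as simple as a group-and-count. In mongosh this would be an aggregation with `$group` and `$match`; the sketch below runs the same logic in plain JavaScript over a hypothetical sample so it is runnable standalone:

```javascript
// Find (tenantId, email) pairs that would violate a unique index
// on { tenantId: 1, email: 1 }.
function findDuplicates(docs) {
  const counts = new Map();
  for (const d of docs) {
    const key = `${d.tenantId}|${d.email}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n > 1)
    .map(([key]) => key);
}

// Hypothetical legacy data.
const users = [
  { tenantId: "t_1", email: "a@example.com" },
  { tenantId: "t_1", email: "a@example.com" }, // collision within a tenant
  { tenantId: "t_2", email: "a@example.com" }  // same email, different tenant: fine
];
console.log(findDuplicates(users)); // [ "t_1|a@example.com" ]
```

Run the equivalent aggregation on the real collection before the index build, and decide deliberately what to do with each collision.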
TTL indexes solve retention, not archival strategy
TTL indexes are useful for expiring data automatically:
```javascript
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })
```
Good fits:
- sessions
- ephemeral tokens
- temporary caches
- short-lived events
- one-time verification artifacts
What they are not:
- a substitute for archival policy
- a precise deletion scheduler
- a cleanup strategy for business-critical data
TTL expiry is background cleanup, not a contractual deadline. Use it where eventual expiration is acceptable.
Sparse versus partial: prefer partial unless you truly want sparse behavior
Sparse indexes are older and simpler: they omit documents that do not contain the indexed field.
That sounds useful, but in most production cases partial indexes are clearer because they express intent.
Sparse example:
```javascript
db.profiles.createIndex({ phoneNumber: 1 }, { sparse: true })
```
Partial alternative:
```javascript
db.profiles.createIndex(
  { phoneNumber: 1 },
  { partialFilterExpression: { phoneNumber: { $exists: true } } }
)
```
Why partial is often better:
- the condition is explicit
- you can express more than field existence
- it is easier to align with real workload rules
Sparse indexes still exist for a reason, but partial indexes are usually the more precise operational tool.
Text and wildcard indexes have legitimate uses, but both are easy to misuse
Text indexes
Text indexes are useful when you actually need MongoDB text search semantics. They are not a general substitute for deliberate search design.
Caveats:
- relevance behavior may not match product expectations
- language handling matters
- combining text search with other filters and sorts has practical limits
- search-heavy products often outgrow plain text indexes
Use text indexes for modest built-in search needs, not because “we should index all text fields.”
Wildcard indexes
Wildcard indexes can help in flexible-schema scenarios, especially where field paths vary.
But they are not a free pass for undisciplined indexing. They can become expensive and broad quickly.
Good use cases are narrower than people think:
- operational tooling over schemaless metadata
- selective support for dynamic keys
- transitional environments where access patterns are still settling
If your application has stable hot paths, normal targeted indexes are usually better.
Every index has write amplification cost
This is where teams get surprised in production.
Each additional index means more work on:
- insert
- update
- delete
- document moves in some cases
- index build and replication overhead
If a collection is write-heavy, every “maybe useful” index should be treated skeptically.
A practical question to ask before adding an index:
Which slow or important query earns the right to make every future write more expensive?
If you cannot answer that clearly, the index is probably premature.
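A crude way to see the tax is to count the structures each insert must touch. This sketch ignores batching, journaling, and cache effects, and the traffic numbers are assumptions, but it makes the scaling visible:

```javascript
// Every insert writes the collection itself plus one entry per index.
function writeTouchesPerSecond(insertsPerSec, indexCount) {
  return insertsPerSec * (1 + indexCount);
}

// The same assumed 2,000 inserts/sec workload with 3 indexes versus 9:
console.log(writeTouchesPerSecond(2000, 3)); // 8000
console.log(writeTouchesPerSecond(2000, 9)); // 20000
```

Tripling the index count here more than doubles the write work, before accounting for replication of index builds or cache pressure from larger working sets.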
Review index usage regularly
Indexes should be reviewed like code and infra: periodically, with evidence.
Useful tools include:
```javascript
db.collection.getIndexes()
```
and:
```javascript
db.collection.aggregate([{ $indexStats: {} }])
```
Things worth checking:
- indexes with very low or zero usage
- overlapping compound indexes
- old indexes left behind after feature changes
- indexes that were added for incidents but never removed
- indexes supporting queries that no longer matter
It is common to find a collection carrying years of accumulated indexing decisions from different teams and different workloads. That usually means unnecessary write cost.
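Low-usage candidates can be flagged mechanically from $indexStats output. A sketch in plain JavaScript, using documents that mimic the shape of real results (`name`, `accesses.ops`); real output carries more fields, such as `host` and `accesses.since`:

```javascript
// Return names of indexes at or below a usage threshold, excluding _id_.
function lowUsageIndexes(indexStats, threshold = 0) {
  return indexStats
    .filter((s) => s.name !== "_id_" && s.accesses.ops <= threshold)
    .map((s) => s.name);
}

// Hypothetical stats since the last server restart.
const stats = [
  { name: "_id_", accesses: { ops: 120000 } },
  { name: "tenantId_1_status_1_createdAt_-1", accesses: { ops: 98000 } },
  { name: "legacy_flag_1", accesses: { ops: 0 } }
];
console.log(lowUsageIndexes(stats)); // [ "legacy_flag_1" ]
```

One caveat: `$indexStats` counters reset on restart and are per-node, so check every replica set member and a long enough window before dropping anything.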
A practical index review workflow
In production, a sane workflow looks like this:
- identify slow or important queries
- capture real query shapes
- inspect `explain("executionStats")`
- design one or two candidate indexes
- validate against actual sort, projection, and selectivity needs
- measure read improvement
- account for write cost and storage impact
- remove or consolidate dead indexes later
The “later” part matters. Index cleanup is real performance work.
A short checklist
Before adding an index, ask:
- What query is this for?
- How often does that query run?
- Is it latency-sensitive?
- Is the filter selective enough?
- Does the index support sort, not just filter?
- Am I trying to help one hot path or every hypothetical path?
- What write cost am I adding?
- Does an existing compound index already cover this need?
- Could a partial index do the job more cheaply?
- Have I checked the plan with `explain()`?
Good MongoDB indexing in 8.2.6 is not about collecting index types.
It is about making specific queries cheaper without turning every write into a tax.