MongoDB Indexing Best Practices
Indexing in MongoDB is not a checklist item. It is an operating tradeoff.
Every useful index improves some reads and makes some writes more expensive. Every extra index consumes memory, disk, build time, and operational attention. In MongoDB 8.2.6, the mechanics are mature, but the judgment problem has not changed: good indexes come from real query patterns, not from indexing every field that looks important.
This article focuses on production indexing decisions, not synthetic benchmark theater.
Start with query patterns, not schema shape
The most common indexing mistake is designing from the document model instead of the workload.
A collection can contain fields that matter to the business but almost never matter to query planning. An index only earns its keep if it supports one or more of these:
- high-frequency lookups
- latency-sensitive endpoints
- expensive sorts
- selective filters
- uniqueness guarantees
- retention or lifecycle policies
Before creating an index, write down the actual query shape:
- filter fields
- sort fields
- projection fields
- result size
- frequency
- read/write ratio
- whether the query is user-facing or background
For example:
```javascript
db.orders.find(
  {
    tenantId: "t_42",
    status: "paid",
    createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
  },
  {
    _id: 0,
    orderNo: 1,
    total: 1,
    createdAt: 1
  }
).sort({ createdAt: -1 }).limit(50)
```
That tells you much more than “orders has `tenantId`, `status`, and `createdAt`.”
Read explain() before guessing
If you are not looking at explain(), you are mostly guessing.
For performance work, use:
```javascript
db.orders.find({
  tenantId: "t_42",
  status: "paid",
  createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
}).sort({ createdAt: -1 }).limit(50).explain("executionStats")
```
What matters first:
- `winningPlan`
- `totalKeysExamined`
- `totalDocsExamined`
- whether there is a blocking sort
- whether the scan is selective enough
- whether a `FETCH` stage is doing most of the work
A few practical interpretations:
`COLLSCAN`
Usually bad for latency-sensitive paths unless the collection is tiny or the query is rare.
`IXSCAN` with huge `totalDocsExamined`
Not automatically good. You may have an index, but still be reading too many documents.
Blocking sort
If MongoDB must sort a large intermediate result in memory, the index is probably not aligned with the query shape.
`totalDocsExamined: 0`
This often indicates a covered query, which is excellent when it is achieved without contorting the index strategy.
The goal is not “use an index.” The goal is “use the right index efficiently.”
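The fields above can also be pulled out of an explain result programmatically, which is handy when reviewing many queries. A minimal sketch in plain JavaScript, assuming an explain document shaped like real `executionStats` output; the sample numbers below are hypothetical, not real server output:

```javascript
// Summarize the executionStats fields that matter most from an
// explain("executionStats") result.
function summarizeExplain(explainDoc) {
  const stats = explainDoc.executionStats;
  const ratio = stats.nReturned > 0
    ? stats.totalDocsExamined / stats.nReturned
    : Infinity;
  return {
    nReturned: stats.nReturned,
    totalKeysExamined: stats.totalKeysExamined,
    totalDocsExamined: stats.totalDocsExamined,
    // A high docs-examined-to-returned ratio suggests a poorly aligned index.
    docsPerReturned: ratio
  };
}

// Hypothetical output for a query that scanned 5000 docs to return 50.
const sample = {
  executionStats: { nReturned: 50, totalKeysExamined: 5000, totalDocsExamined: 5000 }
};
console.log(summarizeExplain(sample).docsPerReturned); // 100
```

A ratio near 1 means the index is doing the filtering; a ratio of 100, as here, means the plan reads a hundred documents for every one it returns.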
ESR is a rule of thumb, not a law of physics
For compound indexes, the usual guidance is ESR:
- Equality
- Sort
- Range
That is a good starting point because equality predicates usually narrow the search space first, sort fields can avoid in-memory sort, and range fields often belong later.
Example query:
```javascript
db.orders.find({
  tenantId: "t_42",
  status: "paid",
  createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
}).sort({ createdAt: -1 })
```
A reasonable candidate index is:
```javascript
db.orders.createIndex({ tenantId: 1, status: 1, createdAt: -1 })
```
That lines up well because:
- `tenantId` and `status` are equality filters
- `createdAt` is both the range field and the sort field
- the sort direction matches query usage
But ESR is not something to apply mechanically.
Where ESR tradeoffs show up
- Low-selectivity equality fields
  - If a field has very few possible values, placing it early may not help much.
  - Example: `status` with values like `active`, `inactive`, `deleted`
- Multi-tenant systems
  - `tenantId` often belongs early even when another field is more selective, because it partitions workload cleanly and protects isolation
- Sort-heavy workloads
  - Sometimes preserving sort support is more important than squeezing one more equality field into the prefix
- Range predicates
  - Once range logic becomes central, fields after it may be less useful for index traversal and sort support
The point of ESR is to give you candidate orderings to test with explain(), not to replace testing.
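The mechanical part of ESR, at least, is easy to encode. A sketch in plain JavaScript that proposes a key order from a caller-supplied classification of the query's predicates; it is a starting point to test with explain(), not a planner:

```javascript
// Propose a compound-index key spec following ESR:
// equality fields first, then sort fields, then range fields.
function esrCandidate({ equality = [], sort = {}, range = [] }) {
  const keys = {};
  for (const f of equality) keys[f] = 1;
  for (const [f, dir] of Object.entries(sort)) {
    if (!(f in keys)) keys[f] = dir; // keep the query's sort direction
  }
  for (const f of range) {
    if (!(f in keys)) keys[f] = 1; // a field can be both sort and range
  }
  return keys;
}

// The orders query from earlier: equality on tenantId and status,
// createdAt as both the sort and the range field.
const candidate = esrCandidate({
  equality: ["tenantId", "status"],
  sort: { createdAt: -1 },
  range: ["createdAt"]
});
console.log(candidate); // { tenantId: 1, status: 1, createdAt: -1 }
```

Note what this cannot do: it has no idea whether `status` is selective or whether the tenant scoping matters more than raw selectivity. Those are exactly the tradeoffs listed above.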
Selectivity matters more than people think
Indexes help when they avoid work. If a predicate matches most of the collection, the index may add little value.
Example:
```javascript
db.users.find({ isActive: true })
```
If 97% of users are active, an index on `isActive` alone is rarely compelling. The index is technically usable, but not very selective.
A more useful pattern might be:
```javascript
db.users.createIndex({ tenantId: 1, isActive: 1, lastSeenAt: -1 })
```
if the real query is tenant-scoped and ordered by recent activity.
Questions worth asking:
- Does this filter eliminate enough documents to matter?
- Is this field almost always present with the same value?
- Is the query really driven by another field?
- Is this index only helping a query nobody cares about?
Low-selectivity indexes are a common source of write cost with little read payoff.
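The selectivity question can be made concrete with simple arithmetic. A sketch in plain JavaScript, with hypothetical counts standing in for `countDocuments()` results from the live collection:

```javascript
// Fraction of the collection a predicate matches: closer to 0 is better
// for an index, closer to 1 means the index barely narrows anything.
function selectivity(matchingDocs, totalDocs) {
  return matchingDocs / totalDocs;
}

// 97,000 of 100,000 users match { isActive: true }.
const s = selectivity(97000, 100000);
console.log(s); // 0.97
// A rough heuristic, not a hard rule: a predicate matching most of the
// collection rarely justifies a standalone index on that field.
console.log(s < 0.1 ? "likely worth indexing alone" : "needs a stronger leading field");
```

The exact threshold is a judgment call per workload; the point is to measure before indexing.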
Compound indexes beat piles of single-field indexes
A cluster of single-field indexes does not magically equal one good compound index.
If the application repeatedly runs:
```javascript
db.events.find({
  tenantId: "t_42",
  eventType: "login",
  createdAt: { $gte: ISODate("2025-09-01T00:00:00Z") }
}).sort({ createdAt: -1 }).limit(100)
```
then this:
```javascript
db.events.createIndex({ tenantId: 1, eventType: 1, createdAt: -1 })
```
is usually more useful than separate indexes on:
- `tenantId`
- `eventType`
- `createdAt`
Why:
- it matches the actual query shape
- it can support sort directly
- it avoids extra planner complexity
- it reduces document fetch churn relative to weaker plans
Single-field indexes still have their place, but production workloads are usually dominated by repeated multi-field access patterns.
Covered queries are great, but don’t build cargo-cult covering indexes
A covered query avoids fetching the underlying documents because the index contains everything the query needs.
Example:
```javascript
db.orders.createIndex({ tenantId: 1, status: 1, createdAt: -1, orderNo: 1, total: 1 })
```
paired with:
```javascript
db.orders.find(
  { tenantId: "t_42", status: "paid" },
  { _id: 0, orderNo: 1, total: 1, createdAt: 1 }
).sort({ createdAt: -1 }).limit(20)
```
If `explain()` shows `totalDocsExamined: 0`, that is excellent.
But covered queries are not free. Adding extra fields to “cover” every response can make indexes:
- larger
- slower to update
- less cache-friendly
- redundant with nearby indexes
Use covering strategically for hot read paths, not as a blanket policy.
Sort and index interaction is where many indexes fail
A query can have a decent filter index and still perform badly because the sort is not supported.
Example:
```javascript
db.tickets.find({
  tenantId: "t_42",
  priority: "high"
}).sort({ updatedAt: -1 })
```
If the index is only:
```javascript
db.tickets.createIndex({ tenantId: 1, priority: 1 })
```
MongoDB may still need an in-memory sort.
A better index for that query shape is:
```javascript
db.tickets.createIndex({ tenantId: 1, priority: 1, updatedAt: -1 })
```
The practical lesson:
- if the endpoint depends on sorted output
- and the result set before limiting can be large
- the sort field usually needs to be part of the compound index
Otherwise you often end up “indexed, but still slow.”
Partial indexes are one of the highest-leverage tools
Partial indexes are underrated because they force you to be explicit about workload.
If only live records matter to the application path, index only live records.
Example:
```javascript
db.users.createIndex(
  { tenantId: 1, email: 1 },
  { partialFilterExpression: { deletedAt: { $exists: false } } }
)
```
This is often better than indexing all documents when soft-deleted documents accumulate forever.
Good partial index use cases:
- soft deletes
- active records only
- only documents with a required field
- only pending jobs
- only publicly visible content
- only unresolved incidents
Benefits:
- smaller index
- lower write amplification
- better cache behavior
- clearer alignment with actual reads
The catch: the query must include conditions compatible with the partial filter, or the planner may not use the index.
Unique indexes should encode invariants, not hope
If a value must be unique, enforce it with a unique index.
Application-level pre-checks are not enough under concurrency.
Examples:
```javascript
db.users.createIndex({ tenantId: 1, email: 1 }, { unique: true })
db.apiKeys.createIndex({ keyHash: 1 }, { unique: true })
```
In multi-tenant systems, this is where people often get the scope wrong.
If uniqueness is per tenant, the unique index must be per tenant.
If uniqueness is global, say so in the index.
Do not use a unique index casually on dirty legacy data without checking collisions first. Index builds are not the time to discover your production invariant was fictional.
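Checking for collisions first can be as simple as a group-and-count. In mongosh this would be an aggregation with `$group` and `$match`; the sketch below runs the same logic in plain JavaScript over a hypothetical sample so it is runnable standalone:

```javascript
// Find (tenantId, email) pairs that would violate a unique index
// on { tenantId: 1, email: 1 }.
function findDuplicates(docs) {
  const counts = new Map();
  for (const d of docs) {
    const key = `${d.tenantId}|${d.email}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, n]) => n > 1)
    .map(([key]) => key);
}

// Hypothetical legacy data.
const users = [
  { tenantId: "t_1", email: "a@example.com" },
  { tenantId: "t_1", email: "a@example.com" }, // collision within a tenant
  { tenantId: "t_2", email: "a@example.com" }  // same email, different tenant: fine
];
console.log(findDuplicates(users)); // [ "t_1|a@example.com" ]
```

Run the equivalent aggregation on the real collection before the index build, and decide deliberately what to do with each collision.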
TTL indexes solve retention, not archival strategy
TTL indexes are useful for expiring data automatically:
```javascript
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 })
```
Good fits:
- sessions
- ephemeral tokens
- temporary caches
- short-lived events
- one-time verification artifacts
What they are not:
- a substitute for archival policy
- a precise deletion scheduler
- a cleanup strategy for business-critical data
TTL expiry is background cleanup, not a contractual deadline. Use it where eventual expiration is acceptable.
Sparse versus partial: prefer partial unless you truly want sparse behavior
Sparse indexes are older and simpler: they omit documents that do not contain the indexed field.
That sounds useful, but in most production cases partial indexes are clearer because they express intent.
Sparse example:
```javascript
db.profiles.createIndex({ phoneNumber: 1 }, { sparse: true })
```
Partial alternative:
```javascript
db.profiles.createIndex(
  { phoneNumber: 1 },
  { partialFilterExpression: { phoneNumber: { $exists: true } } }
)
```
Why partial is often better:
- the condition is explicit
- you can express more than field existence
- it is easier to align with real workload rules
Sparse indexes still exist for a reason, but partial indexes are usually the more precise operational tool.
Text and wildcard indexes have legitimate uses, but both are easy to misuse
Text indexes
Text indexes are useful when you actually need MongoDB text search semantics. They are not a general substitute for deliberate search design.
Caveats:
- relevance behavior may not match product expectations
- language handling matters
- combining text search with other filters and sorts has practical limits
- search-heavy products often outgrow plain text indexes
Use text indexes for modest built-in search needs, not because “we should index all text fields.”
Wildcard indexes
Wildcard indexes can help in flexible-schema scenarios, especially where field paths vary.
But they are not a free pass for undisciplined indexing. They can become expensive and broad quickly.
Good use cases are narrower than people think:
- operational tooling over schemaless metadata
- selective support for dynamic keys
- transitional environments where access patterns are still settling
If your application has stable hot paths, normal targeted indexes are usually better.
Every index has write amplification cost
This is where teams get surprised in production.
Each additional index means more work on:
- insert
- update
- delete
- document moves in some cases
- index build and replication overhead
If a collection is write-heavy, every “maybe useful” index should be treated skeptically.
A practical question to ask before adding an index:
Which slow or important query earns the right to make every future write more expensive?
If you cannot answer that clearly, the index is probably premature.
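A crude way to see the tax is to count the structures each insert must touch. This sketch ignores batching, journaling, and cache effects, and the traffic numbers are assumptions, but it makes the scaling visible:

```javascript
// Every insert writes the collection itself plus one entry per index.
function writeTouchesPerSecond(insertsPerSec, indexCount) {
  return insertsPerSec * (1 + indexCount);
}

// The same assumed 2,000 inserts/sec workload with 3 indexes versus 9:
console.log(writeTouchesPerSecond(2000, 3)); // 8000
console.log(writeTouchesPerSecond(2000, 9)); // 20000
```

Tripling the index count here more than doubles the write work, before accounting for replication of index builds or cache pressure from larger working sets.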
Review index usage regularly
Indexes should be reviewed like code and infra: periodically, with evidence.
Useful tools include:
```javascript
db.collection.getIndexes()
```
and:
```javascript
db.collection.aggregate([{ $indexStats: {} }])
```
Things worth checking:
- indexes with very low or zero usage
- overlapping compound indexes
- old indexes left behind after feature changes
- indexes that were added for incidents but never removed
- indexes supporting queries that no longer matter
It is common to find a collection carrying years of accumulated indexing decisions from different teams and different workloads. That usually means unnecessary write cost.
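Low-usage candidates can be flagged mechanically from $indexStats output. A sketch in plain JavaScript, using documents that mimic the shape of real results (`name`, `accesses.ops`); real output carries more fields, such as `host` and `accesses.since`:

```javascript
// Return names of indexes at or below a usage threshold, excluding _id_.
function lowUsageIndexes(indexStats, threshold = 0) {
  return indexStats
    .filter((s) => s.name !== "_id_" && s.accesses.ops <= threshold)
    .map((s) => s.name);
}

// Hypothetical stats since the last server restart.
const stats = [
  { name: "_id_", accesses: { ops: 120000 } },
  { name: "tenantId_1_status_1_createdAt_-1", accesses: { ops: 98000 } },
  { name: "legacy_flag_1", accesses: { ops: 0 } }
];
console.log(lowUsageIndexes(stats)); // [ "legacy_flag_1" ]
```

One caveat: `$indexStats` counters reset on restart and are per-node, so check every replica set member and a long enough window before dropping anything.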
A practical index review workflow
In production, a sane workflow looks like this:
- identify slow or important queries
- capture real query shapes
- inspect `explain("executionStats")`
- design one or two candidate indexes
- validate against actual sort, projection, and selectivity needs
- measure read improvement
- account for write cost and storage impact
- remove or consolidate dead indexes later
The “later” part matters. Index cleanup is real performance work.
A short checklist
Before adding an index, ask:
- What query is this for?
- How often does that query run?
- Is it latency-sensitive?
- Is the filter selective enough?
- Does the index support sort, not just filter?
- Am I trying to help one hot path or every hypothetical path?
- What write cost am I adding?
- Does an existing compound index already cover this need?
- Could a partial index do the job more cheaply?
- Have I checked the plan with `explain()`?
Good MongoDB indexing in 8.2.6 is not about collecting index types.
It is about making specific queries cheaper without turning every write into a tax.