Indexing 101: Recognizing Over-Indexing and Redundant Index Patterns

Chapter 7

Estimated reading time: 8 minutes


What “Over-Indexing” Looks Like in Practice

Over-indexing happens when a table accumulates more indexes than it needs to support real query patterns. The problem is not “many indexes” by itself; it is “indexes that don’t add unique value.” Redundant or low-value indexes create extra work every time rows are inserted, updated, or deleted, and they increase the amount of data the database must keep consistent during maintenance.

Common symptoms

  • Increased write cost: inserts/updates/deletes take longer because more index structures must be updated.

  • Bloated storage: index pages consume disk and memory (buffer cache), pushing out more useful data.

  • Slower maintenance operations: vacuum/reindex/analyze, online rebuilds, replication, backups, and restores can take longer because there is more index data to process.

  • Unstable performance: the optimizer has more choices, and small data distribution changes can cause plan flips between similar indexes, leading to inconsistent latency.


A Structured Approach: Find Redundancy Patterns First

Start by grouping indexes by table, then by their key columns (and included columns, if your database supports them). The goal is to spot patterns where one index fully covers another’s access paths. The cases below are the most common sources of unnecessary indexes.
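As a concrete starting point, here is a sketch for PostgreSQL (the catalog names are PostgreSQL-specific, and expression indexes are skipped by the attribute join) that lists indexes on the same table sharing a leading key column:

-- PostgreSQL sketch: indexes on the same table that share a leading key column.
SELECT t.relname AS table_name,
       a.attname AS leading_column,
       array_agg(i.relname ORDER BY i.relname) AS indexes
FROM pg_index x
JOIN pg_class i ON i.oid = x.indexrelid
JOIN pg_class t ON t.oid = x.indrelid
JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = x.indkey[0]
WHERE t.relkind = 'r'
GROUP BY t.relname, a.attname
HAVING count(*) > 1
ORDER BY t.relname;

Groups returned by a query like this are exactly where the redundancy cases below tend to hide.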

Case (1): Duplicate Indexes

A duplicate index is two indexes that are effectively the same: same table, same key columns in the same order, same uniqueness, and (if applicable) the same predicate (partial index) and included columns. Duplicates often appear after migrations, ORM changes, or “just in case” additions.

How to recognize duplicates

  • Same key definition: (a, b) and (a, b).

  • Same type and options: both B-tree, both unique/non-unique, same collation/opclass (engine-specific).

  • Same scope: both full-table or both partial with the same predicate.

Practical example

-- Two indexes that do the same job (duplicate keys/order/options)
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);
CREATE INDEX idx_orders_customer_created_2 ON orders(customer_id, created_at);

Only one is needed. Keeping both doubles the write and maintenance overhead for no additional read benefit.

Step-by-step: what to do

  • Pick the index to keep (often the older one referenced by tooling/constraints, or the one with the preferred name).

  • Verify no special properties differ (unique vs non-unique, partial predicate, included columns).

  • Confirm usage metrics (if available) show both serve the same workload; duplicates typically show similar usage, or one shows near-zero usage.
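As a concrete sketch for PostgreSQL (statistics views are engine-specific; other engines expose analogous ones), using the example duplicates above:

-- Compare recorded usage of the two suspected duplicates (PostgreSQL).
SELECT indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_orders_customer_created',
                       'idx_orders_customer_created_2');

-- If the definitions truly match, drop the redundant copy.
DROP INDEX idx_orders_customer_created_2;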

Case (2): Indexes That Are Strict Prefixes of Others

A strict-prefix redundancy occurs when one index’s key columns are the leftmost portion of another index’s key columns. In many engines and query patterns, the longer index can serve queries that filter on the prefix alone, making the shorter index unnecessary.

Recognizing strict-prefix patterns

  • (a) is a prefix of (a, b).

  • (a, b) is a prefix of (a, b, c).

Practical example

CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);

If your workload commonly runs queries like WHERE customer_id = ?, the composite index (customer_id, created_at) can often support that access path. The single-column index may be redundant.
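One quick check is the plan for the prefix-only query. Plan shapes are engine- and statistics-dependent, but with the composite in place you would typically expect it to be chosen:

-- Can the composite serve the prefix-only filter? (customer_id = 42 is a placeholder)
EXPLAIN
SELECT * FROM orders
WHERE customer_id = 42;
-- Expected in many engines: an index scan on idx_orders_customer_created,
-- because customer_id is its leading key column.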

When the prefix index might still be justified

  • Size and cache behavior: the shorter index is smaller and may stay hot in memory, benefiting very high-frequency point lookups.

  • Different sort needs: if queries rely on ordering that the longer index does not provide efficiently (engine-specific details matter).

  • Partial vs full: a partial index on (a) is not replaced by a full index on (a, b).

Step-by-step: decide whether to drop the prefix index

  • List the queries that use the prefix columns only (e.g., WHERE customer_id = ?).

  • Test those queries with the longer index available and the shorter index disabled or removed in a staging environment (one way to disable without dropping is sketched after this list).

  • Compare latency and plan stability under representative load (not just a single run).
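For the disable step, MySQL 8.0 offers invisible indexes, which hide an index from the optimizer without dropping it; this is a MySQL-specific sketch, and engines without the feature require dropping and recreating the index in staging instead:

-- MySQL 8.0 sketch: hide the prefix index from the optimizer without dropping it.
ALTER TABLE orders ALTER INDEX idx_orders_customer INVISIBLE;

-- Re-run the prefix-only queries, compare plans and latency, then either:
ALTER TABLE orders ALTER INDEX idx_orders_customer VISIBLE;  -- roll back
-- or commit to removal:
-- DROP INDEX idx_orders_customer ON orders;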

Case (3): Overlapping Composite Indexes with Suboptimal Order

This case appears when you have multiple composite indexes that share columns but in different orders, or when one index exists primarily because the “right” order wasn’t chosen earlier. The result is a pile of similar indexes, each partially useful, none ideal.

Common overlapping patterns

  • (a, b) and (b, a) both exist “to be safe.”

  • (a, b, c) exists, plus (a, c) exists because queries on (a, c) weren’t served well.

  • Multiple variants: (customer_id, status), (customer_id, created_at), (customer_id, status, created_at), created over time without consolidating.

Practical example: consolidate instead of stacking

-- Existing indexes created over time
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
CREATE INDEX idx_orders_customer_created ON orders(customer_id, created_at);
CREATE INDEX idx_orders_status_customer ON orders(status, customer_id);

Suppose the real workload is dominated by: WHERE customer_id = ? AND status = ? ORDER BY created_at DESC LIMIT 50. A single well-chosen composite (plus included columns, if needed) can often replace multiple overlapping indexes. The key is to align one index with the dominant access path rather than keeping several “almost right” ones.
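Under that assumption, one possible consolidation looks like the sketch below (index names carried over from the example above; this is illustrative, not a drop-in migration):

-- One composite aligned with the dominant access path:
-- equality on customer_id and status, then ordering by created_at.
CREATE INDEX idx_orders_customer_status_created
    ON orders(customer_id, status, created_at);

-- After validating plans in staging, retire the overlapping variants one at a
-- time, re-checking any queries that filtered on status alone or that ordered
-- by created_at without a status filter.
DROP INDEX idx_orders_customer_status;
DROP INDEX idx_orders_customer_created;
DROP INDEX idx_orders_status_customer;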

Symptoms that you have this problem

  • Many indexes share the same first column (or same set of columns) but differ slightly.

  • Query plans alternate between similar indexes depending on parameter values or statistics updates.

  • Index usage metrics show several indexes with low-to-medium usage instead of one with high usage.

Step-by-step: rationalize overlapping composites

  • 1) Group by column set: cluster indexes that reference the same columns in any order.

  • 2) Map to real queries: for each group, list the top queries (by frequency and latency impact) that touch those columns.

  • 3) Pick a “primary” index per group: choose the order that best matches the most important query pattern(s).

  • 4) Validate that the primary index covers the others’ access paths: ensure it can serve the key filters and ordering needs that justified the other indexes.

  • 5) Remove the leftovers gradually: drop one candidate at a time and re-measure.

Case (4): Rarely Used Indexes

Some indexes are not redundant on paper, but they are rarely used in practice. They may have been created for a report that no longer runs, a feature flag that was removed, or an anticipated query pattern that never materialized.

How to recognize rarely used indexes

  • Low or zero usage counts in index statistics views (engine-specific; a sample detection query follows this list).

  • High write overhead relative to benefit: tables with heavy write volume are especially sensitive to unused indexes.

  • Only used by ad-hoc queries: occasional manual queries may not justify permanent cost.
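As a PostgreSQL sketch, the following lists indexes with no recorded scans that do not back a unique constraint or primary key; statistics counters can be reset, so confirm the observation window before acting (see the checklist below):

-- PostgreSQL sketch: never-scanned indexes that do not back a constraint.
SELECT s.schemaname,
       s.relname AS table_name,
       s.indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
       s.idx_scan
FROM pg_stat_user_indexes s
JOIN pg_index x ON x.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT x.indisunique   -- excludes primary key and unique-constraint indexes
ORDER BY pg_relation_size(s.indexrelid) DESC;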

Practical checklist before removing an “unused” index

  • Confirm the observation window is long enough (include month-end jobs, weekly reports, and seasonal traffic).

  • Check whether the index supports constraints (e.g., unique constraints) or critical integrity checks.

  • Identify whether the index is used indirectly (e.g., by foreign key enforcement patterns in some systems) or by a critical but infrequent job.

Evaluating Removal Candidates: A Simple Scoring Method

To decide what to remove safely, evaluate each index with three questions: how often it’s used, how critical the queries are, and whether another index already provides the same access paths. You can turn this into a lightweight score to prioritize candidates.

1) Query frequency (how often is it used?)

  • High: used continuously in OLTP traffic.

  • Medium: used daily/weekly.

  • Low: used rarely or never in the observed window.

2) Criticality (what breaks if it’s slower?)

  • Critical: user-facing endpoints, checkout/login, core workflows.

  • Important: internal tools, operational dashboards.

  • Non-critical: offline reports, one-off analytics, backfills that can run longer.

3) Coverage (is it already served by another index?)

Ask whether another index can satisfy the same filtering and ordering needs with comparable efficiency. Common “coverage” relationships include:

  • Duplicate: exact same keys/options.

  • Strict prefix: a longer index can serve the shorter index’s access path.

  • Consolidation candidate: one well-chosen composite can replace several overlapping ones.

Candidate classification

  • Easy drop: duplicate indexes; unused indexes that are clearly not tied to constraints and have a long observation window.

  • Needs testing: strict-prefix cases and overlapping composites where plan quality may change.

  • Keep (for now): indexes supporting critical queries with no clear replacement.

Safe Removal Workflow (Minimize Risk)

1) Inventory

  • Export a list of all indexes per table, including: key columns (and order), included columns, uniqueness, partial predicates, and size (a starting query is sketched after this list).

  • Collect usage metrics and last-used timestamps if your database provides them.

  • Tag indexes that back constraints (primary key, unique constraints) so they are not removed accidentally.
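In PostgreSQL, a minimal inventory sketch might look like this (catalog names and functions are PostgreSQL-specific):

-- PostgreSQL sketch: one row per index with its definition, size, and constraint flags.
SELECT n.nspname AS schema_name,
       t.relname AS table_name,
       i.relname AS index_name,
       pg_get_indexdef(x.indexrelid) AS definition,  -- keys, includes, predicate
       x.indisunique AS is_unique,
       x.indisprimary AS is_primary,
       pg_size_pretty(pg_relation_size(x.indexrelid)) AS size
FROM pg_index x
JOIN pg_class i ON i.oid = x.indexrelid
JOIN pg_class t ON t.oid = x.indrelid
JOIN pg_namespace n ON n.oid = t.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY t.relname, i.relname;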

2) Validate coverage with representative queries

  • For each removal candidate, identify the top queries it is supposed to help (from query logs/APM/slow query samples).

  • Run those queries in a staging environment with production-like data volume and distribution.

  • Compare execution plans and latency with and without the index (or with the index made invisible/disabled if supported).

  • Include write-heavy tests if the table is frequently modified, since removing indexes should improve write throughput.

3) Remove gradually

  • Drop one index at a time (or one small group of obvious duplicates), not a large batch.

  • Schedule changes during a low-risk window and ensure you can recreate the index quickly if needed.

  • Prefer online/concurrent drop options if your engine supports them to reduce locking impact.
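For example, PostgreSQL can drop an index without taking a long exclusive lock (a PostgreSQL-specific sketch; other engines have their own online options):

-- PostgreSQL: drop without blocking concurrent reads/writes on the table.
-- Note: cannot run inside a transaction block.
DROP INDEX CONCURRENTLY idx_orders_customer_created_2;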

4) Re-measure

  • Track read latency for the affected endpoints and the specific queries you validated.

  • Track write latency/throughput and maintenance job durations (vacuum/reindex/analyze, replication lag, backup time).

  • Watch for plan instability: increased variance can indicate the removed index was acting as a “stabilizer” for certain parameter ranges.

Now answer the exercise about the content:

Which situation best describes a duplicate index that is an “easy drop” candidate in an over-indexed table?


A duplicate index is effectively the same definition twice (same keys/order and matching options like uniqueness and predicates). Keeping both adds write and maintenance overhead without improving reads, so one is typically safe to drop.

Next chapter

Indexing 101: Common Performance Pitfalls That Defeat Indexes
