Indexing 101: Covering Indexes, Included Columns, and Fewer Lookups

Capítulo 5

Estimated reading time: 7 minutes

+ Exercise

What it means to “cover” a query

An index covers a query when the database can produce the query’s result using only the index structure, without fetching additional columns from the base table (heap or clustered table). Practically, that means every column the query needs is available from the index: columns used to filter (WHERE), join (ON), group (GROUP BY), sort (ORDER BY), and return (the SELECT list).

Covering is about the query’s projection (the SELECT list) as much as its filtering. Two queries can have identical WHERE clauses, but the one that selects fewer columns may be coverable while the other forces extra lookups.

Key columns vs non-key (included) columns

Many databases distinguish between columns that define the index order (key columns) and columns stored in the index only to satisfy queries (often called included columns or non-key columns). Key columns affect ordering and seekability; included columns do not change the index’s sort order but can make the index cover more queries.

Conceptually, a covering index is “wide enough” to answer the query. You can achieve that by: (1) choosing key columns that match common filters and sort/group needs, and (2) adding a small set of included columns that are frequently returned.

Example: same filter, different projection

Assume a table:

Continue in our app.
  • Listen to the audio with the screen off.
  • Earn a certificate upon completion.
  • Over 5000 courses for you to explore!
Or continue reading below...
Download App

Download the app

Orders(OrderId, CustomerId, OrderDate, Status, TotalAmount, ShippingCity, Notes)

And an index optimized for looking up a customer’s recent orders:

-- Example syntax (varies by database engine), concept is the same: key + included columns

Query A selects a few columns:

SELECT OrderId, OrderDate, TotalAmountFROM OrdersWHERE CustomerId = 42  AND Status = 'PAID'ORDER BY OrderDate DESC;

If an index contains (as key columns) CustomerId, Status, OrderDate and also stores OrderId and TotalAmount, the engine can often do a seek into the index and return rows directly from the index pages. That is a “seek only” plan (index-only scan/seek).

Query B selects many columns:

SELECT OrderId, OrderDate, TotalAmount, ShippingCity, NotesFROM OrdersWHERE CustomerId = 42  AND Status = 'PAID'ORDER BY OrderDate DESC;

If ShippingCity and Notes are not in the index, the engine can still use the index to find matching rows, but it must fetch the missing columns from the base table for each qualifying row. That typically becomes “seek + lookup.”

Step-by-step: determine whether a query can be covered

  • List required columns. Collect columns from SELECT, WHERE, JOIN, GROUP BY, ORDER BY, and any expressions that reference columns.
  • Check if an existing index already contains them. Remember: key columns plus included columns count as “present.”
  • If not covered, decide what to add. Prefer adding missing columns as included columns if they don’t need to participate in ordering or range filtering.
  • Validate the plan. After adding/changing the index, confirm the execution changes from “seek + lookup” to “seek only” (or an index-only plan) for the target query.

The performance impact of avoiding extra row lookups

A “seek + lookup” plan is not automatically bad. It can be excellent when the seek returns only a handful of rows. The problem appears when the seek returns many rows and each one triggers an additional fetch from the base table. Those extra fetches add random I/O, extra buffer reads, and CPU overhead, and they can amplify contention under concurrency.

What a lookup costs

When the index does not contain all needed columns, the engine typically does this:

  • Seek the index to find qualifying row identifiers (primary key values, row IDs, or clustered keys).
  • Lookup each row in the base table to retrieve missing columns.

If 5 rows qualify, that’s 5 lookups—often fine. If 50,000 rows qualify, that’s 50,000 lookups—often painful.

Example: turning “seek + lookup” into “seek only” with included columns

Suppose you frequently run:

SELECT OrderDate, TotalAmountFROM OrdersWHERE CustomerId = 42  AND Status = 'PAID';

And you have an index that supports filtering but not projection:

-- Key columns support the WHERE clause, but TotalAmount is missing

The engine can seek on (CustomerId, Status) but must look up TotalAmount from the base table for each matching order.

If you add TotalAmount as a non-key/included column, the index can now return both OrderDate and TotalAmount directly. The plan can change from “seek + lookup” to “seek only,” reducing base-table reads dramatically for high-match queries.

Selecting a few columns vs selecting many columns

Covering is easiest when you select a narrow set of columns. Two practical patterns emerge:

  • Latency-sensitive endpoints should select only what they need. If an API endpoint only displays OrderDate and TotalAmount, don’t select ShippingCity and Notes “just in case.” Narrow projections are more likely to be covered and cheaper to move through memory and network.
  • Wide SELECT lists tend to force lookups. The more columns you request, the less likely they all exist in a single index, and the more tempting it becomes to create very wide indexes (which has its own costs).

Covering, sorting, and aggregation

Covering is not only about returning columns; it can also reduce work for ORDER BY and GROUP BY.

ORDER BY: If the index key order matches the ORDER BY (and the query’s filters allow it), the engine can often read rows in the needed order directly from the index, avoiding an explicit sort. If the query is also covered, it can do so without touching the base table.

SELECT OrderId, OrderDate, TotalAmountFROM OrdersWHERE CustomerId = 42ORDER BY OrderDate DESC;

An index keyed on (CustomerId, OrderDate DESC) with included (TotalAmount) can both (1) seek to CustomerId and (2) emit rows already ordered by OrderDate, and (3) return TotalAmount without lookups.

GROUP BY / aggregates: Aggregations can benefit when the index provides the grouping columns and the aggregated columns without lookups. Even when the engine must scan a range, scanning an index can be cheaper than scanning the base table if the index is narrower and contains exactly what’s needed.

SELECT Status, COUNT(*) AS CntFROM OrdersWHERE CustomerId = 42GROUP BY Status;

If the index contains CustomerId and Status, the engine can often compute counts from the index entries alone. If the query also needs SUM(TotalAmount), then TotalAmount must be available in the index (as key or included) to avoid lookups while summing.

Tradeoffs: index size, write overhead, and keeping coverage targeted

Covering indexes improve read performance by reducing lookups, but they are not free. Adding columns to an index increases its footprint and the cost of maintaining it during writes.

Index size and cache pressure

  • Wider indexes consume more storage. Included columns are stored in the index leaf level (and sometimes beyond), increasing page count.
  • More pages means more I/O. Even if you avoid lookups, scanning or traversing a larger index can require more reads.
  • Cache efficiency can drop. A narrow index can keep “hot” working sets in memory; a wide index may push useful pages out of cache.

Write amplification and maintenance

  • INSERT/UPDATE/DELETE become more expensive. Every additional index (and every additional indexed column) increases work on writes.
  • Updates to included columns still matter. If an included column changes, the index leaf entry must be updated too, which can add contention and page splits depending on the engine and workload.
  • More indexes complicate tuning. The optimizer has more choices, and you have more objects to monitor and maintain.

Practical decision rubric for covering

  • Cover only frequently executed, latency-sensitive queries. Prioritize queries on critical paths (user-facing pages, APIs, batch jobs with tight SLAs) where lookups are a proven bottleneck.
  • Keep included columns minimal. Include only columns needed by the query’s SELECT list (and possibly to avoid lookups for aggregates). Avoid “include everything” indexes.
  • Prefer stable, low-churn columns. Including columns that change often can increase write cost.
  • Reassess as query shapes evolve. When SELECT lists, filters, or ORDER BY/GROUP BY clauses change, an index that used to cover may stop covering (or may become unnecessarily wide). Periodically review top queries and their actual projections.

Now answer the exercise about the content:

Which change is most likely to turn a “seek + lookup” plan into a “seek only” plan for a frequently run query?

You are right! Congratulations, now go to the next page

You missed! Try again.

A query is covered when all needed columns are available in the index. Adding missing projected columns as included columns can eliminate base-table lookups, changing a plan from seek + lookup to seek only.

Next chapter

Indexing 101: How Indexes Affect Writes, Storage, and Concurrency

Arrow Right Icon
Free Ebook cover Indexing 101: How Databases Find Data Fast
56%

Indexing 101: How Databases Find Data Fast

New course

9 pages

Download the app to earn free Certification and listen to the courses in the background, even with the screen off.