Indexing 101: Composite Indexes and Column Order for Real Queries

Capítulo 4

Estimated reading time: 10 minutes

+ Exercise

Composite (Multi-Column) Indexes: One Structure, Many Query Shapes

A composite index stores multiple columns together in a single ordered structure. Conceptually, the index is sorted by the first column, and within each value of the first column it is sorted by the second, and so on. This means the same set of columns can behave very differently depending on their order in the index definition.

We will use this notation: an index on (a, b, c) means the index entries are ordered by a first, then b, then c.

(1) The Leftmost-Prefix Principle

The leftmost-prefix principle says: a composite index can be used efficiently only when your query predicates match a leftmost prefix of the index columns. “Leftmost prefix” means you start from the first column of the index and move right without skipping.

What “usable” means in practice

  • If you filter on the first column (a), the index can narrow to the matching region.
  • If you filter on the first two columns (a and b), the index can narrow further.
  • If you filter on (b) without (a), the index is usually not helpful for a targeted seek, because the index is not primarily organized by b.

Example: index (a, b)

Assume an index:

CREATE INDEX idx_ab ON t(a, b);

These predicates can typically use idx_ab for an index seek (or at least a tight range scan):

Continue in our app.
  • Listen to the audio with the screen off.
  • Earn a certificate upon completion.
  • Over 5000 courses for you to explore!
Or continue reading below...
Download App

Download the app

  • WHERE a = ?
  • WHERE a = ? AND b = ?
  • WHERE a = ? AND b > ?

These predicates typically cannot use idx_ab for a targeted seek (they may still scan the index, which is often not what you want):

  • WHERE b = ?
  • WHERE b = ? AND a = ?

    Even though this is logically the same as WHERE a=? AND b=?, the optimizer can reorder predicates, so it can still use the index. The key point is not the textual order in SQL, but whether the query provides conditions for the leftmost index columns. If both a and b are constrained, idx_ab is usable; if only b is constrained, it usually is not.

Skipping a column breaks the prefix

With an index (a, b, c):

CREATE INDEX idx_abc ON t(a, b, c);

  • WHERE a = ? AND c = ?

    This matches a, but skips b. Many engines can use the index to find all rows with a=?, but then must check c=? row-by-row (less efficient than having b constrained too). It is not the same as having a full (a, b, c) prefix.

  • WHERE b = ? AND c = ?

    No leftmost column constraint (a is missing), so the index is usually not selective for this query.

(2) Equality-First, Then Range: A Practical Pattern

Composite indexes are most powerful when you place equality predicates first, followed by at most one “range-like” predicate (e.g., >, <, BETWEEN, LIKE 'prefix%'), and then optionally columns used for ordering or covering.

Why this pattern works: equality conditions pin the search to a narrow slice of the index. A range condition then scans a contiguous portion of that slice. After a range condition, additional columns to the right usually cannot be used to further narrow the scan (because the scan is already moving through multiple values).

Step-by-step: how the engine walks an index (a, b, created_at)

Index:

CREATE INDEX idx_ab_created ON events(a, b, created_at);

Query:

SELECT * FROM events WHERE a = ? AND b = ? AND created_at >= ?;

  • Step 1: Seek to the first entry where (a=?, b=?).
  • Step 2: From there, scan forward while created_at is within the range.
  • Step 3: Stop when (a, b) changes or created_at exceeds the range boundary (depending on the predicate).

Now compare with a different query shape:

SELECT * FROM events WHERE a = ? AND created_at >= ? AND b = ?;

Logically equivalent (AND is commutative), but the important part is: you have equality on a, range on created_at, and equality on b. With index (a, b, created_at), b is before the range column, which is good. With index (a, created_at, b), b is after the range column, which is often less useful for narrowing.

Same columns, different order: (a, b, created_at) vs (a, created_at, b)

Consider these common patterns:

  • WHERE a = ? AND b = ? AND created_at BETWEEN ? AND ?
  • WHERE a = ? AND created_at BETWEEN ? AND ?
  • WHERE a = ? AND b = ?

Index (a, b, created_at) supports all three well: it can seek by a, then b, then range-scan by created_at. Index (a, created_at, b) supports the second pattern well, but for the first pattern it will typically seek by a, range-scan by created_at, and then filter b afterwards (more scanning).

(3) Combining Filters and Sorting (ORDER BY) with Composite Indexes

A composite index can also satisfy ORDER BY without an extra sort if the query’s filtering and ordering align with the index order. This is often a major win: avoiding a sort can reduce memory use and latency.

Basic rule of thumb

  • If the index can produce rows in the required order after applying the WHERE constraints, the engine can stream results without sorting.
  • To do that, the ORDER BY columns should appear in the index in the same order (and direction, depending on the database), and any earlier index columns should be fixed by equality predicates (so they don’t interleave multiple groups).

Example: filtering then ordering within a group

Table: orders(tenant_id, status, created_at, id, total)

Query A:

SELECT id, created_at, total FROM orders WHERE tenant_id = ? AND status = ? ORDER BY created_at DESC LIMIT 50;

Index choice:

CREATE INDEX idx_orders_tenant_status_created ON orders(tenant_id, status, created_at DESC);

  • tenant_id and status are equalities (pin to a narrow slice).
  • Within that slice, rows are already in created_at DESC order, so the engine can return the first 50 without sorting.

How the same columns can fail to help sorting

If you instead create:

CREATE INDEX idx_orders_created_tenant_status ON orders(created_at DESC, tenant_id, status);

This index is great for queries primarily ordered by created_at across all tenants, but for Query A it is usually not ideal: the index is primarily organized by created_at, so finding “tenant_id=?, status=?” requires scanning many created_at entries and filtering, and the LIMIT might not help much because matches are scattered.

Example: ORDER BY with a range predicate

Query B:

SELECT id, created_at FROM orders WHERE tenant_id = ? AND created_at >= ? ORDER BY created_at ASC;

Index:

CREATE INDEX idx_orders_tenant_created ON orders(tenant_id, created_at ASC);

  • Equality on tenant_id, range on created_at: good “equality-first then range”.
  • ORDER BY created_at ASC matches the index, so no extra sort.

But if you add AND status = ? and you frequently filter by status too, consider (tenant_id, status, created_at) so status is applied before the range scan and ordering.

Choosing Column Order: A Practical Checklist

1) Start from your most common query patterns

List the top queries by frequency and latency impact. For each, write the predicate types per column: equality (=), range (>, BETWEEN, LIKE 'x%'), and ordering (ORDER BY).

2) Put equality columns first (especially those that define the “group”)

  • Columns that are almost always present as equality filters (e.g., tenant_id, account_id) are strong candidates for the leftmost positions.
  • If you have multiple equality columns, order them to best match your workload. Often you place the more selective equality earlier, but also consider which prefix you want to support (e.g., queries that filter only by tenant_id should still benefit from an index starting with tenant_id).

3) Put the main range column next

  • Time windows (created_at), numeric ranges (price), and prefix searches (name LIKE 'Al%') typically go after the equality columns.
  • Assume that columns after the range are less useful for narrowing the scan; they may still help for ordering or covering.

4) Align ORDER BY with the index when it matters

  • If a query frequently sorts and returns a small page (LIMIT/OFFSET or keyset pagination), designing the index to satisfy ORDER BY can be as important as filtering.
  • To use the index for ordering, earlier index columns should usually be constrained by equality (so the output order is meaningful within the filtered subset).

5) Validate “same columns, different order” explicitly

When you are torn between two orders, test both against your real query shapes. For example, compare:

  • (tenant_id, status, created_at)
  • (tenant_id, created_at, status)

The first favors queries that filter by status and then use a created_at range/order. The second favors queries that filter by time within a tenant regardless of status.

Mini-Lab: Design a Composite Index (and Justify the Order)

Scenario

You have a table:

tickets(id, org_id, assignee_id, state, priority, created_at)

Here are the query patterns you must support:

  • Q1:
    SELECT id, priority, created_at FROM tickets WHERE org_id = ? AND state = ? ORDER BY created_at DESC LIMIT 50;
  • Q2:
    SELECT id FROM tickets WHERE org_id = ? AND assignee_id = ? AND state = ?;
  • Q3:
    SELECT id FROM tickets WHERE org_id = ? AND created_at >= ? AND created_at < ? ORDER BY created_at ASC;

Step 1: Mark equality, range, and ordering columns

  • Q1: equality on (org_id, state), ordering on created_at DESC
  • Q2: equality on (org_id, assignee_id, state)
  • Q3: equality on (org_id), range on created_at, ordering on created_at ASC

Step 2: Propose an index order and explain what it optimizes

A strong first choice for these queries is:

CREATE INDEX idx_tickets_org_state_created ON tickets(org_id, state, created_at DESC);

  • Supports Q1 directly: equality-first (org_id, state), then ORDER BY created_at DESC without sorting, and LIMIT 50 can stop early.
  • Supports Q3 reasonably: it can seek to org_id=?, then range-scan by created_at. The extra state column in the middle is not constrained in Q3, so Q3 may not fully benefit from created_at ordering unless state is fixed; this is the trade-off.

To cover Q2 well, you might add a second index:

CREATE INDEX idx_tickets_org_assignee_state ON tickets(org_id, assignee_id, state);

  • All equality predicates, so the engine can seek tightly.
  • Column order puts org_id first (common prefix), then assignee_id, then state. If your workload more often filters by state than assignee_id, you would flip the middle columns to (org_id, state, assignee_id).

Step 3: Try an alternative and reason about the difference

Alternative A:

CREATE INDEX idx_tickets_org_created_state ON tickets(org_id, created_at DESC, state);

  • Better for Q3: equality on org_id, then range/order on created_at.
  • Worse for Q1: state is after created_at, so filtering by state happens after entering a created_at-ordered scan; you may scan many rows to find 50 matching state values.

Step 4: Your task

Pick one of these strategies and justify it based on which query is most important:

  • Strategy 1 (optimize Q1): (org_id, state, created_at DESC) plus a second index for Q2.
  • Strategy 2 (optimize Q3): (org_id, created_at DESC) plus a second index for Q1 and/or Q2.

Write your justification in terms of: which predicates become leftmost-prefix matches, where the range predicate sits, and whether ORDER BY can be satisfied by the index.

Now answer the exercise about the content:

Why is an index on (org_id, state, created_at DESC) a strong choice for a query that filters by org_id and state and orders by created_at DESC with LIMIT 50?

You are right! Congratulations, now go to the next page

You missed! Try again.

With (org_id, state, created_at DESC), the WHERE equalities match the leftmost index columns, narrowing to a slice. Within that slice, rows are already ordered by created_at DESC, so the engine can stream results without sorting and may stop after 50 rows.

Next chapter

Indexing 101: Covering Indexes, Included Columns, and Fewer Lookups

Arrow Right Icon
Free Ebook cover Indexing 101: How Databases Find Data Fast
44%

Indexing 101: How Databases Find Data Fast

New course

9 pages

Download the app to earn free Certification and listen to the courses in the background, even with the screen off.