What an Index Maintenance Operation Really Means
Every index is an additional data structure that must stay consistent with the table. Reads benefit because the index provides fast navigation, but writes pay the bill because the database must update the table data and also update every affected index. The cost is not just “extra work”; it includes extra page reads/writes, extra logging, extra contention, and sometimes structural changes inside the index (like page splits).
What Happens on INSERT, UPDATE, and DELETE
INSERT: Add a Row, Add Index Entries
When you insert a row into a table, the database must also insert an entry into each index that includes any column from the new row (which is typically all indexes on the table). Conceptually, each index insert is: find the correct leaf location for the key, place the new key (and row locator) into the leaf page, and ensure the B-tree remains valid.
Step-by-step, an INSERT typically triggers:
- Write the table row (heap or clustered structure) and generate log records for durability.
- For each secondary index: locate the target leaf page for the new key (may require reading index pages into memory).
- Insert the key into the leaf page. If there is space, it’s a simple in-page insert; if not, it can trigger a page split (covered later).
- Update parent pages if needed (for example, after a split) and log those structural changes.
Even if the table insert is cheap, the total cost is roughly “table work + (index work × number of indexes).”
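The steps above can be sketched with a toy model. This is a deliberately simplified illustration, not a real engine: each secondary index is represented as a Python sorted list of (key, row locator) pairs, and the column names are hypothetical.

```python
import bisect

# Toy model: a table dict plus two hypothetical secondary indexes,
# each kept as a sorted list of (key, row_id) entries.
table = {}
idx_email = []      # index on "email"
idx_created = []    # index on "created_at"

def insert_row(row_id, email, created_at):
    # 1. Write the table row (heap or clustered structure).
    table[row_id] = {"email": email, "created_at": created_at}
    # 2. For each secondary index, find the sorted position and add a
    #    (key, row locator) entry -- one extra write per index.
    bisect.insort(idx_email, (email, row_id))
    bisect.insort(idx_created, (created_at, row_id))

insert_row(1, "a@example.com", "2024-01-01")
insert_row(2, "b@example.com", "2024-01-02")
# Each INSERT performed 1 table write plus 2 index entry writes.
```

Even in this flat model you can see the multiplier: every additional index adds one more sorted insert to every row insert.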
UPDATE: Modify Entries, and Sometimes Move Them
Updates vary widely in cost depending on whether indexed columns change.
- Updating only non-indexed columns: the database updates the row, but most indexes do not change. This is usually the cheapest update pattern.
- Updating an indexed column (part of an index key): the database must update the corresponding index entry. In many engines, changing an index key is effectively a DELETE old key + INSERT new key in that index, because the entry must be in a different sorted position.
- Updating included/non-key columns in an index (where supported): the key position does not change, but the leaf payload does, so the engine still modifies and logs index pages.
Step-by-step for an UPDATE that changes an indexed key:
- Find the row (via a prior lookup, or via an index used by the update statement).
- Update the table row and log the change.
- For each affected index: remove the old entry (old key) and insert the new entry (new key).
- Handle structural changes if the insert lands on a full page (page split) or if deletes create space patterns that lead to fragmentation.
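Continuing the toy sorted-list model (column name and values are illustrative), the "delete old key + insert new key" behavior for a key-changing UPDATE looks like this:

```python
import bisect

# A toy secondary index on a hypothetical "email" column: sorted (key, row_id).
idx_email = [("a@example.com", 1), ("b@example.com", 2)]

def update_email(row_id, old_email, new_email):
    # A changed index key cannot be modified in place: the entry belongs at a
    # different sorted position. So the engine removes the old entry...
    idx_email.remove((old_email, row_id))
    # ...and inserts a fresh entry where the new key sorts.
    bisect.insort(idx_email, (new_email, row_id))

update_email(1, "a@example.com", "z@example.com")
# The entry for row 1 moved from the front of the index to the back.
```

Note that this is two index modifications (and two sets of log records) for what the application sees as "one update."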
DELETE: Remove Index Entries Too
Deleting a row requires removing its entries from every index. Depending on the engine, a delete may physically remove the entry immediately or merely mark it for later cleanup, but either way the index must stop returning the row to queries.
Step-by-step, a DELETE typically triggers:
- Mark/remove the table row and log the change.
- For each index: locate the leaf entry and remove/mark it.
- Potential cleanup work: empty pages may remain, and the tree may not shrink immediately, leaving internal fragmentation.
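The "mark now, clean up later" style of delete can be sketched as follows. This is a toy illustration of deferred cleanup (as in vacuum- or ghost-record-style mechanisms), not any specific engine's implementation:

```python
# A toy leaf page whose entries carry a "deleted" mark, mimicking engines
# that tombstone entries and physically compact the page later.
leaf = [{"key": 10, "row_id": 1, "deleted": False},
        {"key": 20, "row_id": 2, "deleted": False}]

def delete_key(key):
    # Mark instead of removing: cheap now, but the page keeps its old size
    # until a later cleanup pass compacts it.
    for entry in leaf:
        if entry["key"] == key:
            entry["deleted"] = True

def scan(lo, hi):
    # Even before cleanup runs, queries must skip marked entries.
    return [e["key"] for e in leaf if lo <= e["key"] <= hi and not e["deleted"]]

delete_key(10)
# scan(0, 100) now returns only [20], yet the page still holds 2 entries --
# the internal fragmentation the text describes.
```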
Why More Indexes Increase Write Latency and Storage
Write Latency: Multiplying Work Per Row
Each additional index adds more operations per write: more page searches, more page modifications, more log records, and more chances of page splits. This increases latency and reduces throughput, especially under concurrency.
A practical way to reason about it:
- INSERT: 1 table insert + N index inserts
- DELETE: 1 table delete + N index deletes
- UPDATE: 1 table update + (indexes touched by changed columns) × (delete+insert or in-place update)
If a table has 8 secondary indexes and you insert 10,000 rows, you are not doing “10,000 operations”; you are doing “10,000 table writes + 80,000 index entry writes,” plus any structural maintenance.
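The arithmetic above can be wrapped in a small back-of-the-envelope helper (it deliberately ignores page splits, logging, and structural maintenance):

```python
def write_units(rows, secondary_indexes):
    """Rough count of logical entry writes for a bulk INSERT:
    one table write per row plus one index entry write per index per row."""
    table_writes = rows
    index_writes = rows * secondary_indexes
    return table_writes, index_writes, table_writes + index_writes

# The example from the text: 10,000 rows into a table with 8 secondary indexes.
t, i, total = write_units(10_000, secondary_indexes=8)
# -> 10,000 table writes + 80,000 index entry writes = 90,000 entry writes
```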
Storage: Indexes Are Additional Copies of Information
Indexes consume disk and memory because they store:
- Index keys (the indexed columns, often duplicated from the table).
- Row locators (primary key value, row id, or pointer to the base row depending on engine).
- Structural overhead (internal pages, sibling pointers, fill factor slack space).
- Optional payload (included columns / stored columns in the leaf level where supported).
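The storage components listed above can be turned into a rough leaf-level size estimate. Every parameter here is an assumption chosen to make the arithmetic concrete; real per-entry overhead and fill factors are engine-specific:

```python
def index_size_bytes(rows, key_bytes, locator_bytes,
                     overhead_per_entry=10, fill_factor=0.90):
    """Back-of-the-envelope leaf-level size of one secondary index.
    overhead_per_entry covers per-entry structural bytes; fill_factor
    accounts for deliberately unfilled slack space in each page."""
    entry = key_bytes + locator_bytes + overhead_per_entry
    return int(rows * entry / fill_factor)

# 10M rows, a 16-byte key, an 8-byte row locator:
size = index_size_bytes(10_000_000, key_bytes=16, locator_bytes=8)
# Roughly 378 MB of leaf pages for a single index, before internal pages --
# multiply by the index count to see why "just one more index" adds up.
```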
More indexes also increase cache pressure: index pages compete with table pages and other indexes for memory. Under write-heavy workloads, this can lead to more page reads (cache misses) and more I/O.
Page Splits and Fragmentation in B-trees
What a Page Split Is
B-tree indexes store sorted keys in fixed-size pages. When an insert needs to place a new key into a leaf page that is already full, the engine must create space. A common approach is a page split:
- Allocate a new page.
- Move roughly half the entries from the full page to the new page (or move a portion depending on implementation).
- Insert the new key into the appropriate page.
- Update parent pages to point to the new page and maintain the key boundaries.
- Log all of the above.
Page splits are expensive because they turn a single insert into multiple page writes and additional parent updates. They also tend to create fragmentation.
Fragmentation: Logical Order vs Physical Layout
Even though the B-tree maintains logical order (keys are still sorted), the physical pages on disk may no longer be in a neat sequence. After many splits, adjacent key ranges may live on pages far apart on disk, increasing I/O for range scans and increasing cache churn.
Fragmentation can show up as:
- More pages than expected for the same number of rows (lower page density).
- More random I/O for range queries because pages are less contiguous.
- Higher write amplification because splits and reorganizations write more pages.
Why Random Inserts Are Harder Than Append-Friendly Inserts
Insert pattern matters because it determines where in the B-tree the new keys land.
- Append-friendly pattern (monotonically increasing keys like an ever-increasing timestamp or sequence): most inserts go to the “rightmost” leaf page. The engine repeatedly inserts near the end, often touching a small set of pages. Splits still happen, but they tend to occur near the end, and the working set is more stable.
- Random insert pattern (keys distributed across the entire key space, like random UUIDs): inserts land throughout the tree. Many different leaf pages are touched, increasing cache misses and contention. Splits can occur anywhere, creating widespread fragmentation and more frequent structural changes.
Conceptually: append-friendly inserts concentrate work; random inserts spread work across many pages. Spreading work sounds good until you realize it spreads contention and cache pressure too, and it increases the probability that any given insert hits a full page somewhere in the middle of the tree.
Practical Walkthrough: How a Random Key Causes More Churn
- Step 1: Insert key K1; it lands in some leaf page L1.
- Step 2: Insert key K2; it lands in a different leaf page L2 far away in the key space.
- Step 3: Under load, many sessions insert keys across many leaf pages, forcing the engine to keep many leaf pages hot in memory.
- Step 4: When any of those leaf pages fills, a split occurs, requiring allocation, movement of entries, and parent updates—each of which must be synchronized and logged.
With append-friendly keys, steps 1–4 mostly repeat on the same “right edge” of the tree, often improving locality and reducing the number of pages competing for cache and latches.
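The locality difference is easy to measure in the toy page model. This sketch (50/50 splits, a small fixed page capacity, made-up window size) counts how many distinct leaf pages the most recent inserts touched, as a rough proxy for the working set competing for cache and latches:

```python
import bisect
import random

PAGE_CAPACITY = 64

def hot_pages(keys, window=1000):
    """Insert keys into sorted leaf pages (splitting full pages 50/50) and
    return how many distinct pages the final `window` inserts touched."""
    pages = [[]]
    touched = []
    for key in keys:
        # Find the page whose key range covers the new key.
        i = 0
        for j, p in enumerate(pages):
            if p and p[0] <= key:
                i = j
        page = pages[i]
        if len(page) >= PAGE_CAPACITY:
            mid = len(page) // 2
            pages.insert(i + 1, page[mid:])
            del page[mid:]
            if key >= pages[i + 1][0]:
                page = pages[i + 1]
        bisect.insort(page, key)
        touched.append(id(page))
    return len(set(touched[-window:]))

n = 10_000
seq = hot_pages(range(n))                                  # append-friendly keys
rnd = hot_pages(random.Random(7).sample(range(10**9), n))  # random keys
# seq stays small (inserts cluster at the right edge); rnd is far larger,
# because random keys scatter writes across most of the tree.
```

The exact numbers depend on capacity and window, but the ordering is robust: append-friendly keys keep the hot set to a handful of right-edge pages, while random keys keep a large fraction of all leaf pages hot.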
Concurrency Implications: Locks, Latches, and Longer Transactions
More Indexes Mean More Shared Structures to Coordinate
Concurrency costs come from two broad categories of coordination:
- Locks (logical protection for transaction isolation): prevent other transactions from reading/writing rows or key ranges in conflicting ways.
- Latches (physical protection for in-memory structures): protect index pages and internal data structures while they are being modified.
Maintaining many indexes increases both kinds of coordination because each write must touch more pages and more key ranges.
Longer Transactions and Higher Contention
Even if your application transaction is “small,” the database work inside it grows with each index. That can extend the time locks are held and the number of latch acquisitions required.
Typical concurrency side effects on write-heavy tables:
- More lock acquisitions per statement because each index update may lock additional key ranges (depending on isolation level and engine).
- More latch contention on hot index pages, especially on append-friendly indexes where many sessions target the same rightmost page, or on random patterns where many pages are modified concurrently.
- More deadlock opportunities because multiple indexes can be updated in different orders across concurrent transactions.
- Longer commit time due to extra logging and flushing requirements, depending on durability settings.
Page Splits Make Concurrency Worse
When a page split occurs, the engine must perform a coordinated structural change: allocate a page, move entries, and update parent pointers. This typically requires stronger or longer-held latches than a simple in-page insert, increasing the chance that other sessions wait.
Practical Guidance for High-Write Tables
Limit Indexes to Those That Pay for Themselves
- Start from real queries: create indexes to support the most important read patterns, not hypothetical ones.
- Avoid redundant indexes: for example, multiple indexes with the same leading columns but minor variations can multiply write cost.
- Be cautious with “nice-to-have” indexes on tables with frequent inserts/updates (event logs, queues, session tables, telemetry).
- Prefer fewer, well-chosen composite indexes over many single-column indexes when it matches your workload (but validate against your actual query patterns).
Watch for Symptoms in Write-Heavy Workloads
- Rising insert/update latency as index count grows.
- Increased log volume and write I/O.
- Lock waits and latch waits concentrated around index pages.
- Frequent page splits and growing fragmentation indicators (engine-specific metrics).
Operational Checks You Can Do Regularly
- Inventory indexes per table and flag high-write tables with “too many” indexes for review.
- Track write amplification: compare rows written against index page writes and log bytes generated (using your database’s monitoring views).
- Monitor page split rates and fragmentation metrics; correlate spikes with key patterns and traffic changes.
- Review unused/rarely used indexes (based on index usage stats) and remove them carefully after validating with production-like traffic.