Partitioning Strategies for High-Volume Event Data

Chapter 6

Estimated reading time: 14 minutes


Why partition high-volume event tables

High-volume event data stresses PostgreSQL in predictable places: index bloat, vacuum pressure, long-running queries that scan too much history, and operational tasks (retention, backfills, reindexing) that become risky on a single giant table. Partitioning is a physical design technique that splits one logical table into many smaller child tables (partitions) while preserving a single SQL interface. Done well, partitioning improves query performance via partition pruning, reduces maintenance blast radius, and makes retention and archiving operationally simple.

Partitioning is not a substitute for good indexing or query design. It is most effective when your access patterns align with a partition key (commonly time) and when you have clear lifecycle rules (e.g., keep 90 days hot, archive older). It also introduces complexity: more objects to manage, constraints on unique keys, and the need for automation. This chapter focuses on practical partitioning strategies for event tables that grow quickly and are queried by time windows and/or tenant boundaries.

Core concepts: partitioned tables, pruning, and maintenance

Partitioned table vs. partitions

In PostgreSQL declarative partitioning, you create a parent table marked PARTITION BY. The parent holds the schema and acts as the query entry point. Each partition is a normal table with a partition constraint. Inserts into the parent are routed to the correct partition based on the partition key.

Partition pruning

Partition pruning is the optimizer’s ability to skip partitions that cannot match a query’s predicates. For event data, pruning is the main performance win: a query for “last 24 hours” should touch only the partitions that cover that time range. Pruning works best when:

  • The query predicates reference the partition key directly (e.g., occurred_at >= ... AND occurred_at < ...).
  • The predicates are sargable (avoid wrapping the partition key in functions, e.g., date_trunc('day', occurred_at) in the WHERE clause).
  • Statistics are up to date for partitions (autovacuum/analyze must keep up).

Maintenance isolation

Because each partition is a separate table, maintenance tasks can be targeted: vacuuming, reindexing, clustering, and even schema changes can be staged partition-by-partition. Retention becomes a metadata operation: dropping an old partition is far faster than deleting billions of rows.


Choosing a partition key for event data

Time-based partitioning (most common)

Event data is naturally time-oriented. Time-based partitioning (by day, week, or month) aligns with typical queries (time windows), retention policies, and backfills. It also supports “hot vs. cold” storage patterns: recent partitions are heavily written and queried; older partitions are mostly read or archived.

Use a timestamp column that represents the event’s logical time (often occurred_at). If you also store ingestion time (e.g., ingested_at), be explicit about which one drives partitioning. Partitioning by ingestion time can simplify routing but may hurt analytical correctness if queries are based on occurred time. Partitioning by occurred time improves pruning for business queries but requires careful handling of late-arriving events.

Tenant-based partitioning (when multi-tenant scale dominates)

If you have a small number of very large tenants, partitioning by tenant (LIST) can isolate hotspots and make per-tenant maintenance easier. However, if you have many tenants (thousands+), LIST partitions become unwieldy. Hash partitioning by tenant can distribute load but does not help retention and is less intuitive for time-window queries.
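As an illustration (the table name and tenant ids here are hypothetical, not part of the chapter's event schema), a LIST-partitioned layout for a handful of large tenants might look like:

```sql
-- Hedged sketch: LIST partitioning by tenant for a few known large tenants.
CREATE TABLE tenant_events (
  event_id    uuid        NOT NULL,
  tenant_id   bigint      NOT NULL,
  occurred_at timestamptz NOT NULL,
  payload     jsonb       NOT NULL,
  PRIMARY KEY (tenant_id, event_id)
) PARTITION BY LIST (tenant_id);

-- One partition per large tenant, plus a default for the long tail.
CREATE TABLE tenant_events_t1001   PARTITION OF tenant_events FOR VALUES IN (1001);
CREATE TABLE tenant_events_t1002   PARTITION OF tenant_events FOR VALUES IN (1002);
CREATE TABLE tenant_events_default PARTITION OF tenant_events DEFAULT;
```

The default partition keeps the long tail of small tenants in one place while the hot tenants get isolated maintenance and statistics.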

Composite strategies (subpartitioning)

PostgreSQL supports multi-level partitioning: for example, RANGE by time, then HASH by tenant within each time partition. This can help when you need both time pruning and write distribution. It increases object count and operational complexity, so adopt it only when a single dimension is insufficient.

Picking a partition granularity: day vs. week vs. month

Granularity is a trade-off between pruning precision and object overhead.

  • Daily: best pruning for short time windows and high write rates; more partitions to manage (365/year). Good for very high-volume event streams and strict retention (e.g., keep 30–90 days).
  • Weekly: fewer objects (52/year) and still decent pruning. Good when queries often span multiple days and volume is moderate.
  • Monthly: minimal object overhead (12/year) but larger partitions; pruning is coarse and maintenance operations affect more data at once. Good for lower volume or long retention where queries commonly span weeks/months.

As a rule of thumb, aim for partitions that are “comfortably maintainable” in size. Many teams target partitions in the range of a few GB to a few tens of GB, but the right size depends on hardware, vacuum capacity, and query patterns.

Step-by-step: create a time-partitioned event table

The following example uses RANGE partitioning by occurred_at with daily partitions. Adjust types and columns to match your schema; the focus here is partitioning mechanics and operational patterns.

1) Create the partitioned parent table

CREATE TABLE events (
  event_id        uuid        NOT NULL,
  occurred_at     timestamptz NOT NULL,
  tenant_id       bigint      NOT NULL,
  event_type      text        NOT NULL,
  payload         jsonb       NOT NULL,
  -- optional metadata
  ingested_at     timestamptz NOT NULL DEFAULT now(),
  source          text,
  PRIMARY KEY (occurred_at, event_id)
) PARTITION BY RANGE (occurred_at);

Note the primary key includes the partition key (occurred_at). PostgreSQL requires that a unique/primary key on a partitioned table include all partition key columns, because uniqueness must be enforceable across partitions. If you need a globally unique event_id constraint independent of time, you typically enforce it via application logic, a separate registry table, or accept that the database cannot enforce it as a single unique constraint on the partitioned table.

2) Create partitions for a time window

Create partitions ahead of time to avoid insert failures. A common practice is to create partitions for the next N days and keep a small buffer for late events (or route late events to a dedicated partition).

CREATE TABLE events_2026_01_10 PARTITION OF events
  FOR VALUES FROM ('2026-01-10') TO ('2026-01-11');

CREATE TABLE events_2026_01_11 PARTITION OF events
  FOR VALUES FROM ('2026-01-11') TO ('2026-01-12');

3) Add indexes on partitions (or use partitioned indexes)

For event queries, you often filter by tenant, event_type, and time. With partitioning, you still need indexes inside partitions to avoid scanning all rows within a partition.

You can create a partitioned index on the parent, which creates corresponding indexes on each partition. Since PostgreSQL 11, an index created on the parent is also applied automatically to partitions added later.

CREATE INDEX events_tenant_time_idx ON events (tenant_id, occurred_at DESC);
CREATE INDEX events_type_time_idx ON events (event_type, occurred_at DESC);
CREATE INDEX events_payload_gin_idx ON events USING GIN (payload);

Be selective with GIN indexes on jsonb for high-volume event tables; they can be expensive to maintain. Prefer targeted expression indexes when you know which keys are queried frequently (e.g., (payload->>'order_id')), and keep them only on hot partitions if possible.
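For example, a targeted expression index on the order_id key mentioned above might look like this (the index name is illustrative):

```sql
-- Hedged sketch: index a single frequently queried jsonb key instead of
-- maintaining a broad GIN index over the whole payload.
CREATE INDEX events_order_id_idx ON events ((payload->>'order_id'));

-- Queries must use the same expression to benefit, e.g.:
-- SELECT * FROM events WHERE payload->>'order_id' = '12345';
```

This keeps write amplification low on hot partitions while still giving fast lookups for the keys that matter.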

4) Verify partition pruning with EXPLAIN

Pruning should be visible in query plans. Use a time-bounded predicate that matches the partition key.

EXPLAIN (ANALYZE, BUFFERS)
SELECT tenant_id, count(*)
FROM events
WHERE occurred_at >= now() - interval '1 day'
  AND occurred_at <  now()
GROUP BY tenant_id;

In the plan, look at the partitions listed under the Append node: only the partitions covering the time window should appear. If many partitions are scanned, check whether the predicate is sargable and compares the partition key directly, without function wrapping.

Handling late-arriving events

Event systems often deliver late data. If you partition by occurred_at, a late event may belong to an old partition that you have already dropped or archived, or it may force writes into partitions that are otherwise cold (hurting cache locality and maintenance predictability).

Common strategies:

  • Keep a late-arrival buffer: retain partitions for an extra grace period beyond your business retention (e.g., keep 7 additional days) before archiving/dropping.
  • Use a “catch-all” partition: create a default partition to accept out-of-range rows, then periodically re-home them into correct partitions. This avoids insert failures but requires a cleanup job.
  • Partition by ingestion time when operational simplicity matters more than occurred-time pruning. You can still query by occurred time, but pruning will be less effective unless queries also constrain ingestion time.

If you choose a default partition, implement monitoring: a growing default partition is a signal that partition creation is lagging or that upstream timestamps are incorrect.
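A minimal sketch of the catch-all approach (the date bounds in the re-homing step are illustrative):

```sql
-- Catch-all partition for out-of-range rows.
CREATE TABLE events_default PARTITION OF events DEFAULT;

-- Monitor its size; growth signals lagging partition creation or bad timestamps.
SELECT pg_size_pretty(pg_total_relation_size('events_default'));

-- Periodically re-home rows once the correct partitions exist.
-- The re-insert goes through the parent, which routes each row correctly.
WITH moved AS (
  DELETE FROM events_default
  WHERE occurred_at >= '2026-01-10' AND occurred_at < '2026-01-11'
  RETURNING *
)
INSERT INTO events SELECT * FROM moved;
```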

Retention and archiving: drop vs. detach vs. move

Partitioning shines for retention. Instead of deleting rows (which generates WAL, bloats indexes, and triggers vacuum), you can remove whole partitions.

Drop old partitions (fastest)

If you do not need the data, drop the partition.

DROP TABLE events_2025_10_01;

Detach partitions for archiving

If you need to keep data but not in the hot database, detach the partition (it becomes a standalone table), then dump it, copy it to cheaper storage, or load it into an archive database. On PostgreSQL 14 and later, DETACH PARTITION CONCURRENTLY avoids taking a strong lock on the parent table during the operation.

ALTER TABLE events DETACH PARTITION events_2025_10_01;

After detaching, you can pg_dump that table or move it to a different tablespace. Detaching is also useful for “freezing” a partition: you can set it read-only at the application layer and reduce maintenance.

Move cold partitions to a different tablespace

If you use multiple disks, you can place older partitions on slower/cheaper storage.

ALTER TABLE events_2025_10_01 SET TABLESPACE cold_ts;
ALTER INDEX events_2025_10_01_tenant_time_idx SET TABLESPACE cold_ts;

This approach keeps data queryable in the same database while controlling storage costs and IO patterns.

Operational automation: creating partitions safely

High-volume systems should not rely on manual partition creation. Automate partition lifecycle with a scheduled job (cron, pg_cron, or an external orchestrator). The job should:

  • Create partitions ahead of time (e.g., next 7–30 days).
  • Ensure required indexes exist on new partitions.
  • Optionally apply storage parameters (fillfactor), tablespace placement, and autovacuum tuning.
  • Drop/detach partitions past retention.
  • Alert if inserts are hitting a default partition or failing due to missing partitions.

Example: a simple partition creation function

CREATE OR REPLACE FUNCTION ensure_daily_event_partition(p_day date)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
  start_ts  timestamptz := p_day::timestamptz;
  end_ts    timestamptz := (p_day + 1)::timestamptz;
  part_name text        := format('events_%s', to_char(p_day, 'YYYY_MM_DD'));
BEGIN
  EXECUTE format(
    'CREATE TABLE IF NOT EXISTS %I PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
    part_name, start_ts, end_ts
  );
  -- Create indexes if they do not exist (per-partition naming shown)
  EXECUTE format('CREATE INDEX IF NOT EXISTS %I ON %I (tenant_id, occurred_at DESC)',
    part_name || '_tenant_time_idx', part_name);
  EXECUTE format('CREATE INDEX IF NOT EXISTS %I ON %I (event_type, occurred_at DESC)',
    part_name || '_type_time_idx', part_name);
END;
$$;

Run it in a loop for a date range. In production, also encode tablespace decisions and autovacuum settings, and handle permissions carefully.
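One possible driver for that loop, using generate_series to cover today plus a two-week buffer (the horizon of 13 days is an assumption; pick one that matches your buffer policy):

```sql
-- Ensure daily partitions exist for today and the next 13 days.
SELECT ensure_daily_event_partition(d::date)
FROM generate_series(current_date, current_date + 13, interval '1 day') AS g(d);
```

Scheduled via pg_cron or an external job runner, this keeps a rolling window of future partitions available so inserts never fail for missing partitions.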

Indexing patterns with partitions

Local indexes and query shapes

Partitioning reduces the number of rows scanned by pruning, but within a partition you still need efficient access paths. For event queries, common patterns include:

  • Time window + tenant: index (tenant_id, occurred_at DESC) supports “latest events for tenant” and bounded scans.
  • Time window + type: index (event_type, occurred_at DESC) supports per-type analytics and filtering.
  • Time window + tenant + type: if this is dominant, consider (tenant_id, event_type, occurred_at DESC) to reduce filtering.

Be cautious about creating too many indexes: each insert must update them, and high-volume event ingestion can become index-bound. Prefer a small set of indexes that match your most important queries, and consider keeping fewer indexes on older partitions if they are rarely queried.

Partial indexes on hot partitions

Because partitions are already time-scoped, partial indexes are often less necessary. But you can still use them to focus on “high-value” subsets (e.g., only a few event types) within hot partitions. Another approach is to maintain richer indexes only on recent partitions and minimal indexes on cold partitions.
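A hedged sketch of a partial index on a hot daily partition (the event type names are illustrative):

```sql
-- Index only high-value event types within the active partition,
-- keeping ingest cost low for everything else.
CREATE INDEX events_2026_01_11_checkout_idx
  ON events_2026_01_11 (tenant_id, occurred_at DESC)
  WHERE event_type IN ('order_placed', 'payment_captured');
```

Queries must repeat the same event_type predicate for the planner to use this index.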

Query design for effective pruning

Partition pruning depends on how you write queries. Practical guidelines:

  • Use half-open intervals: occurred_at >= $start AND occurred_at < $end. This avoids boundary ambiguity and aligns with partition ranges.
  • Avoid WHERE date(occurred_at) = ...; instead compute boundaries in the application and use timestamp comparisons.
  • When joining events to other tables, keep the time predicate on the events table in the WHERE clause so the planner can prune early.
  • For dashboards, parameterize time windows and ensure they map to partition boundaries when possible (e.g., “today”, “last 7 days”).
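To make the contrast concrete, compare these two query shapes (the dates are illustrative):

```sql
-- Prunes well: half-open interval directly on the partition key.
SELECT count(*) FROM events
WHERE occurred_at >= date '2026-01-11'
  AND occurred_at <  date '2026-01-12';

-- Prunes poorly: the partition key is wrapped in a cast/function,
-- so the planner cannot match the predicate to partition bounds.
SELECT count(*) FROM events
WHERE occurred_at::date = date '2026-01-11';
```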

Dealing with high ingest rates: minimizing contention

Partitioning can reduce contention because inserts spread across partitions over time, but “today’s partition” can still be a hotspot. Practical techniques:

  • Keep indexes minimal on the active partition to maximize ingest throughput.
  • Use appropriate fillfactor on hot partitions to reduce page splits for indexes that insert in descending time order. If you index by occurred_at DESC, inserts are near the beginning of the index and can cause contention; consider occurred_at ascending if it matches insert order, or accept the trade-off for “latest-first” queries.
  • Tune autovacuum per partition: hot partitions may need more aggressive vacuum/analyze settings than cold partitions.

Because each partition is a separate table, you can apply storage parameters per partition:

ALTER TABLE events_2026_01_11 SET (
  autovacuum_vacuum_scale_factor  = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);

Subpartitioning: time RANGE + tenant HASH

If a single daily partition is still too hot (very high ingest, many concurrent writers, heavy per-tenant queries), subpartitioning can help distribute writes and indexes across multiple physical tables while preserving time pruning.

Example structure

CREATE TABLE events (
  event_id    uuid        NOT NULL,
  occurred_at timestamptz NOT NULL,
  tenant_id   bigint      NOT NULL,
  event_type  text        NOT NULL,
  payload     jsonb       NOT NULL,
  PRIMARY KEY (occurred_at, tenant_id, event_id)
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2026_01_11 PARTITION OF events
  FOR VALUES FROM ('2026-01-11') TO ('2026-01-12')
  PARTITION BY HASH (tenant_id);

CREATE TABLE events_2026_01_11_p0 PARTITION OF events_2026_01_11
  FOR VALUES WITH (MODULUS 8, REMAINDER 0);
CREATE TABLE events_2026_01_11_p1 PARTITION OF events_2026_01_11
  FOR VALUES WITH (MODULUS 8, REMAINDER 1);

With this design, a query constrained by time prunes to the day partition, and then tenant-based filters can prune further (or at least benefit from smaller per-hash partitions). The cost is 8x more partitions per day (or whatever modulus you choose), plus more indexes to maintain.

Migration strategy: converting a large unpartitioned event table

Partitioning is easiest to adopt early, but you can migrate later with careful planning. A typical approach minimizes downtime by building the partitioned structure alongside the existing table and switching writes.

Step-by-step migration outline

  • Create the new partitioned table (events_new) with the desired partition scheme and indexes.
  • Backfill historical data in chunks ordered by time to reduce random IO. Use INSERT INTO events_new SELECT ... FROM events_old WHERE occurred_at >= ... AND occurred_at < ... per time slice.
  • Dual-write or capture changes during backfill. Options include application dual-write, logical replication, or triggers on the old table that mirror inserts into the new table (triggers add overhead; use only temporarily).
  • Validate counts and constraints per time range and per tenant to ensure completeness.
  • Cut over: stop writers briefly, apply the final delta, then swap names (ALTER TABLE ... RENAME) or update application references.

During backfill, ensure partitions exist for the time ranges being copied. If you use a default partition, monitor it and re-home rows after the fact.
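A sketch of one backfill slice (events_old, events_new, and the time bounds are illustrative; run slice by slice, oldest first):

```sql
-- Copy one day of history into the new partitioned table.
INSERT INTO events_new
SELECT * FROM events_old
WHERE occurred_at >= '2026-01-10' AND occurred_at < '2026-01-11';

-- Validate the slice before moving on: counts should match.
SELECT count(*) FROM events_old
WHERE occurred_at >= '2026-01-10' AND occurred_at < '2026-01-11';

SELECT count(*) FROM events_new
WHERE occurred_at >= '2026-01-10' AND occurred_at < '2026-01-11';
```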

Common pitfalls and how to avoid them

Too many partitions

Thousands of partitions can increase planning time and catalog overhead. If you query wide time ranges frequently, daily partitions may be excessive. Consider weekly/monthly partitions or use a two-tier approach (daily for recent, monthly for older) by detaching and consolidating older partitions into coarser ones in an archive database.

Partition key mismatch with queries

If most queries filter by tenant but not by time, time partitioning alone may not help. Either adjust queries to include time bounds (often reasonable for event data) or consider tenant-based partitioning/subpartitioning.

Missing partitions causing insert failures

Without a default partition, inserts for out-of-range timestamps fail. This is good for correctness but requires reliable automation. If you cannot guarantee it, use a default partition plus alerting and periodic cleanup.

Global uniqueness expectations

Partitioned tables cannot enforce a unique constraint that does not include the partition key. If you need global uniqueness for an identifier, design for it explicitly (e.g., separate table keyed by event_id, or accept uniqueness per time window).
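One way to sketch the registry-table approach (the table name and values are illustrative):

```sql
-- A small side table that enforces global uniqueness of event_id.
CREATE TABLE event_id_registry (
  event_id uuid PRIMARY KEY
);

-- Insert the id first in the same transaction as the event row;
-- a duplicate event_id fails the whole transaction.
BEGIN;
INSERT INTO event_id_registry (event_id)
  VALUES ('11111111-2222-3333-4444-555555555555');
INSERT INTO events (event_id, occurred_at, tenant_id, event_type, payload)
  VALUES ('11111111-2222-3333-4444-555555555555',
          now(), 42, 'order_placed', '{}'::jsonb);
COMMIT;
```

The registry adds one extra index write per event, so weigh it against how strongly you need database-enforced global uniqueness.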

Over-indexing

Each additional index multiplies write cost across partitions. Start with the minimum set that supports your critical reads, then add selectively based on measured query plans and workload.

Now answer the exercise about the content:

Which query pattern is most likely to enable effective partition pruning on a time-partitioned events table?


Partition pruning works best when predicates reference the partition key directly and are sargable. A half-open interval on occurred_at matches RANGE boundaries cleanly and allows the planner to skip irrelevant partitions.

Next chapter

Indexing for Temporal Queries and Stream Reads
