Designing Fact Tables: Additivity, Measures, and Common Fact Types

Capítulo 4

Estimated reading time: 10 minutes

+ Exercise

Additivity: How Measures Behave Under Aggregation

Fact tables store measures that are meant to be aggregated in reports. The key design question is: can this measure be summed across a dimension without changing its meaning? This is the idea of additivity. Getting additivity right prevents common reporting errors like double counting, summing percentages, or summing point-in-time balances.

Additive Measures

Additive measures can be summed across all dimensions, including time, and remain meaningful.

  • Revenue: summing revenue across products, stores, customers, and days produces correct totals.
  • Units sold: summing units across any dimension is valid.
  • Cost (at the same grain): summing cost across orders, products, and time is valid.

Practical example (revenue): If your fact table has one row per order line, then SUM(revenue) by day, by product, or by region is safe because each row represents a distinct event at the chosen grain.

Semi-Additive Measures

Semi-additive measures can be summed across some dimensions but not across time. They represent a state at a point in time (a balance or level).

  • Inventory level: you can sum inventory across products or warehouses at a point in time, but you should not sum inventory levels across days (that would inflate the meaning).
  • Account balance: additive across accounts at a date, not additive across dates.

Practical example (inventory level): Inventory on Monday is 100 units and on Tuesday is 120 units. Summing to 220 does not represent anything useful. Instead, typical time aggregations are LAST_VALUE (ending balance), AVG (average level), MIN/MAX (range), depending on the business question.

Continue in our app.
  • Listen to the audio with the screen off.
  • Earn a certificate upon completion.
  • Over 5000 courses for you to explore!
Or continue reading below...
Download App

Download the app

Non-Additive Measures

Non-additive measures should not be summed across most dimensions because they are ratios, rates, or derived metrics.

  • Conversion rate: conversions / visits. Summing daily conversion rates is incorrect.
  • Margin %: margin / revenue. Summing margin percentages is incorrect.

Practical example (conversion rate): Day 1: 10 conversions / 100 visits = 10%. Day 2: 1 conversion / 2 visits = 50%. Summing rates gives 60% (nonsense). The correct overall rate is (10+1)/(100+2) = 10.78%.

Design rule: store the additive components (numerator and denominator) as measures, and compute the ratio in the semantic layer or query.

-- Safe pattern: store additive components, compute rate at query time (or in BI layer) SELECT   d.date,   SUM(f.conversions) AS conversions,   SUM(f.visits) AS visits,   SUM(f.conversions) * 1.0 / NULLIF(SUM(f.visits), 0) AS conversion_rate FROM fact_marketing_events f JOIN dim_date d ON f.date_key = d.date_key GROUP BY d.date;

Common Fact Table Patterns

Different business questions require different fact table patterns. The pattern determines what each row represents, which keys are required, and which measures aggregate safely.

1) Transaction Fact Tables (Event Facts)

What it represents: a discrete business event (e.g., an order line, a shipment line, a payment, a click). Each row is an event that happened at a specific time.

When to use: sales transactions, returns, shipments, web events, call center interactions.

Recommended structure (Sales Transactions)

Required keys (typical):

  • Date key (and optionally time key) for when the event occurred
  • Product key
  • Customer key
  • Store/Channel key
  • Order/Invoice degenerate identifier (stored in the fact as a text/number identifier, not as a dimension table, when it has no attributes)

Typical measures:

  • Units sold (additive)
  • Extended revenue (additive)
  • Discount amount (additive)
  • Tax amount (additive)
  • Extended cost (additive)

Safe aggregation guidance:

  • SUM additive measures across any dimension.
  • Use COUNT(*) for number of lines; use COUNT(DISTINCT order_id) for number of orders.
  • Avoid mixing grains: if you store one row per order line, do not join to another fact at order grain without careful bridging or pre-aggregation.
-- Revenue by month and product category (safe SUM) SELECT   d.month,   p.category,   SUM(f.revenue) AS revenue,   SUM(f.units) AS units FROM fact_sales_transaction f JOIN dim_date d ON f.date_key = d.date_key JOIN dim_product p ON f.product_key = p.product_key GROUP BY d.month, p.category;

Step-by-step: selecting the transaction pattern for sales

  • Step 1: Identify the event: e.g., order line booked (or shipped, depending on the metric definition).
  • Step 2: Ensure each row is uniquely identifiable: include the operational line identifier or a composite key (order_id + line_number) as a degenerate dimension.
  • Step 3: Store additive measures: revenue, units, discount, cost.
  • Step 4: Add foreign keys for slicing: date, product, customer, channel, sales rep, promotion (as applicable).
  • Step 5: Validate aggregation: totals by day and by order should match the source system’s reconciled totals for the same event definition.

2) Snapshot Fact Tables (Periodic Snapshot Facts)

What it represents: the state of something at a regular interval (daily, weekly, month-end). Each row is a measurement of a level at a point in time.

When to use: inventory on hand, account balances, open tickets at end of day, headcount by department at month-end.

Recommended structure (Operations Inventory Snapshots)

Required keys (typical):

  • Snapshot date key (the “as of” date)
  • Product key
  • Warehouse/Location key
  • Optionally: lot/batch key, supplier key, inventory status key (available, reserved, damaged)

Typical measures:

  • On-hand quantity (semi-additive across time)
  • On-hand value (semi-additive across time)
  • Allocated quantity (semi-additive across time)
  • Available quantity (semi-additive across time)

Safe aggregation guidance:

  • Across product or warehouse for the same snapshot date: SUM is safe.
  • Across time: do not sum levels. Use LAST_VALUE for ending inventory, AVG for average inventory, or compute deltas using two dates.
-- Ending inventory by month (use last snapshot in month) WITH month_end AS (   SELECT     d.month,     MAX(d.date_key) AS month_end_date_key   FROM dim_date d   GROUP BY d.month ) SELECT   me.month,   SUM(f.on_hand_qty) AS ending_on_hand_qty FROM month_end me JOIN fact_inventory_snapshot f   ON f.snapshot_date_key = me.month_end_date_key GROUP BY me.month;
-- Average daily inventory in a month (use AVG of daily totals) SELECT   d.month,   AVG(daily_total_on_hand) AS avg_daily_on_hand FROM (   SELECT     d.month,     d.date_key,     SUM(f.on_hand_qty) AS daily_total_on_hand   FROM fact_inventory_snapshot f   JOIN dim_date d ON f.snapshot_date_key = d.date_key   GROUP BY d.month, d.date_key ) x GROUP BY x.month;

Step-by-step: selecting the snapshot pattern for inventory

  • Step 1: Choose the snapshot interval: daily is common for inventory; weekly/monthly may miss volatility.
  • Step 2: Define the “as of” moment: end-of-day, start-of-day, or a specific cut-off time.
  • Step 3: Decide the dimensionality: product + location is typical; add status if you need separate reporting for available vs reserved.
  • Step 4: Store level measures: quantities and values as of the snapshot moment.
  • Step 5: Provide safe time aggregations: document standard metrics (ending, average, min/max) and implement them consistently in BI.

3) Accumulating Snapshot Fact Tables (Process Facts)

What it represents: the lifecycle of a process that has well-defined milestones. Each row represents one process instance (e.g., one lead, one application, one claim), and columns capture milestone dates and durations as the instance progresses.

When to use: marketing and sales funnels, order-to-cash, claims processing, onboarding, ticket resolution.

Key behavior: rows are updated as milestones occur (unlike transaction facts, which are insert-only). This pattern is designed for cycle time, stage conversion, and drop-off analysis.

Recommended structure (Marketing Funnel Accumulating Snapshot)

Required keys (typical):

  • Process instance identifier (degenerate ID such as lead_id)
  • Current status key (optional but useful for “where are things now?”)
  • Milestone date keys: created_date_key, qualified_date_key, demo_date_key, proposal_date_key, won_date_key, lost_date_key (choose milestones that exist in your process)
  • Dimensional foreign keys: campaign key, channel key, product key (if applicable), sales rep key, region key

Typical measures:

  • Stage duration measures (additive across instances): days_to_qualify, days_to_demo, days_to_close
  • Flags (0/1) for milestone reached: reached_qualified_flag, reached_won_flag (additive across instances)
  • Pipeline amount at creation or at qualification (be careful: if it changes over time, treat it as a separate snapshot or store multiple amounts with clear definitions)

Safe aggregation guidance:

  • Counts of instances: COUNT(*) at the process grain (one row per lead/opportunity).
  • Milestone completion counts: SUM(reached_won_flag), SUM(reached_demo_flag).
  • Conversion rates: compute as ratios of additive counts (e.g., won / qualified).
  • Cycle time: AVG(days_to_close) or percentile functions; summing durations is rarely meaningful unless you explicitly want total effort time.
-- Funnel conversion rate from qualified to won (safe ratio of additive counts) SELECT   c.channel,   SUM(f.reached_qualified_flag) AS qualified,   SUM(f.reached_won_flag) AS won,   SUM(f.reached_won_flag) * 1.0 / NULLIF(SUM(f.reached_qualified_flag), 0) AS qualified_to_won_rate FROM fact_funnel_accumulating f JOIN dim_channel c ON f.channel_key = c.channel_key GROUP BY c.channel;
-- Average days to close for won opportunities by month of creation SELECT   d.month,   AVG(f.days_to_close) AS avg_days_to_close FROM fact_funnel_accumulating f JOIN dim_date d ON f.created_date_key = d.date_key WHERE f.reached_won_flag = 1 GROUP BY d.month;

Step-by-step: selecting the accumulating snapshot pattern for a funnel

  • Step 1: Identify the process instance: lead, opportunity, application, claim.
  • Step 2: List stable milestones: only include stages that are meaningful and consistently captured.
  • Step 3: Add one date key per milestone: store NULL until reached; update when reached.
  • Step 4: Add additive flags and durations: flags enable safe counting; durations enable cycle-time analysis.
  • Step 5: Define rate calculations: always compute rates from summed counts, not from averaging per-row rates.

Choosing the Right Pattern: Practical Guidance by Use Case

Sales: Use Transaction Facts for Orders/Invoices

  • Best fit: transaction fact table because sales are discrete events and most measures are additive.
  • Required keys: date/time, product, customer, channel/store; include order_id and line_id as degenerate identifiers.
  • Measures: revenue, units, discount, cost, tax.
  • Safe reporting: sum revenue/units; count distinct orders for order volume; compute average order value as SUM(revenue)/COUNT(DISTINCT order_id).
-- Average order value (AOV) computed safely from additive revenue and distinct orders SELECT   d.month,   SUM(f.revenue) * 1.0 / NULLIF(COUNT(DISTINCT f.order_id), 0) AS avg_order_value FROM fact_sales_transaction f JOIN dim_date d ON f.date_key = d.date_key GROUP BY d.month;

Marketing Funnels: Use Accumulating Snapshots for Milestones

  • Best fit: accumulating snapshot because stakeholders ask “how many reached each stage?”, “what is the conversion between stages?”, and “how long does it take?”.
  • Required keys: lead/opportunity ID, milestone date keys, campaign/channel, owner (sales rep), and optionally current status.
  • Measures: milestone flags, stage durations, and carefully defined monetary amounts.
  • Safe reporting: compute conversion rates from summed flags; compute cycle time with averages/percentiles on duration measures.

Operations Inventory: Use Periodic Snapshots for Levels

  • Best fit: snapshot fact because inventory is a level that changes over time and is evaluated “as of” a date.
  • Required keys: snapshot date, product, location, and optionally status.
  • Measures: on-hand qty/value, allocated qty, available qty.
  • Safe reporting: sum across products/locations for a date; use ending/average/min/max across time, not sum of levels.

Design Checklist: Making Measures Safe to Aggregate

Store base additive components for ratios

  • For conversion rate, store visits and conversions, not just the rate.
  • For margin %, store revenue and cost (or margin amount), then compute the percentage.

Document the intended aggregation for semi-additive measures

  • Inventory: ending balance (last), average balance (avg), peak (max), low (min).
  • Implement standard queries or semantic measures so users do not accidentally sum levels across time.

Use flags for milestone counting

  • In accumulating snapshots, a 0/1 flag per milestone is often the simplest additive measure for stage counts.
  • Compute stage-to-stage rates from summed flags to avoid averaging per-row rates.

Be explicit about distinct counting needs

  • Transaction facts often require COUNT(DISTINCT ...) for business entities (orders, customers) because the fact grain may be more detailed (order lines).
  • Keep the entity identifier in the fact (degenerate dimension) to enable correct distinct counts.

Now answer the exercise about the content:

You need to report monthly conversion rate from daily marketing event data. Which approach avoids incorrect aggregation of non-additive measures?

You are right! Congratulations, now go to the next page

You missed! Try again.

Conversion rate is non-additive, so summing or averaging daily rates can be misleading. Store and aggregate additive components (conversions, visits) and compute the ratio from their sums for the correct overall rate.

Next chapter

Designing Dimension Tables: Attributes, Hierarchies, and Usable Business Context

Arrow Right Icon
Free Ebook cover Data Modeling Fundamentals for BI: Star Schemas, Dimensions, and Facts
36%

Data Modeling Fundamentals for BI: Star Schemas, Dimensions, and Facts

New course

11 pages

Download the app to earn free Certification and listen to the courses in the background, even with the screen off.