Data Modeling for BI: Why Analytical Models Differ from Transactional Systems

Chapter 1

Estimated reading time: 8 minutes

Why BI Data Modeling Exists

BI data modeling is the practice of shaping data so that many people can answer business questions quickly and consistently. The goal is not to capture every operational detail; it is to make analysis reliable and fast.

  • Consistent metrics: “Revenue”, “orders”, “active customers”, and “conversion rate” should mean the same thing in every report. A BI model centralizes definitions and reduces metric drift across teams.
  • Fast reporting: Analytical queries often scan large time ranges and aggregate millions of rows. BI models are designed to minimize joins, reduce ambiguity, and support efficient filtering and grouping.
  • Easier cross-team analysis: Marketing, finance, and operations can slice the same facts by shared dimensions (date, product, customer, channel) without each team rebuilding its own dataset.

Transactional (OLTP) vs Analytical (OLAP/Warehouse): What Changes and Why

How data is written vs how data is read

  • OLTP systems are write-optimized: They handle many small inserts/updates (create order, update payment status, adjust inventory) with strict consistency and concurrency.
  • Analytical systems are read-optimized: They handle fewer, heavier queries that scan and aggregate large datasets (sales by week by category, margin by region, cohort retention).

Typical query patterns

  • OLTP query pattern: “Fetch one order by ID”, “List items in a cart”, “Update shipment status”. These queries touch few rows and rely on indexes and normalized relations.
  • OLAP query pattern: “Total net sales by month and channel”, “Top 20 products by profit in Q4”, “Conversion rate by campaign and device”. These queries group, filter, and aggregate across many rows and often across long time windows.

Normalization vs denormalization

  • OLTP uses normalization: Data is split into many related tables to avoid duplication and to keep updates safe (change a customer email once, not in 10 places).
  • OLAP often uses denormalization (or controlled redundancy): Attributes needed for analysis are organized to reduce join complexity and speed up aggregations. The model favors clarity and query performance over update efficiency.

Impact on reporting performance

  • OLTP reporting risks: Complex joins across many normalized tables, plus filtering by time and aggregating, can be slow and can contend with operational workloads.
  • Warehouse/BI benefits: Pre-shaped tables, stable keys, and predictable join paths make dashboards faster and more consistent. Heavy reporting is isolated from operational transactions.

Concrete Business Example: E-commerce Checkout vs Sales Performance Dashboard

Consider the same business process: a customer places an order, pays, and the order ships. The operational system must support real-time updates and correctness. The BI system must support fast analysis across time, products, customers, and channels.

Operational goal (checkout system)

The checkout system needs to:

  • Create orders and order items quickly
  • Update payment and shipment statuses safely
  • Prevent inconsistent states (e.g., paid order without payment record)
  • Support customer service lookups (find an order, refund it)

Analytical goal (sales performance dashboard)

The dashboard needs to:

  • Aggregate sales and margin by day/week/month
  • Slice results by product category, brand, channel, region, customer segment
  • Compare periods (week over week, year over year), track KPIs, and drill down
  • Run quickly and consistently for many users

How the Same Process Becomes Different Table Designs

OLTP-style design (normalized, process-oriented)

Below is a simplified OLTP design. It is optimized for correctness and updates, not for large aggregations.

customers(customer_id, email, name, created_at, ...)
addresses(address_id, customer_id, line1, city, region, country, ...)
products(product_id, sku, name, category_id, brand_id, ...)
categories(category_id, category_name)
orders(order_id, customer_id, order_ts, status, billing_address_id, shipping_address_id, ...)
order_items(order_item_id, order_id, product_id, quantity, unit_price, discount_amount, tax_amount)
payments(payment_id, order_id, payment_ts, provider, status, amount)
shipments(shipment_id, order_id, shipped_ts, carrier, status)
refunds(refund_id, order_id, refund_ts, amount, reason)

Typical OLTP queries:

  • Get a single order: join orders to order_items, maybe payments, maybe shipments.
  • Update payment status: update one row in payments and possibly orders.
  • Check inventory or validate product: lookup by product_id or sku.
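
For example, the single-order lookup might look like the following SQL (a sketch against the schema above; the literal order ID is only an illustration):

SELECT o.order_id, o.status, oi.product_id, oi.quantity, oi.unit_price,
       p.status AS payment_status
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
LEFT JOIN payments p ON p.order_id = o.order_id
WHERE o.order_id = 12345;  -- touches a handful of rows via indexed keys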

What happens if you try to build a dashboard directly on this model?

  • “Net sales by month and channel” may require joining orders, order_items, products, categories, payments, and possibly refunds, then filtering by time and statuses, then aggregating.
  • Definitions become scattered: is revenue based on order time or payment time? Are cancelled orders excluded? How are refunds applied?
  • Performance can degrade as data grows and as more users run heavy queries.

BI/warehouse-style design (analysis-oriented)

In BI, you typically separate:

  • Facts: measurable events (sales amount, quantity, discount, tax)
  • Dimensions: descriptive context (date, product, customer, channel, geography)

For the sales dashboard, a common approach is to model a sales event at the grain of “one order line” (one product on one order). That becomes a fact table, with dimensions providing the slicing attributes.

dim_date(date_key, calendar_date, day, week, month, quarter, year)
dim_product(product_key, sku, product_name, category_name, brand_name, ...)
dim_customer(customer_key, customer_id, customer_segment, signup_date, ...)
dim_channel(channel_key, channel_name, campaign, device_type, ...)
dim_geography(geo_key, country, region, city)

fact_sales(order_line_id, date_key, product_key, customer_key, channel_key, geo_key,
           order_id, quantity, gross_sales_amount, discount_amount, tax_amount,
           net_sales_amount, cost_amount, profit_amount)

Notice what changed:

  • Attributes like category_name and brand_name are placed where analysts need them (often in dim_product), rather than requiring multiple joins to lookup tables.
  • Measures like net_sales_amount and profit_amount are stored (or can be computed consistently during transformation) so every report uses the same logic.
  • Time analysis becomes straightforward with dim_date (month, quarter, year are readily available).

Step-by-Step: Translating Checkout Data into a BI Model

Step 1: Choose the analytical grain

Decide what one row in your main analytical table represents. For a sales performance dashboard, common grains include:

  • Order line grain: one row per product per order (best for product/category analysis)
  • Order grain: one row per order (simpler, but less detail for product mix)
  • Payment grain: one row per payment transaction (best for payment analytics)

Example choice: order line grain for fact_sales.

Step 2: Identify the measures (facts)

List numeric values you want to aggregate and compare:

  • Quantity
  • Gross sales (unit_price * quantity)
  • Discount amount
  • Tax amount
  • Net sales (e.g., gross minus discounts, with tax included or excluded depending on your definition)
  • Cost and profit (if available)

Define them once in the transformation layer so every downstream report uses the same calculation.
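
One way to centralize these definitions is a transformation-layer view that every downstream model reads from. The view name here (stg_sales_measures) and the exact net sales formula are illustrative assumptions, not fixed rules:

CREATE VIEW stg_sales_measures AS
SELECT
    oi.order_item_id,
    oi.order_id,
    oi.quantity,
    oi.quantity * oi.unit_price AS gross_sales_amount,
    oi.discount_amount,
    oi.tax_amount,
    -- one possible definition: tax excluded from net sales
    oi.quantity * oi.unit_price - oi.discount_amount AS net_sales_amount
FROM order_items oi;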

Step 3: Identify the slicing axes (dimensions)

List how the business wants to break down results:

  • Date (order date, payment date, ship date)
  • Product (SKU, category, brand)
  • Customer (segment, cohort, region)
  • Channel (web, app, marketplace; campaign; device)
  • Geography (shipping country/region/city)

Important: choose which timestamp drives the primary sales view (e.g., order date vs payment date). If multiple are needed, you can model multiple date keys (e.g., order_date_key, payment_date_key) depending on reporting needs.
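
As a sketch, if fact_sales were extended with both an order_date_key and a payment_date_key (an assumption beyond the schema shown earlier), the same dim_date table can play both roles by being joined twice under different aliases:

SELECT
    od.month AS order_month,
    pd.month AS payment_month,
    SUM(f.net_sales_amount) AS net_sales
FROM fact_sales f
JOIN dim_date od ON od.date_key = f.order_date_key   -- order-date role
JOIN dim_date pd ON pd.date_key = f.payment_date_key -- payment-date role
GROUP BY od.month, pd.month;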

Step 4: Map OLTP tables to the BI fact table

Using the OLTP example:

  • orders provides order-level context (order timestamp, customer, addresses, status)
  • order_items provides line-level measures (quantity, unit price, discounts, tax)
  • products provides product identifiers to link to dim_product
  • customers provides identifiers to link to dim_customer

At load time, you create one row in fact_sales per order_items row, attach the appropriate dimension keys, and compute standardized measures.
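
A simplified load might look like the SQL below. It is a sketch only: channel, geography, and cost columns are omitted for brevity, natural keys (sku, customer_id) drive the dimension lookups, and real pipelines usually generate surrogate keys instead:

INSERT INTO fact_sales
    (order_line_id, date_key, product_key, customer_key, order_id,
     quantity, gross_sales_amount, discount_amount, tax_amount, net_sales_amount)
SELECT
    oi.order_item_id,
    dd.date_key,
    dp.product_key,
    dc.customer_key,
    o.order_id,
    oi.quantity,
    oi.quantity * oi.unit_price,
    oi.discount_amount,
    oi.tax_amount,
    oi.quantity * oi.unit_price - oi.discount_amount  -- same illustrative formula as in Step 2
FROM order_items oi
JOIN orders o        ON o.order_id = oi.order_id
JOIN products pr     ON pr.product_id = oi.product_id
JOIN dim_date dd     ON dd.calendar_date = CAST(o.order_ts AS DATE)
JOIN dim_product dp  ON dp.sku = pr.sku
JOIN dim_customer dc ON dc.customer_id = o.customer_id;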

Step 5: Shape dimensions for usability

Dimensions should be designed for filtering and grouping without extra joins. For example, instead of forcing analysts to join products to categories to get category names, you can include category_name directly in dim_product.

dim_product(product_key, sku, product_name, category_name, brand_name)

This is a deliberate tradeoff: you may duplicate category names across many products, but reporting becomes simpler and faster.
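
Here is a sketch of how dim_product could be built during transformation, assuming a brands(brand_id, brand_name) lookup table exists alongside categories, and using product_id as the dimension key for simplicity (warehouses often generate surrogate keys instead):

INSERT INTO dim_product (product_key, sku, product_name, category_name, brand_name)
SELECT
    p.product_id,       -- simple key choice for this sketch
    p.sku,
    p.name,
    c.category_name,    -- denormalized: duplicated across products on purpose
    b.brand_name
FROM products p
JOIN categories c ON c.category_id = p.category_id
JOIN brands b     ON b.brand_id = p.brand_id;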

Step 6: Validate with real dashboard questions

Test the BI model against common questions:

  • “Net sales by month and channel” should require only joining fact_sales to dim_date and dim_channel, then aggregating.
  • “Top categories by profit last quarter” should require only joining fact_sales to dim_product and dim_date.
  • “Repeat purchase rate by customer segment” should rely on consistent customer identifiers and a stable definition of “purchase”.
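
For instance, the second question should reduce to a query like this (the year and quarter values are placeholders):

SELECT
    dp.category_name,
    SUM(f.profit_amount) AS total_profit
FROM fact_sales f
JOIN dim_product dp ON dp.product_key = f.product_key
JOIN dim_date dd    ON dd.date_key = f.date_key
WHERE dd.year = 2024 AND dd.quarter = 4   -- "last quarter": placeholder values
GROUP BY dp.category_name
ORDER BY total_profit DESC;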

Side-by-Side: Same Question, Different Workloads

Question: Net sales by month and product category

In OLTP, you typically need multiple joins and careful filtering:

  • Join orders to order_items
  • Join to products then to categories
  • Filter out cancelled orders, handle refunds, choose the correct timestamp
  • Group by month and category
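
One plausible version of that OLTP query is sketched below (PostgreSQL-style DATE_TRUNC). Note that refunds are recorded at order grain, so allocating them to categories needs a rule of its own, which is exactly where definitions start to drift:

SELECT
    DATE_TRUNC('month', o.order_ts) AS order_month,
    c.category_name,
    SUM(oi.quantity * oi.unit_price - oi.discount_amount) AS net_sales
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
JOIN products p     ON p.product_id = oi.product_id
JOIN categories c   ON c.category_id = p.category_id
WHERE o.status NOT IN ('cancelled')       -- which statuses count is itself a decision
GROUP BY DATE_TRUNC('month', o.order_ts), c.category_name;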

In BI, you aim for a predictable, reusable pattern:

  • Join fact_sales to dim_date and dim_product
  • Group by dim_date.month and dim_product.category_name
  • Sum net_sales_amount
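
The warehouse version of the same question collapses to a star join (a sketch against the dimensional schema above):

SELECT
    dd.year,
    dd.month,
    dp.category_name,
    SUM(f.net_sales_amount) AS net_sales
FROM fact_sales f
JOIN dim_date dd    ON dd.date_key = f.date_key
JOIN dim_product dp ON dp.product_key = f.product_key
GROUP BY dd.year, dd.month, dp.category_name;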

The BI model reduces both the technical cost (fewer joins, clearer keys) and the organizational cost (shared definitions, fewer conflicting numbers).

Now answer the exercise about the content:

When designing a BI model for a sales performance dashboard, which approach best supports fast, consistent reporting across many users?


Answer: A BI model is read-optimized: it uses facts and dimensions, a clear grain, and centralized definitions (e.g., net sales) to make queries predictable, fast, and consistent across teams.

Next chapter

Core Concepts of Dimensional Modeling: Facts, Dimensions, and Business Processes
