Why BI Data Modeling Exists
BI data modeling is the practice of shaping data so that many people can answer business questions quickly and consistently. The goal is not to capture every operational detail; it is to make analysis reliable and fast.
- Consistent metrics: “Revenue”, “orders”, “active customers”, and “conversion rate” should mean the same thing in every report. A BI model centralizes definitions and reduces metric drift across teams.
- Fast reporting: Analytical queries often scan large time ranges and aggregate millions of rows. BI models are designed to minimize joins, reduce ambiguity, and support efficient filtering and grouping.
- Easier cross-team analysis: Marketing, finance, and operations can slice the same facts by shared dimensions (date, product, customer, channel) without each team rebuilding its own dataset.
Transactional (OLTP) vs Analytical (OLAP/Warehouse): What Changes and Why
How data is written vs how data is read
- OLTP systems are write-optimized: They handle many small inserts/updates (create order, update payment status, adjust inventory) with strict consistency and concurrency.
- Analytical systems are read-optimized: They handle fewer, heavier queries that scan and aggregate large datasets (sales by week by category, margin by region, cohort retention).
Typical query patterns
- OLTP query pattern: “Fetch one order by ID”, “List items in a cart”, “Update shipment status”. These queries touch few rows and rely on indexes and normalized relations.
- OLAP query pattern: “Total net sales by month and channel”, “Top 20 products by profit in Q4”, “Conversion rate by campaign and device”. These queries group, filter, and aggregate across many rows and often across long time windows. Both patterns are sketched below.
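To make the contrast concrete, here is a minimal sketch of each pattern in Postgres-style SQL, using the orders table from the e-commerce example developed later in this article; the specific ID and date are placeholders.

-- OLTP pattern: point lookup that touches one row via an index
SELECT order_id, status, order_ts
FROM orders
WHERE order_id = 12345;

-- OLAP pattern: scan and aggregate across many rows and a long time window
SELECT DATE_TRUNC('month', order_ts) AS month,
       COUNT(*) AS order_count
FROM orders
WHERE order_ts >= DATE '2023-01-01'
GROUP BY 1
ORDER BY 1;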
Normalization vs denormalization
- OLTP uses normalization: Data is split into many related tables to avoid duplication and to keep updates safe (change a customer email once, not in 10 places).
- OLAP often uses denormalization (or controlled redundancy): Attributes needed for analysis are organized to reduce join complexity and speed up aggregations. The model favors clarity and query performance over update efficiency.
Impact on reporting performance
- OLTP reporting risks: Complex joins across many normalized tables, plus filtering by time and aggregating, can be slow and can contend with operational workloads.
- Warehouse/BI benefits: Pre-shaped tables, stable keys, and predictable join paths make dashboards faster and more consistent. Heavy reporting is isolated from operational transactions.
Concrete Business Example: E-commerce Checkout vs Sales Performance Dashboard
Consider the same business process: a customer places an order, pays, and the order ships. The operational system must support real-time updates and correctness. The BI system must support fast analysis across time, products, customers, and channels.
Operational goal (checkout system)
The checkout system needs to:
- Create orders and order items quickly
- Update payment and shipment statuses safely
- Prevent inconsistent states (e.g., paid order without payment record)
- Support customer service lookups (find an order, refund it)
Analytical goal (sales performance dashboard)
The dashboard needs to:
- Aggregate sales and margin by day/week/month
- Slice results by product category, brand, channel, region, customer segment
- Compare periods (WoW, YoY), track KPIs, and drill down
- Run quickly and consistently for many users
How the Same Process Becomes Different Table Designs
OLTP-style design (normalized, process-oriented)
Below is a simplified OLTP design. It is optimized for correctness and updates, not for large aggregations.
customers(customer_id, email, name, created_at, ...)
addresses(address_id, customer_id, line1, city, region, country, ...)
products(product_id, sku, name, category_id, brand_id, ...)
categories(category_id, category_name)
orders(order_id, customer_id, order_ts, status, billing_address_id, shipping_address_id, ...)
order_items(order_item_id, order_id, product_id, quantity, unit_price, discount_amount, tax_amount)
payments(payment_id, order_id, payment_ts, provider, status, amount)
shipments(shipment_id, order_id, shipped_ts, carrier, status)
refunds(refund_id, order_id, refund_ts, amount, reason)
Typical OLTP queries:
- Get a single order: join orders to order_items, maybe payments, maybe shipments (see the sketch after this list).
- Update payment status: update one row in payments and possibly orders.
- Check inventory or validate product: look up by product_id or sku.
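As a sketch, the “get a single order” lookup against the tables above might look like this (the order ID is a placeholder; multiple payments or shipments per order would repeat line rows, which is usually acceptable for a customer-service lookup):

SELECT o.order_id, o.status, oi.product_id, oi.quantity, oi.unit_price,
       p.status AS payment_status, s.status AS shipment_status
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
LEFT JOIN payments p ON p.order_id = o.order_id
LEFT JOIN shipments s ON s.order_id = o.order_id
WHERE o.order_id = 12345;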
What happens if you try to build a dashboard directly on this model?
- “Net sales by month and channel” may require joining orders, order_items, products, categories, payments, and possibly refunds, then filtering by time and statuses, then aggregating (a sketch of such a query follows this list).
- Definitions become scattered: is revenue based on order time or payment time? Are cancelled orders excluded? How are refunds applied?
- Performance can degrade as data grows and as more users run heavy queries.
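A sketch of what that first question might require directly against the OLTP schema (Postgres-style SQL; the channel column and the 'cancelled' status value are assumptions, since the schema above defines neither):

SELECT DATE_TRUNC('month', o.order_ts) AS month,
       o.channel,  -- assumed column; not defined in the schema above
       SUM(oi.quantity * oi.unit_price - oi.discount_amount) AS net_sales
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status <> 'cancelled'  -- which statuses count is a business decision
GROUP BY 1, 2;
-- Refunds would need a separate aggregation to avoid double-counting rows,
-- and “order time vs payment time” remains an open definitional question.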
BI/warehouse-style design (analysis-oriented)
In BI, you typically separate:
- Facts: measurable events (sales amount, quantity, discount, tax)
- Dimensions: descriptive context (date, product, customer, channel, geography)
For the sales dashboard, a common approach is to model a sales event at the grain of “one order line” (one product on one order). That becomes a fact table, with dimensions providing the slicing attributes.
dim_date(date_key, calendar_date, day, week, month, quarter, year)
dim_product(product_key, sku, product_name, category_name, brand_name, ...)
dim_customer(customer_key, customer_id, customer_segment, signup_date, ...)
dim_channel(channel_key, channel_name, campaign, device_type, ...)
dim_geography(geo_key, country, region, city)
fact_sales(order_line_id, date_key, product_key, customer_key, channel_key, geo_key,
order_id, quantity, gross_sales_amount, discount_amount, tax_amount,
net_sales_amount, cost_amount, profit_amount)
Notice what changed:
- Attributes like category_name and brand_name are placed where analysts need them (often in dim_product), rather than requiring multiple joins to lookup tables.
- Measures like net_sales_amount and profit_amount are stored (or can be computed consistently during transformation) so every report uses the same logic.
- Time analysis becomes straightforward with dim_date (month, quarter, year are readily available).
Step-by-Step: Translating Checkout Data into a BI Model
Step 1: Choose the analytical grain
Decide what one row in your main analytical table represents. For a sales performance dashboard, common grains include:
- Order line grain: one row per product per order (best for product/category analysis)
- Order grain: one row per order (simpler, but less detail for product mix)
- Payment grain: one row per payment transaction (best for payment analytics)
Example choice: order line grain for fact_sales.
Step 2: Identify the measures (facts)
List numeric values you want to aggregate and compare:
- Quantity
- Gross sales (unit_price * quantity)
- Discount amount
- Tax amount
- Net sales (typically gross minus discounts; whether tax is included or excluded depends on your definition)
- Cost and profit (if available)
Define them once in the transformation layer so every downstream report uses the same calculation.
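For example, a transformation view (or dbt model, or ETL step) might fix the calculations in one place. This is a sketch under one possible definition of net sales, excluding tax; the view name is illustrative:

CREATE VIEW stg_sales_measures AS
SELECT oi.order_item_id,
       oi.quantity,
       oi.quantity * oi.unit_price                      AS gross_sales_amount,
       oi.discount_amount,
       oi.tax_amount,
       oi.quantity * oi.unit_price - oi.discount_amount AS net_sales_amount
FROM order_items oi;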
Step 3: Identify the slicing axes (dimensions)
List how the business wants to break down results:
- Date (order date, payment date, ship date)
- Product (SKU, category, brand)
- Customer (segment, cohort, region)
- Channel (web, app, marketplace; campaign; device)
- Geography (shipping country/region/city)
Important: choose which timestamp drives the primary sales view (e.g., order date vs payment date). If multiple are needed, you can model multiple date keys (e.g., order_date_key, payment_date_key) depending on reporting needs.
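If you do carry two date keys, the same dim_date table can serve both roles by being joined twice under different aliases (the “role-playing dimension” pattern). A sketch, assuming a fact_sales variant that carries order_date_key and payment_date_key rather than the single date_key shown earlier:

SELECT od.month AS order_month,
       pd.month AS payment_month,
       SUM(f.net_sales_amount) AS net_sales
FROM fact_sales f
JOIN dim_date od ON od.date_key = f.order_date_key
JOIN dim_date pd ON pd.date_key = f.payment_date_key
GROUP BY 1, 2;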
Step 4: Map OLTP tables to the BI fact table
Using the OLTP example:
- orders provides order-level context (order timestamp, customer, addresses, status)
- order_items provides line-level measures (quantity, unit price, discounts, tax)
- products provides product identifiers to link to dim_product
- customers provides identifiers to link to dim_customer
At load time, you create one row in fact_sales per order_items row, attach the appropriate dimension keys, and compute standardized measures.
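A sketch of that load step, assuming the dimension tables are already populated and can be matched on natural keys (sku, customer_id, calendar date). Channel and geography lookups are omitted for brevity, as are cost and profit (the OLTP schema above carries no cost data), incremental loading, and slowly changing dimensions:

INSERT INTO fact_sales (order_line_id, date_key, product_key, customer_key, order_id,
                        quantity, gross_sales_amount, discount_amount, tax_amount,
                        net_sales_amount)
SELECT oi.order_item_id,
       d.date_key,
       dp.product_key,
       dc.customer_key,
       o.order_id,
       oi.quantity,
       oi.quantity * oi.unit_price,
       oi.discount_amount,
       oi.tax_amount,
       oi.quantity * oi.unit_price - oi.discount_amount
FROM order_items oi
JOIN orders o        ON o.order_id = oi.order_id
JOIN products p      ON p.product_id = oi.product_id
JOIN dim_date d      ON d.calendar_date = CAST(o.order_ts AS DATE)
JOIN dim_product dp  ON dp.sku = p.sku
JOIN dim_customer dc ON dc.customer_id = o.customer_id;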
Step 5: Shape dimensions for usability
Dimensions should be designed for filtering and grouping without extra joins. For example, instead of forcing analysts to join products to categories to get category names, you can include category_name directly in dim_product.
dim_product(product_key, sku, product_name, category_name, brand_name)
This is a deliberate tradeoff: you may duplicate category names across many products, but reporting becomes simpler and faster.
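Building that dimension from the OLTP tables might look like the following sketch; the brands lookup table is an assumption (implied by brand_id in products), and ROW_NUMBER() is just one way to mint surrogate keys:

SELECT ROW_NUMBER() OVER (ORDER BY p.product_id) AS product_key,
       p.sku,
       p.name AS product_name,
       c.category_name,
       b.brand_name
FROM products p
JOIN categories c ON c.category_id = p.category_id
JOIN brands b     ON b.brand_id = p.brand_id;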
Step 6: Validate with real dashboard questions
Test the BI model against common questions:
- “Net sales by month and channel” should require only joining fact_sales to dim_date and dim_channel, then aggregating.
- “Top categories by profit last quarter” should require only joining fact_sales to dim_product and dim_date (see the sketch after this list).
- “Repeat purchase rate by customer segment” should rely on consistent customer identifiers and a stable definition of “purchase”.
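As a sketch, the second question becomes a short, predictable query (the year and quarter values are placeholders):

SELECT dp.category_name,
       SUM(f.profit_amount) AS total_profit
FROM fact_sales f
JOIN dim_product dp ON dp.product_key = f.product_key
JOIN dim_date dd    ON dd.date_key = f.date_key
WHERE dd.year = 2024 AND dd.quarter = 4
GROUP BY dp.category_name
ORDER BY total_profit DESC;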
Side-by-Side: Same Question, Different Workloads
Question: Net sales by month and product category
In OLTP, you typically need multiple joins and careful filtering:
- Join orders to order_items
- Join to products, then to categories
- Filter out cancelled orders, handle refunds, choose the correct timestamp
- Group by month and category
In BI, you aim for a predictable, reusable pattern:
- Join fact_sales to dim_date and dim_product
- Group by dim_date.month and dim_product.category_name
- Sum net_sales_amount (a sketch of this query follows)
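A sketch of the BI version; compare it with the OLTP sketch earlier, where the status filter, refund handling, and timestamp choice all had to be re-decided inside the query:

SELECT dd.year,
       dd.month,
       dp.category_name,
       SUM(f.net_sales_amount) AS net_sales
FROM fact_sales f
JOIN dim_date dd    ON dd.date_key = f.date_key
JOIN dim_product dp ON dp.product_key = f.product_key
GROUP BY dd.year, dd.month, dp.category_name;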
The BI model reduces both the technical cost (fewer joins, clearer keys) and the organizational cost (shared definitions, fewer conflicting numbers).