All courses > Technology and Programming > Data Science and Business Intelligence ::

Power BI Essentials: Designing a Maintainable Data Model

Capítulo 6

Estimated reading time: 8 minutes

Adopt the Semantic Model Mindset

A maintainable Power BI model is not a mirror of your source system. It is a semantic layer: a set of tables and relationships designed to answer business questions consistently, even as data volume and complexity grow. The goal is to make “what does this number mean?” unambiguous and repeatable across reports.

When you design with a semantic mindset, you prioritize:

Consistency: the same metric behaves the same way everywhere.
Clarity: business users can find fields and understand them.
Performance: fewer tables and simpler relationships reduce ambiguity and speed up calculations.
Extensibility: adding a new dimension (e.g., Channel) or a new fact (e.g., Returns) doesn’t force a redesign.

Model-first questions to ask

What is the grain? (What does one row represent?)
What are the facts? (Numeric measures you aggregate: sales amount, quantity, cost.)
What are the dimensions? (Descriptors you slice by: date, product, customer, region.)
Which filters should flow to which tables? (Typically dimensions filter facts.)

Star Schema Basics (Without the Drama)

Fact tables

A fact table stores events or transactions at a defined grain. It usually contains:

Foreign keys to dimensions (ProductKey, CustomerKey, DateKey)
Numeric columns that can be aggregated (SalesAmount, Quantity)
Degenerate dimensions sometimes (e.g., OrderNumber) when you need the identifier but not a full dimension table

Common grains include “one row per order line”, “one row per invoice”, or “one row per day per product”. Pick one and keep it stable.

Dimension tables

A dimension table provides descriptive context. It typically contains:

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

A key (ProductKey)
Attributes used for slicing and grouping (Category, Brand, Color)
Hierarchies (e.g., Category > Subcategory > Product)

Dimensions should be wide (more descriptive columns) and relatively small compared to facts.

Why denormalization decisions matter

In Power BI, denormalizing dimensions (bringing related descriptive attributes into a single dimension table) often improves usability and reduces relationship complexity. However, denormalization should be intentional:

Denormalize when it reduces relationship chains and makes slicing straightforward (e.g., include CategoryName in DimProduct instead of a separate DimCategory).
Keep separate dimensions when attributes change at different rates or represent distinct entities (e.g., Customer vs. Geography if geography is shared and managed separately).
Avoid snowflake chains (DimProduct → DimCategory → FactSales) unless there is a strong reason; they increase ambiguity and can complicate DAX.

Guided Rebuild: From Cleaned Queries to a Lean Model

This walkthrough assumes you already have cleaned queries loaded (or ready to load) such as Sales, Products, Customers, and a Calendar table. The focus here is modeling decisions, not query steps.

Step 1: Inventory your tables and classify them

In Model view (or using the Fields pane), list each query and decide whether it is:

Fact candidate: transactional/periodic rows with numeric values
Dimension candidate: descriptive attributes used for filtering/grouping
Helper/technical: staging, mapping, bridge, or audit tables

Target outcome: a small number of facts and a set of shared dimensions.

Step 2: Identify and lock the grain of each fact

For each fact candidate, write the grain as a sentence. Examples:

FactSales: “One row per order line.”
FactInventorySnapshot: “One row per product per day.”
FactWebEvents: “One row per session event.”

Then validate the grain by checking uniqueness of the natural key(s). If FactSales is “order line”, then (OrderID, LineNumber) should be unique. If it isn’t, you don’t have the grain you think you have.

Step 3: Choose dimensions based on how users ask questions

Dimensions should match common analysis patterns. Typical shared dimensions:

Date: almost always needed; supports time intelligence.
Product: category/brand/SKU analysis.
Customer: segmentation, cohorts, geography.
Salesperson/Store/Channel: performance comparisons.
Geography: region/country/city slicing (either as part of Customer/Store or as its own dimension).

Keep dimensions conformed: if multiple facts use “Product”, they should use the same DimProduct table and key.

Step 4: Build (or refine) dimension tables

For each dimension:

Ensure a unique key (one row per key).
Keep only descriptive attributes needed for slicing, grouping, and labeling.
Remove fact-like numeric columns that should be measures (e.g., “TotalSalesToDate” belongs as a measure, not a dimension column).

Example: DimProduct fields might include ProductKey, ProductName, Brand, Category, Subcategory, Size, Color, DiscontinuedFlag.

Step 5: Keep the model lean by removing or disabling unnecessary loads

A maintainable model avoids loading tables that do not directly support analysis.

Disable load for staging/helper queries that only exist to produce final dimensions/facts.
Prefer one dimension per entity rather than multiple near-duplicates (e.g., avoid DimCustomer and CustomerLookup with overlapping columns).
Eliminate unused columns from loaded tables to reduce memory and confusion.

Step 6: Create relationships intentionally (star pattern)

In Model view, connect each dimension to the fact(s):

Cardinality: Dimension (1) to Fact (*)
Cross filter direction: Single (Dimension → Fact) in most cases
Active relationships: Keep one active relationship between a dimension and a fact for a given key; use role-playing dimensions or inactive relationships only when necessary.

Example relationship set:

DimDate[DateKey] (1) → FactSales[DateKey] (*)
DimProduct[ProductKey] (1) → FactSales[ProductKey] (*)
DimCustomer[CustomerKey] (1) → FactSales[CustomerKey] (*)

Step 7: Validate filter flow with a simple mental test

Ask: “If I filter Category = Bikes, will it correctly filter SalesAmount?” In a star schema, the answer should be yes because DimProduct filters FactSales. If you need a fact to filter a dimension, that’s a warning sign (often caused by bidirectional filters or bridge tables).

Practical Example: Rebuilding a Sales Model

Suppose you have these cleaned tables loaded:

Sales (OrderDate, ProductID, CustomerID, Quantity, NetAmount, DiscountAmount, OrderNumber, LineNumber)
Products (ProductID, ProductName, Category, Subcategory, Brand)
Customers (CustomerID, CustomerName, Segment, City, State, Country)
Calendar (Date, Year, Quarter, Month, MonthName)

Step-by-step rebuild

Define grain: Sales is one row per order line (OrderNumber + LineNumber).
Create FactSales: Keep keys and numeric columns: OrderDate (or DateKey), ProductID, CustomerID, Quantity, NetAmount, DiscountAmount, OrderNumber, LineNumber.
Create DimProduct: ProductID, ProductName, Category, Subcategory, Brand.
Create DimCustomer: CustomerID, CustomerName, Segment, City, State, Country.
Create DimDate: Use Calendar; ensure Date is unique (one row per date).
Relationships: DimDate[Date] → FactSales[OrderDate]; DimProduct[ProductID] → FactSales[ProductID]; DimCustomer[CustomerID] → FactSales[CustomerID].
Check cardinality: Each dimension key must be unique; if not, fix duplicates in the dimension.

At this point, you have a clean star: three dimensions around one fact. This structure scales well when you add new facts (e.g., FactReturns) that can reuse DimDate, DimProduct, and DimCustomer.

Best Practices for a Maintainable Model

Table naming conventions

Use names that communicate role and content. A simple convention:

Fact tables: FactSales, FactInventory, FactReturns
Dimensions: DimDate, DimProduct, DimCustomer, DimStore
Bridges (if needed): BridgeCustomerSegment
Helper tables: (avoid loading; if loaded, name clearly like HelperParameters)

Keep names singular or plural consistently (choose one standard). Avoid cryptic source-system names.

Hide technical columns to reduce mistakes

Many columns exist only to support relationships or internal logic. Hide them from report view to keep the field list clean:

Surrogate keys (ProductKey, CustomerKey) if users never need to drag them into visuals
Row identifiers (LineNumber, InternalID)
ETL audit fields (LoadTimestamp, SourceFileName)

Keep the columns visible that users actually use: names, categories, dates, and curated numeric columns (or better: measures).

Organize fields with display folders

Display folders help users find the right fields quickly, especially in wide dimensions. Suggested folder patterns:

DimDate: “Calendar” (Year, Quarter, Month), “Date Attributes” (DayOfWeek, IsWeekend)
DimCustomer: “Customer” (Name, Segment), “Geography” (City, State, Country)
DimProduct: “Product” (ProductName), “Classification” (Category, Subcategory, Brand)

For measures, use folders like “Sales”, “Margin”, “Operations” to group related calculations.

Set default summarization to prevent accidental aggregation

Power BI will often default numeric columns to “Sum”, which can create incorrect visuals (e.g., summing an ID, summing a unit price, or summing a percentage). For each numeric column, decide the correct default behavior:

Never summarize: IDs, unit price, discount rate, margin %, inventory on hand (if it’s a snapshot value used differently)
Sum: SalesAmount, Quantity, DiscountAmount
Average: Customer satisfaction score, delivery days (if meaningful)
Min/Max: StartDate, EndDate, LatestStatusDate (often better handled via measures)

Practical approach: keep most business metrics as measures and set raw numeric columns to “Don’t summarize” unless they are clearly additive. This reduces the chance that a user drags a column into a visual and gets a misleading total.

Keep relationships simple and predictable

Prefer single-direction filters from dimensions to facts.
Avoid many-to-many unless you truly need it; consider bridge tables with clear intent.
Limit bi-directional filtering to specific cases and document why it exists.
Use one Date dimension and connect it to facts consistently; if you need multiple date roles (Order Date vs Ship Date), consider role-playing dimensions or inactive relationships paired with measures.

Quick model checklist (before building visuals)

Each fact has a clearly stated grain and stable keys.
Each dimension key is unique (no duplicates).
Relationships are mostly 1:* with single-direction filtering.
Unneeded tables/columns are removed or not loaded.
Technical columns are hidden; user-facing fields are named clearly.
Display folders are applied to wide tables and measures.
Default summarization is set to avoid accidental sums.

Now answer the exercise about the content:

When designing a maintainable Power BI star schema, which relationship setup best supports predictable filter flow and avoids ambiguity?

You are right! Congratulations, now go to the next page

You missed! Try again.

A star schema is most predictable when dimensions (1) filter facts (*) using single-direction relationships. This keeps filter flow clear (e.g., Category filters Sales) and reduces ambiguity and DAX complexity.