Power BI Essentials: Model Scalability, Performance Basics, and Maintainability

Chapter 9

Estimated reading time: 9 minutes

+ Exercise

What “scalable and maintainable” means in a Power BI model

A scalable model keeps working as data volume grows (more rows, more files, more users) without becoming slow to refresh or slow to query. A maintainable model stays understandable and safe to change over time, so you can add new sources, extend measures, and troubleshoot issues without breaking refresh or producing inconsistent results.

In practice, scalability and maintainability come from a few habits: keep only the columns you need, avoid unnecessary intermediate loads, choose efficient calculations, keep data types lean, and adopt ingestion patterns that can grow without constant manual edits.

Reduce column count (and why it matters)

Every loaded column increases memory usage, refresh time, and the amount of work the engine must do during compression and query execution. Removing unused columns is one of the highest-impact performance improvements you can make.

Step-by-step: identify and remove unused columns

  • Start from the report needs: list the fields used in visuals, slicers, filters, and measure logic.
  • In Power Query, keep only necessary columns using a “keep columns” step rather than repeatedly removing columns. This makes the intent clearer and reduces accidental reintroduction of fields.
  • Remove high-cardinality text columns that are not used for grouping or filtering (e.g., free-form comments, long URLs, raw JSON payloads).
  • Split “wide” tables when appropriate: if a table contains many attributes that are rarely used, consider moving those attributes to a separate dimension table (loaded only if needed) or not loading them at all.
// Power Query pattern: keep only what you need (more stable than many remove steps)
Table.SelectColumns(
    PreviousStep,
    {"Date", "CustomerID", "ProductID", "Quantity", "NetAmount"}
)

Practical checks

  • If a column is only used to build another column and is not needed afterward, remove it after the derived field is created.
  • Prefer numeric surrogate keys (e.g., CustomerID) over long text keys for relationships and grouping.

Avoid loading intermediary queries

Intermediary queries are useful for organizing logic (staging, reference lists, helper transformations), but loading them into the model adds memory and can create confusion. A common scalable pattern is: stage queries (not loaded) → final dimension/fact queries (loaded).

Step-by-step: implement a staging pattern

  • Create a query that performs source access and heavy transformations (staging).
  • Right-click the staging query and disable loading (uncheck Enable load).
  • Reference the staging query to create final tables that are loaded.
  • Keep staging queries in a dedicated query group (e.g., “_Staging”) so they are easy to find and less likely to be accidentally loaded.
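As a sketch, the loaded half of this pattern might look like the following in Power Query (query, step, and column names are illustrative; Staging_Sales is the staging query with Enable load turned off):

// Final "Sales" query (loaded) — references the staging query instead of the raw source
let
    Source = Staging_Sales,
    Kept = Table.SelectColumns(Source, {"Date", "CustomerID", "ProductID", "Quantity", "NetAmount"}),
    Typed = Table.TransformColumnTypes(Kept, {{"Date", type date}, {"Quantity", Int64.Type}, {"NetAmount", Currency.Type}})
in
    Typed

Because the final query only references the staging query by name, you can change how Staging_Sales acquires and cleans the data without touching the loaded table's name or schema.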

Why referencing helps

  • It separates “data acquisition and cleanup” from “model-ready shape.”
  • It reduces the chance that helper tables become visible fields that users accidentally use.
  • It makes it easier to swap sources or adjust transformations without touching the final tables’ names and schema.

Prefer measures over calculated columns (when appropriate)

Calculated columns are computed during refresh and stored in the model, increasing size and refresh time. Measures are computed at query time and do not increase model size. When a calculation is only needed for aggregation in visuals (totals, ratios, time-aware metrics), a measure is often the better choice.


Use calculated columns when

  • You need a value to be used as a slicer, axis, or grouping category (e.g., “Customer Segment”).
  • You need a relationship key or a persistent attribute used for filtering.
  • The logic is row-level and must be stored (but validate whether it can be done in Power Query instead).
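For example, a stored "Customer Segment" attribute could be a DAX calculated column (table, column names, and thresholds are illustrative):

// Calculated column: computed at refresh, stored per row, usable in slicers and axes
Customer Segment =
    SWITCH (
        TRUE (),
        'Customer'[AnnualRevenue] >= 1000000, "Enterprise",
        'Customer'[AnnualRevenue] >= 100000, "Mid-Market",
        "Small Business"
    )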

Use measures when

  • You need aggregations (sum, average, distinct count) and ratios.
  • You need logic that depends on filter context (e.g., “% of total,” “YoY change”).
  • You want to minimize model size and keep refresh faster.
// Example measure pattern (computed at query time)
Total Sales := SUM ( 'Sales'[NetAmount] )
Sales per Unit := DIVIDE ( [Total Sales], SUM ( 'Sales'[Quantity] ) )

Choose efficient data types and encodings

Efficient data types reduce memory footprint and speed up compression. Even small improvements matter when row counts grow into millions.

Practical guidance

  • Use whole numbers for keys and counts whenever possible.
  • Use fixed decimal for currency when appropriate to reduce floating-point issues and improve compression.
  • Avoid DateTime when Date is enough. If time is not analyzed, store only Date. If time is needed, consider separating Date and Time columns.
  • Reduce text cardinality: normalize repeated text into dimension tables (e.g., Product Category) instead of repeating long strings in fact tables.
  • Remove leading/trailing spaces and inconsistent casing in text fields that are used for grouping; otherwise you create extra distinct values.
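Separating Date and Time can be done in Power Query; a sketch (step and column names are illustrative):

// Derive Date and Time columns, then drop the wider DateTime column
AddedDate = Table.AddColumn(PreviousStep, "OrderDate", each DateTime.Date([OrderTimestamp]), type date),
AddedTime = Table.AddColumn(AddedDate, "OrderTime", each DateTime.Time([OrderTimestamp]), type time),
Removed = Table.RemoveColumns(AddedTime, {"OrderTimestamp"})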

Step-by-step: validate type efficiency

  • In Power Query, confirm types are set intentionally (not left as “Any”).
  • In the model, review columns that are large text or high-cardinality and confirm they are truly required.
  • For numeric columns, confirm the narrowest type that preserves meaning (Whole Number vs Decimal Number vs Fixed Decimal).
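Setting types intentionally can be done in a single Power Query step (column names are illustrative; Currency.Type corresponds to the fixed decimal type):

// Apply the narrowest types that preserve meaning
Typed = Table.TransformColumnTypes(PreviousStep, {
    {"CustomerID", Int64.Type},
    {"Quantity", Int64.Type},
    {"NetAmount", Currency.Type},
    {"Date", type date}
})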

Incremental file ingestion patterns (folder-based append)

As data grows, manual file imports do not scale. A folder-based ingestion pattern lets you drop new files into a folder and refresh to automatically include them. This is especially useful for monthly exports, daily extracts, or system-generated CSVs.

Folder-based append: recommended pattern

  • One folder per dataset (e.g., Sales_Extracts), with consistent file naming and schema.
  • Keep a stable schema: same column names and types across files.
  • Use a staging query that reads the folder and applies a single “transform file” function.
  • Filter early: exclude hidden/system files and optionally filter by extension or naming pattern.
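Filtering early might look like this (the folder path and naming pattern are illustrative):

// Read the folder, then keep only visible CSV files that match the expected naming pattern
Source = Folder.Files("C:\Data\Sales_Extracts"),
Visible = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
DataFiles = Table.SelectRows(Visible, each Text.Lower([Extension]) = ".csv" and Text.StartsWith([Name], "Sales_"))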

Step-by-step: implement a robust folder append

  • Create a query from Folder and select the folder path.
  • Filter out hidden files and non-data files (e.g., temporary files).
  • Create a “Transform Sample File” logic once, then apply it to all files (Power Query typically scaffolds this automatically).
  • Expand the transformed tables and set data types in the final query.
  • Add a SourceFileName and/or IngestedDate column if you need traceability and debugging.
// Power Query idea: keep file lineage for troubleshooting
Table.AddColumn(PreviousStep, "SourceFileName", each [Name], type text)

Stability tips for incremental file growth

  • Schema drift handling: if columns may appear/disappear, explicitly select expected columns and provide defaults for missing fields.
  • Type consistency: apply type conversions after combining, but ensure the transform function outputs consistent column types.
  • Partitioning by period: if files are monthly, keep them monthly; don’t merge into huge single files unless required.

Documentation habits that scale

Documentation is part of performance and reliability because it reduces rework and prevents accidental misuse of fields. The goal is to make the model self-explanatory to future you and to other report authors.

Descriptions for tables, columns, and measures

  • Tables: describe the grain (what one row represents) and the source.
  • Measures: describe business meaning, filters assumed, and any exclusions (e.g., “excludes returns”).
  • Columns: document special handling (e.g., “trimmed and uppercased,” “nullable,” “derived from file name”).

Practical measure documentation template

Measure description template:
  • Business definition:
  • Numerator/denominator (if ratio):
  • Included/excluded rows:
  • Intended visuals (card, trend, matrix):
  • Notes (rounding, currency, timezone):

Organize measures for discoverability

  • Create a dedicated “Measures” table (or a small set of measure tables by subject area) and hide it from report view if your team prefers a cleaner field list.
  • Use display folders to group measures (e.g., Sales, Margin, Customers) so users can find them quickly.

Version-safe transformations (reduce breakage over time)

Version-safe transformations are steps that keep working when new files arrive, when columns are reordered, or when sources evolve slightly. The idea is to avoid brittle logic that depends on “column position 7” or a specific file name that changes monthly.

Common sources of refresh breakage

  • Renamed columns in source exports.
  • Extra header rows or footer totals appearing in some files.
  • Different delimiters/encodings across files.
  • Changing folder paths or using user-specific local paths.

Practical techniques

  • Select columns by name (not by position) and keep a controlled list of expected fields.
  • Use parameters for folder paths or environment-specific values (Dev/Test/Prod) so you don’t hardcode local paths.
  • Handle missing columns by adding them with null/default values when absent, then enforce types.
  • Filter out non-data rows using rules (e.g., remove rows where a key column is null or equals “Total”).
  • Keep transformations deterministic: avoid steps that depend on current time unless explicitly needed and documented.
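Filtering out non-data rows might look like this (the key column name is illustrative):

// Drop footer totals and rows missing a key value
Clean = Table.SelectRows(PreviousStep, each [CustomerID] <> null and [CustomerID] <> "Total")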
// Power Query pattern: add missing columns safely (conceptual)
let
    Expected = {"Date", "CustomerID", "NetAmount"},
    Existing = Table.ColumnNames(Source),
    Missing = List.Difference(Expected, Existing),
    WithMissing = List.Accumulate(Missing, Source, (state, col) => Table.AddColumn(state, col, each null))
in
    Table.SelectColumns(WithMissing, Expected)

Maintainability review checklist (repeat on every major change)

Naming audit

  • Table names clearly indicate role (FactSales, DimCustomer) or business meaning (Sales, Customers) consistently across the model.
  • Column names are business-friendly and avoid ambiguous terms (e.g., “Value,” “Amount1”).
  • Measure names read well in visuals (e.g., “Total Sales,” “Gross Margin %”).
  • Consistent casing and separators (choose one style and apply it everywhere).

Field visibility audit

  • Hide technical keys and helper columns that should not be used directly (surrogate keys, row IDs, ingestion metadata unless needed).
  • Hide staging tables and intermediate tables from the report field list.
  • Ensure users primarily interact with measures and curated dimensions, not raw fact columns.

Relationship audit

  • Confirm every relationship is intentional and supports the expected filter flow.
  • Remove unused relationships and avoid ambiguous paths that can cause inconsistent results.
  • Check that relationship keys have matching data types and clean values (no unexpected blanks or mixed formats).
  • Validate that dimensions have unique keys (no duplicates) to avoid many-to-many surprises unless explicitly designed.
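A quick duplicate-key check can be written as a throwaway DAX measure (table and column names are illustrative; it returns blank when the key is unique):

// Counts dimension key values that appear on more than one row
Duplicate Customer Keys :=
    COUNTROWS (
        FILTER (
            VALUES ( 'Customer'[CustomerID] ),
            CALCULATE ( COUNTROWS ( 'Customer' ) ) > 1
        )
    )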

Refresh validation (stability over time)

  • Run a full refresh after changes and confirm row counts are within expected ranges for key tables.
  • Validate incremental ingestion: drop a new file into the folder and confirm it is included exactly once (no duplicates).
  • Spot-check critical measures before and after changes using a small validation page (e.g., totals by month, top customers).
  • Review refresh duration trends; a sudden increase often indicates schema drift, new high-cardinality columns, or an accidental load of an intermediate query.

Now answer the exercise about the content:

Which approach best improves scalability and maintainability when preparing data in Power BI?


Scalable models reduce memory and refresh cost by loading only necessary columns and final tables. Keeping staging queries not loaded and using efficient types helps performance and makes changes safer over time.
