Power Query as the Transformation Layer
Power Query is where you shape raw data into analysis-ready tables before it reaches the model. Think of it as a transformation layer that sits between your data sources and your Power BI model: it imports data, applies a sequence of transformations, and outputs clean tables.
The key idea is that Power Query does not “edit the source.” Instead, it records your actions as Applied Steps. Each step transforms the result of the previous step, creating a repeatable pipeline. When the dataset refreshes, Power Query re-runs the same steps in order, producing consistent results without manual rework.
How Applied Steps form a repeatable pipeline
- Order matters: Steps run top-to-bottom. Filtering early can reduce work later; changing data types too early can sometimes cause errors if values are messy.
- Each step should have a purpose: Prefer several small, readable steps over one complicated step that is hard to debug.
- Steps are editable: You can click a step to see the state of the table at that point, rename steps, remove steps, or insert new steps.
In the background, Power Query generates M code for each step. You do not need to write M to be effective, but understanding that steps are code helps you treat transformations as a maintained pipeline rather than one-time cleanup.
Core Transformation Mechanics (with Practical Steps)
The transformations below cover the most common “data shaping” tasks you will use repeatedly. The goal is to make data consistent, reduce noise, and ensure columns are usable in relationships and measures.
Filtering rows
Filtering removes records you do not want in the final table (for example, canceled orders, test customers, or dates outside your reporting window).
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
Step-by-step:
- Select the column you want to filter (for example,
Status). - Open the filter dropdown in the column header.
- Choose a filter type (checkbox selection, text filters, number filters, date filters).
- Confirm and review the new step in Applied Steps (typically named
Filtered Rows).
Practical tips:
- Filter as early as possible when it reduces row count significantly (performance and clarity).
- Be careful with filtering on columns that may contain blanks or inconsistent casing; you may need to clean text first.
Removing columns
Removing columns keeps only what you need for analysis. This reduces model size and makes the dataset easier to understand.
Step-by-step:
- Select one or more columns you do not need.
- Right-click and choose Remove, or use Home > Remove Columns.
- Prefer Remove Other Columns when you know exactly which columns you want to keep.
Practical tips:
- Use Choose Columns if the source schema changes frequently; it can be easier to maintain a “keep list.”
- Remove technical or intermediate columns before loading, unless they are needed for relationships or auditing.
Splitting columns
Splitting is used when multiple values are packed into one column (for example, “City, State” or “First Last”).
Step-by-step (by delimiter):
- Select the column (for example,
FullName). - Go to Transform > Split Column > By Delimiter.
- Choose the delimiter (space, comma, hyphen, custom).
- Choose whether to split at the left-most delimiter, right-most delimiter, or each occurrence.
- Rename the resulting columns (for example,
FirstName,LastName).
Practical tips:
- If names or codes are inconsistent, splitting may produce errors or extra columns; consider cleaning and trimming first.
- For “Last, First” formats, split by comma, then trim whitespace on the new columns.
Changing case (uppercase, lowercase, capitalize each word)
Case normalization helps with matching keys, grouping, and reducing duplicates caused by inconsistent casing (for example, “usa” vs “USA”).
Step-by-step:
- Select the text column (for example,
Country). - Go to Transform > Format.
- Choose UPPERCASE, lowercase, or Capitalize Each Word.
Practical tips:
- For join keys (like product codes), uppercase is often safer and more consistent.
- For display fields (like customer names), “Capitalize Each Word” may be more readable, but confirm it doesn’t break special cases (e.g., “McDonald”).
Trimming and cleaning text
Text fields often contain leading/trailing spaces, non-printing characters, or inconsistent whitespace. These issues can break merges, create duplicate categories, and cause confusing visuals.
Trim removes leading and trailing spaces. Clean removes non-printable characters. You will often use both.
Step-by-step:
- Select one or more text columns (for example,
CustomerName,City). - Go to Transform > Format > Trim.
- Then go to Transform > Format > Clean.
Practical tips:
- Apply Trim/Clean before removing duplicates or merging queries.
- If you see values that look identical but don’t group together, hidden whitespace or non-printing characters are common causes.
Dealing with blanks (nulls, empty strings, and missing values)
Blanks can appear as null (missing value), an empty string "", or whitespace-only text. Handling blanks correctly prevents errors in calculations and improves data quality.
Identify blanks and their type
- null: Power Query’s true missing value.
- Empty string: a text value that is present but contains nothing.
- Whitespace-only: looks blank but contains spaces or tabs.
Practical approach:
- For text columns, run Trim first to reduce whitespace-only values.
- Use filters to inspect how many blanks you have and where they occur.
Common blank-handling actions
1) Replace blanks with a standard value (text):
- Select the column.
- Use Transform > Replace Values.
- Replace
null(or empty) with something like"Unknown"or"Not Provided".
2) Remove rows with blanks in critical fields:
- Filter the critical column (for example,
OrderIDorCustomerID). - Uncheck
(null)or blank values. - Confirm the step is created and verify that you are not removing valid records unintentionally.
3) Fill down/up (use carefully):
Fill Down is useful for data shaped like reports (where a category appears once and subsequent rows are blank until the next category). It is not appropriate for transactional tables where blanks represent missing data.
- Select the column.
- Choose Transform > Fill > Down (or Up).
Keeping Transformations Readable and Resilient
Readable queries are easier to debug and safer to refresh over time. Resilient queries are less likely to break when the source changes slightly (new columns, extra spaces, unexpected blanks).
Rename steps and avoid “mystery steps”
Power Query auto-names steps like Changed Type or Filtered Rows. Rename key steps so their intent is obvious.
- Right-click a step in Applied Steps and choose Rename.
- Use action + subject naming, for example:
Filter_ActiveCustomers,Trim_CustomerName,Split_FullName.
Prefer stable operations and predictable order
- Trim/Clean text before merges, grouping, or removing duplicates.
- Remove unnecessary columns after you confirm you won’t need them for later steps.
- Be intentional with
Changed Typesteps; type changes are important, but they can also surface errors if applied before cleanup.
Query Organization: A Structured Approach
As your project grows, query organization becomes essential. The goal is to make the Query pane understandable at a glance: which queries are sources, which are staging, which are final tables, and which are helper functions.
Naming queries (a simple convention)
Use names that communicate purpose and stage. One practical convention:
- stg_ for staging queries (raw-to-clean transformations, not loaded)
- dim_ for dimension tables (loaded)
- fact_ for fact tables (loaded)
- ref_ for reference/helper tables (loaded or not, depending on use)
Example set:
stg_Sales_Rawstg_Sales_Cleandim_Customerdim_Productfact_Sales
Grouping queries into folders
Groups help you navigate quickly and reduce mistakes (like editing the wrong query).
Step-by-step:
- In the Queries pane, right-click a query.
- Select Move to group.
- Create groups such as:
00_Sources,10_Staging,20_Dimensions,30_Facts,90_Helpers.
Practical tip: Prefix group names with numbers so they stay in a logical order.
Enable/Disable Load (control what enters the model)
Not every query should load to the model. Staging queries are often used only as building blocks for final tables.
Step-by-step:
- Right-click a query in the Queries pane.
- Toggle Enable load.
Guideline:
- Disable load for staging queries (
stg_*) and intermediate helper queries that exist only to support other queries. - Enable load for final tables that you will relate and report on (
dim_*,fact_*).
Reference vs Duplicate (choose the right reuse pattern)
When you want to reuse a query as a starting point, you have two main options:
- Reference: Creates a new query that points to the output of the original query. Changes to the original flow through to the referenced query.
- Duplicate: Creates a copy with its own independent steps. Changes do not flow between them.
When to use Reference:
- You want a single “source of truth” staging query, and multiple final queries derived from it.
- You want consistent cleanup rules applied everywhere (trim, standardize case, remove test rows).
When to use Duplicate:
- You need to branch in a way that should not be affected by future changes to the original.
- You are experimenting and want a safe sandbox copy.
Step-by-step:
- Right-click a query.
- Choose Reference or Duplicate.
- Rename the new query immediately to reflect its role (for example,
stg_Sales_Cleanreferenced fromstg_Sales_Raw).
Practical Example: A Clean, Organized Pipeline
Scenario: You have a sales table with extra columns, inconsistent customer names, and blank regions. You want a clean fact table and a separate customer dimension.
Build a staging query
- Create
stg_Sales_Raw(source output). - Create
stg_Sales_Cleanas a Reference ofstg_Sales_Raw. - In
stg_Sales_Clean, apply steps in a readable order: remove unnecessary columns, trim/clean text, standardize case for keys, split columns if needed, handle blanks, then set data types. - Disable load for both staging queries.
Create final loaded tables
- Create
fact_Salesas a Reference ofstg_Sales_Clean, keeping only columns needed for the fact table. - Create
dim_Customeras a Reference ofstg_Sales_Clean, selecting customer-related columns and removing duplicates. - Enable load for
fact_Salesanddim_Customer.
Example of readable step naming
stg_Sales_Clean steps (example names): 01_RemoveUnneededColumns 02_TrimClean_TextFields 03_StandardizeCase_CustomerID 04_Split_CustomerName 05_HandleBlanks_Region 06_ChangeTypesEvery-Query Checklist (Readable and Resilient)
- Name the query clearly (stage + subject), and place it in the correct group.
- Disable load for staging/intermediate queries; enable load only for final model tables.
- Rename important steps so the intent is obvious.
- Filter early when it reduces data volume, but clean text first if filters depend on messy fields.
- Trim/Clean text before merges, grouping, or removing duplicates.
- Remove unnecessary columns to reduce noise and model size.
- Handle blanks intentionally (remove, replace, or fill) based on business meaning.
- Apply data types after cleanup, and verify there are no unexpected errors.
- Use Reference for shared pipelines; Duplicate for independent branches or experiments.