All courses > Technology and Programming > Data Science and Business Intelligence ::

Power BI Essentials: Power Query Fundamentals and Query Organization

Capítulo 3

Estimated reading time: 9 minutes

Power Query as the Transformation Layer

Power Query is where you shape raw data into analysis-ready tables before it reaches the model. Think of it as a transformation layer that sits between your data sources and your Power BI model: it imports data, applies a sequence of transformations, and outputs clean tables.

The key idea is that Power Query does not “edit the source.” Instead, it records your actions as Applied Steps. Each step transforms the result of the previous step, creating a repeatable pipeline. When the dataset refreshes, Power Query re-runs the same steps in order, producing consistent results without manual rework.

How Applied Steps form a repeatable pipeline

Order matters: Steps run top-to-bottom. Filtering early can reduce work later; changing data types too early can sometimes cause errors if values are messy.
Each step should have a purpose: Prefer several small, readable steps over one complicated step that is hard to debug.
Steps are editable: You can click a step to see the state of the table at that point, rename steps, remove steps, or insert new steps.

In the background, Power Query generates M code for each step. You do not need to write M to be effective, but understanding that steps are code helps you treat transformations as a maintained pipeline rather than one-time cleanup.

Core Transformation Mechanics (with Practical Steps)

The transformations below cover the most common “data shaping” tasks you will use repeatedly. The goal is to make data consistent, reduce noise, and ensure columns are usable in relationships and measures.

Filtering rows

Filtering removes records you do not want in the final table (for example, canceled orders, test customers, or dates outside your reporting window).

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Step-by-step:

Select the column you want to filter (for example, Status).
Open the filter dropdown in the column header.
Choose a filter type (checkbox selection, text filters, number filters, date filters).
Confirm and review the new step in Applied Steps (typically named Filtered Rows).

Practical tips:

Filter as early as possible when it reduces row count significantly (performance and clarity).
Be careful with filtering on columns that may contain blanks or inconsistent casing; you may need to clean text first.

Removing columns

Removing columns keeps only what you need for analysis. This reduces model size and makes the dataset easier to understand.

Step-by-step:

Select one or more columns you do not need.
Right-click and choose Remove, or use Home > Remove Columns.
Prefer Remove Other Columns when you know exactly which columns you want to keep.

Practical tips:

Use Choose Columns if the source schema changes frequently; it can be easier to maintain a “keep list.”
Remove technical or intermediate columns before loading, unless they are needed for relationships or auditing.

Splitting columns

Splitting is used when multiple values are packed into one column (for example, “City, State” or “First Last”).

Step-by-step (by delimiter):

Select the column (for example, FullName).
Go to Transform > Split Column > By Delimiter.
Choose the delimiter (space, comma, hyphen, custom).
Choose whether to split at the left-most delimiter, right-most delimiter, or each occurrence.
Rename the resulting columns (for example, FirstName, LastName).

Practical tips:

If names or codes are inconsistent, splitting may produce errors or extra columns; consider cleaning and trimming first.
For “Last, First” formats, split by comma, then trim whitespace on the new columns.

Changing case (uppercase, lowercase, capitalize each word)

Case normalization helps with matching keys, grouping, and reducing duplicates caused by inconsistent casing (for example, “usa” vs “USA”).

Step-by-step:

Select the text column (for example, Country).
Go to Transform > Format.
Choose UPPERCASE, lowercase, or Capitalize Each Word.

Practical tips:

For join keys (like product codes), uppercase is often safer and more consistent.
For display fields (like customer names), “Capitalize Each Word” may be more readable, but confirm it doesn’t break special cases (e.g., “McDonald”).

Trimming and cleaning text

Text fields often contain leading/trailing spaces, non-printing characters, or inconsistent whitespace. These issues can break merges, create duplicate categories, and cause confusing visuals.

Trim removes leading and trailing spaces. Clean removes non-printable characters. You will often use both.

Step-by-step:

Select one or more text columns (for example, CustomerName, City).
Go to Transform > Format > Trim.
Then go to Transform > Format > Clean.

Practical tips:

Apply Trim/Clean before removing duplicates or merging queries.
If you see values that look identical but don’t group together, hidden whitespace or non-printing characters are common causes.

Dealing with blanks (nulls, empty strings, and missing values)

Blanks can appear as null (missing value), an empty string "", or whitespace-only text. Handling blanks correctly prevents errors in calculations and improves data quality.

Identify blanks and their type

null: Power Query’s true missing value.
Empty string: a text value that is present but contains nothing.
Whitespace-only: looks blank but contains spaces or tabs.

Practical approach:

For text columns, run Trim first to reduce whitespace-only values.
Use filters to inspect how many blanks you have and where they occur.

Common blank-handling actions

1) Replace blanks with a standard value (text):

Select the column.
Use Transform > Replace Values.
Replace null (or empty) with something like "Unknown" or "Not Provided".

2) Remove rows with blanks in critical fields:

Filter the critical column (for example, OrderID or CustomerID).
Uncheck (null) or blank values.
Confirm the step is created and verify that you are not removing valid records unintentionally.

3) Fill down/up (use carefully):

Fill Down is useful for data shaped like reports (where a category appears once and subsequent rows are blank until the next category). It is not appropriate for transactional tables where blanks represent missing data.

Select the column.
Choose Transform > Fill > Down (or Up).

Keeping Transformations Readable and Resilient

Readable queries are easier to debug and safer to refresh over time. Resilient queries are less likely to break when the source changes slightly (new columns, extra spaces, unexpected blanks).

Rename steps and avoid “mystery steps”

Power Query auto-names steps like Changed Type or Filtered Rows. Rename key steps so their intent is obvious.

Right-click a step in Applied Steps and choose Rename.
Use action + subject naming, for example: Filter_ActiveCustomers, Trim_CustomerName, Split_FullName.

Prefer stable operations and predictable order

Trim/Clean text before merges, grouping, or removing duplicates.
Remove unnecessary columns after you confirm you won’t need them for later steps.
Be intentional with Changed Type steps; type changes are important, but they can also surface errors if applied before cleanup.

Query Organization: A Structured Approach

As your project grows, query organization becomes essential. The goal is to make the Query pane understandable at a glance: which queries are sources, which are staging, which are final tables, and which are helper functions.

Naming queries (a simple convention)

Use names that communicate purpose and stage. One practical convention:

stg_ for staging queries (raw-to-clean transformations, not loaded)
dim_ for dimension tables (loaded)
fact_ for fact tables (loaded)
ref_ for reference/helper tables (loaded or not, depending on use)

Example set:

stg_Sales_Raw
stg_Sales_Clean
dim_Customer
dim_Product
fact_Sales

Grouping queries into folders

Groups help you navigate quickly and reduce mistakes (like editing the wrong query).

Step-by-step:

In the Queries pane, right-click a query.
Select Move to group.
Create groups such as: 00_Sources, 10_Staging, 20_Dimensions, 30_Facts, 90_Helpers.

Practical tip: Prefix group names with numbers so they stay in a logical order.

Enable/Disable Load (control what enters the model)

Not every query should load to the model. Staging queries are often used only as building blocks for final tables.

Step-by-step:

Right-click a query in the Queries pane.
Toggle Enable load.

Guideline:

Disable load for staging queries (stg_*) and intermediate helper queries that exist only to support other queries.
Enable load for final tables that you will relate and report on (dim_*, fact_*).

Reference vs Duplicate (choose the right reuse pattern)

When you want to reuse a query as a starting point, you have two main options:

Reference: Creates a new query that points to the output of the original query. Changes to the original flow through to the referenced query.
Duplicate: Creates a copy with its own independent steps. Changes do not flow between them.

When to use Reference:

You want a single “source of truth” staging query, and multiple final queries derived from it.
You want consistent cleanup rules applied everywhere (trim, standardize case, remove test rows).

When to use Duplicate:

You need to branch in a way that should not be affected by future changes to the original.
You are experimenting and want a safe sandbox copy.

Step-by-step:

Right-click a query.
Choose Reference or Duplicate.
Rename the new query immediately to reflect its role (for example, stg_Sales_Clean referenced from stg_Sales_Raw).

Practical Example: A Clean, Organized Pipeline

Scenario: You have a sales table with extra columns, inconsistent customer names, and blank regions. You want a clean fact table and a separate customer dimension.

Build a staging query

Create stg_Sales_Raw (source output).
Create stg_Sales_Clean as a Reference of stg_Sales_Raw.
In stg_Sales_Clean, apply steps in a readable order: remove unnecessary columns, trim/clean text, standardize case for keys, split columns if needed, handle blanks, then set data types.
Disable load for both staging queries.

Create final loaded tables

Create fact_Sales as a Reference of stg_Sales_Clean, keeping only columns needed for the fact table.
Create dim_Customer as a Reference of stg_Sales_Clean, selecting customer-related columns and removing duplicates.
Enable load for fact_Sales and dim_Customer.

Example of readable step naming

stg_Sales_Clean steps (example names): 01_RemoveUnneededColumns 02_TrimClean_TextFields 03_StandardizeCase_CustomerID 04_Split_CustomerName 05_HandleBlanks_Region 06_ChangeTypes

Every-Query Checklist (Readable and Resilient)

Name the query clearly (stage + subject), and place it in the correct group.
Disable load for staging/intermediate queries; enable load only for final model tables.
Rename important steps so the intent is obvious.
Filter early when it reduces data volume, but clean text first if filters depend on messy fields.
Trim/Clean text before merges, grouping, or removing duplicates.
Remove unnecessary columns to reduce noise and model size.
Handle blanks intentionally (remove, replace, or fill) based on business meaning.
Apply data types after cleanup, and verify there are no unexpected errors.
Use Reference for shared pipelines; Duplicate for independent branches or experiments.

Now answer the exercise about the content:

In Power Query, what is the best approach when you want multiple final tables to share the same cleanup rules from a single staging query?

You are right! Congratulations, now go to the next page

You missed! Try again.

Reference creates a new query that depends on the original query’s output, so updates to the staging steps flow through to all referenced final tables. Duplicate creates independent copies that do not stay in sync.