Why cleaning and standardizing matters in Power Query
In Power Query, “cleaning” means removing inconsistencies and errors so the data behaves predictably. “Standardizing” means enforcing one agreed format for each field (text casing, date type, numeric type, missing-value rules, and consistent codes). The goal is not to make the data look nice in a worksheet; it is to make downstream steps (joins, grouping, measures, and refreshes) reliable.
Most issues fall into four buckets: text inconsistencies (extra spaces, mixed casing, hidden characters), date inconsistencies (text dates, mixed locales, invalid dates), numeric inconsistencies (numbers stored as text, thousand separators, currency symbols, negative formats), and null handling (blanks, “N/A”, “-”, and other placeholders that should become null or a default).
Power Query gives you two complementary approaches: (1) UI transformations (good for discoverability and quick iteration) and (2) explicit M code (good for precision, reuse, and avoiding ambiguous locale behavior). In practice, you often start with UI steps and then refine the generated M to make it deterministic.
Text cleaning and standardization
Common text problems you should expect
- Leading/trailing spaces: " Acme" vs "Acme"
- Multiple internal spaces: "Acme  Corp"
- Non-printing characters: tabs, line breaks, non-breaking spaces
- Inconsistent casing: "john SMITH", "John Smith"
- Inconsistent punctuation: "O'Reilly" vs "OReilly"
- Mixed codes: "us", "USA", "United States"
- Empty strings vs nulls: "" is not the same as null
Step-by-step: normalize a text column (trim, clean, case, and null rules)
Scenario: You have a CustomerName column coming from multiple sources. You want to remove extra spaces, remove non-printing characters, standardize to Proper Case, and convert empty strings to null.
In the Power Query Editor:
- Select CustomerName.
- Use Transform > Format > Trim to remove leading/trailing spaces.
- Use Transform > Format > Clean to remove non-printing characters.
- Optionally use Transform > Format > Capitalize Each Word (Proper Case) if that matches your standard.
- Convert empty strings to null using Transform > Replace Values: replace "" with null. If the UI does not allow an empty search value easily, do it in M.
Refine in M for repeatability and to handle edge cases (like non-breaking spaces). Example M pattern:
```
let
    Source = ...,
    NormalizeCustomer = Table.TransformColumns(
        Source,
        {
            {
                "CustomerName",
                (t) =>
                    let
                        t0 = if t = null then null else Text.From(t),
                        t1 = if t0 = null then null else Text.Replace(t0, Character.FromNumber(160), " "),
                        t2 = if t1 = null then null else Text.Clean(Text.Trim(t1)),
                        t3 = if t2 = "" then null else Text.Proper(t2)
                    in
                        t3,
                type text
            }
        }
    )
in
    NormalizeCustomer
```

Notes:
- Character.FromNumber(160) targets the common non-breaking space that Trim may not remove.
- Text.From ensures non-text values don't break the transformation.
- Setting type text inside Table.TransformColumns helps keep types stable.
Standardizing codes with mapping tables (recommended)
When you need to standardize values like country codes, product categories, or department names, avoid long chains of “Replace Values” steps. A mapping table is easier to maintain and audit.
Approach:
- Create a small mapping table with columns like RawValue and StandardValue.
- Merge your main table to the mapping table on the raw column.
- Use the mapped StandardValue when available; otherwise keep the original (or set to null).
Example M pattern (assuming you already have a mapping query named MapCountry):
```
let
    Source = ...,
    Merged = Table.NestedJoin(Source, {"Country"}, MapCountry, {"RawValue"}, "Map", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Map", {"StandardValue"}, {"Country_Standard"}),
    Final = Table.AddColumn(
        Expanded,
        "Country_Final",
        each if [Country_Standard] <> null then [Country_Standard] else [Country],
        type text
    ),
    Cleanup = Table.RemoveColumns(Final, {"Country_Standard"})
in
    Cleanup
```

This makes standardization explicit and maintainable. You can also add an "Unmapped" check by filtering where Country_Standard is null to see which raw values need mapping, as sketched below.
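A minimal sketch of that "Unmapped" check, reusing the same merge and expand steps as above (the query name DistinctUnmapped is illustrative):

```
let
    Source = ...,
    Merged = Table.NestedJoin(Source, {"Country"}, MapCountry, {"RawValue"}, "Map", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Map", {"StandardValue"}, {"Country_Standard"}),
    // Rows whose raw Country value found no match in the mapping table
    Unmapped = Table.SelectRows(Expanded, each [Country_Standard] = null and [Country] <> null),
    // Distinct raw values that still need a row in MapCountry
    DistinctUnmapped = Table.Distinct(Table.SelectColumns(Unmapped, {"Country"}))
in
    DistinctUnmapped
```

Loading this as its own query gives you a living to-do list for the mapping table.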
Splitting, extracting, and recombining text safely
Text often contains embedded structure: “Last, First”, “City - State”, “SKU-Color-Size”. Power Query has UI tools like Split Column and Extract. The key is to decide what to do when the delimiter is missing or appears multiple times.
Example: Split “Last, First” into two columns, but handle missing commas.
```
let
    Source = ...,
    AddLast = Table.AddColumn(
        Source,
        "LastName",
        each if [FullName] = null then null
             else if Text.Contains([FullName], ",") then Text.BeforeDelimiter([FullName], ",")
             else null,
        type text
    ),
    AddFirst = Table.AddColumn(
        AddLast,
        "FirstName",
        each if [FullName] = null then null
             else if Text.Contains([FullName], ",") then Text.Trim(Text.AfterDelimiter([FullName], ","))
             else Text.Trim([FullName]),
        type text
    )
in
    AddFirst
```

This avoids errors and makes your rule clear: if there is no comma, treat the full value as the first name and leave the last name null; if FullName itself is null, both stay null.
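When the delimiter can appear more than once, as in "SKU-Color-Size", split into a list and pick parts positionally. A minimal sketch, assuming a column named ProductCode (illustrative); the `{i}?` optional item access returns null instead of an error when a part is missing:

```
let
    Source = ...,
    // Text.Split returns a list of parts; an empty list stands in for null codes
    AddParts = Table.AddColumn(Source, "Parts", each if [ProductCode] = null then {} else Text.Split([ProductCode], "-")),
    // {i}? is optional item access: missing positions yield null, not an error
    AddSku = Table.AddColumn(AddParts, "SKU", each [Parts]{0}?, type text),
    AddColor = Table.AddColumn(AddSku, "Color", each [Parts]{1}?, type text),
    AddSize = Table.AddColumn(AddColor, "Size", each [Parts]{2}?, type text),
    Cleanup = Table.RemoveColumns(AddSize, {"Parts"})
in
    Cleanup
```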
Date cleaning and standardization
The two big date risks: type and locale
Dates cause problems when they arrive as text, or when the same text can be interpreted differently by locale. For example, "03/04/2025" could be March 4 or April 3 depending on culture settings. A refresh on another machine or service can silently change results if you rely on implicit conversion.
Best practice: parse dates explicitly using a known culture (or a known format), and then set the column type to date or datetime.
Step-by-step: convert a text date column with a specific locale
Scenario: You receive OrderDateText as "dd/MM/yyyy" text (e.g., "31/01/2025").
UI approach:
- Select OrderDateText.
- Go to Transform > Data Type > Using Locale...
- Choose Date and select a locale that matches the format (for dd/MM, often English (United Kingdom)).
M approach (explicit and portable):
```
let
    Source = ...,
    Parsed = Table.TransformColumns(
        Source,
        {{"OrderDateText", each try Date.FromText(_, "en-GB") otherwise null, type date}}
    ),
    Renamed = Table.RenameColumns(Parsed, {{"OrderDateText", "OrderDate"}})
in
    Renamed
```

Using try ... otherwise null prevents refresh failures when a row contains an invalid date (like "31/02/2025").
Handling mixed date inputs (date, datetime, and text)
Sometimes a column contains a mix: some rows are true dates, some are datetimes, some are text. A robust pattern is to normalize everything to a datetime first, then derive date if needed.
```
let
    Source = ...,
    ToDateTime = Table.TransformColumns(
        Source,
        {
            {
                "EventTime",
                (v) =>
                    if v = null then null
                    else if Value.Is(v, type datetime) then v
                    else if Value.Is(v, type date) then DateTime.From(v)
                    else try DateTime.FromText(Text.From(v), "en-US") otherwise null,
                type datetime
            }
        }
    ),
    AddEventDate = Table.AddColumn(ToDateTime, "EventDate", each if [EventTime] = null then null else Date.From([EventTime]), type date)
in
    AddEventDate
```

Adjust the culture in DateTime.FromText to match your source. If the source is ISO-like (2025-01-31 or 2025-01-31T10:15:00), parsing is typically safer across locales, but explicit is still better.
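A minimal sketch of pinning an exact format, assuming your build of Power Query supports the options-record form of DateTime.FromText and that the input is ISO-like text:

```
let
    Source = ...,
    ParsedIso = Table.TransformColumns(
        Source,
        {
            {
                "EventTime",
                // Format pins the exact pattern (the literal T is quoted);
                // Culture covers any remaining culture-sensitive parsing
                each try DateTime.FromText(_, [Format = "yyyy-MM-dd'T'HH:mm:ss", Culture = "en-US"]) otherwise null,
                type datetime
            }
        }
    )
in
    ParsedIso
```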
Fixing “Excel serial dates” and other numeric date encodings
Some systems export dates as numbers (e.g., Excel's 1900 date system, where serial 1 displays as 1900-01-01 but, because of the system's fictitious 1900-02-29, the reliable conversion base for modern dates is 1899-12-30). If you see values like 45292 that should be a date, convert using a base date.
```
let
    Source = ...,
    FromSerial = Table.TransformColumns(
        Source,
        {{"ShipDateSerial", each if _ = null then null else Date.AddDays(#date(1899, 12, 30), Number.From(_)), type date}}
    )
in
    FromSerial
```

The base #date(1899, 12, 30) matches Excel's common serial system behavior.
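If the serials carry a time fraction (e.g., 45292.5 for noon), a minimal sketch that relies on M's numeric-to-datetime conversion, which uses the same 1899-12-30 (OLE Automation date) base:

```
let
    Source = ...,
    FromSerialDt = Table.TransformColumns(
        Source,
        // DateTime.From(number) interprets the number as an OLE Automation date,
        // so 45292.5 becomes 2024-01-01 12:00:00
        {{"ShipDateSerial", each if _ = null then null else DateTime.From(Number.From(_)), type datetime}}
    )
in
    FromSerialDt
```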
Number cleaning and standardization
Typical numeric issues
- Numbers stored as text: "1,234", "$99.95", "(120.50)"
- Mixed decimal separators: "1.234,56" vs "1,234.56"
- Percent strings: "12%"
- Dash placeholders: "-" meaning missing, not negative
- Spaces as thousand separators: "1 234"
Step-by-step: convert currency-like text to a number
Scenario: Column Amount contains values like "$1,234.50", " 99 ", "(120.00)", and "-".
Use a controlled conversion that:
- Turns placeholders into null
- Removes currency symbols and separators
- Interprets parentheses as negative
- Returns a numeric type even when conversion fails (null instead of error)
```
let
    Source = ...,
    NormalizeAmount = Table.TransformColumns(
        Source,
        {
            {
                "Amount",
                (v) =>
                    let
                        t0 = if v = null then null else Text.Trim(Text.From(v)),
                        t1 = if t0 = null or t0 = "" or t0 = "-" or Text.Upper(t0) = "N/A" then null else t0,
                        isParenNeg = if t1 = null then false else Text.StartsWith(t1, "(") and Text.EndsWith(t1, ")"),
                        t2 = if t1 = null then null else if isParenNeg then Text.Middle(t1, 1, Text.Length(t1) - 2) else t1,
                        // Text.Remove strips every occurrence of the listed currency characters
                        t3 = if t2 = null then null else Text.Remove(t2, {"$", "£", "€"}),
                        t4 = if t3 = null then null else Text.Replace(t3, ",", ""),
                        n = if t4 = null then null else try Number.FromText(t4, "en-US") otherwise null,
                        n2 = if n = null then null else if isParenNeg then -n else n
                    in
                        n2,
                type number
            }
        }
    )
in
    NormalizeAmount
```

If your source uses European formatting (comma decimal), change the culture and separator logic. For example, for "1.234,56", you might remove dots as thousand separators and parse with "de-DE", as sketched below.
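A minimal sketch of that European variant, assuming values like "1.234,56" (dot as thousands separator, comma as decimal):

```
let
    Source = ...,
    NormalizeAmountDe = Table.TransformColumns(
        Source,
        {
            {
                "Amount",
                (v) =>
                    let
                        t = if v = null then null else Text.Trim(Text.From(v)),
                        t1 = if t = null or t = "" or t = "-" then null else t,
                        // Drop the dot thousands separators; keep the comma decimal for de-DE parsing
                        t2 = if t1 = null then null else Text.Replace(t1, ".", "")
                    in
                        if t2 = null then null else try Number.FromText(t2, "de-DE") otherwise null,
                type number
            }
        }
    )
in
    NormalizeAmountDe
```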
Parsing percentages into decimals
Decide your standard: store percentages as 0.12 (decimal fraction) or 12 (percentage points). Power BI and Excel often expect 0.12 with a percentage format applied later.
```
let
    Source = ...,
    NormalizePct = Table.TransformColumns(
        Source,
        {
            {
                "DiscountPct",
                (v) =>
                    let
                        t = if v = null then null else Text.Trim(Text.From(v)),
                        t2 = if t = null or t = "" then null else t,
                        hasPct = if t2 = null then false else Text.EndsWith(t2, "%"),
                        numText = if t2 = null then null else if hasPct then Text.BeforeDelimiter(t2, "%") else t2,
                        n = if numText = null then null else try Number.FromText(numText, "en-US") otherwise null,
                        result = if n = null then null else if hasPct then n / 100 else n
                    in
                        result,
                type number
            }
        }
    )
in
    NormalizePct
```

Rounding and numeric consistency
Binary floating-point can introduce tiny rounding artifacts (e.g., 0.30000000000000004). If you need stable results for grouping or comparisons, round to a defined precision after conversion.
```
let
    Source = ...,
    Rounded = Table.TransformColumns(Source, {{"Amount", each if _ = null then null else Number.Round(_, 2), type number}})
in
    Rounded
```

Use rounding intentionally; do not round identifiers or values where precision matters (like scientific measurements) unless the business rule says so. Note that Number.Round uses banker's rounding (round half to even) by default; pass RoundingMode.AwayFromZero as the third argument if your standard expects conventional half-up behavior.
Null handling and missing-value strategy
Understand the difference: null vs empty vs zero
null means “unknown or missing.” An empty string "" is a known value (a blank text). Zero is a numeric value. Mixing these causes incorrect aggregations and misleading counts.
Before you replace nulls, decide the rule per column:
- Identifiers (CustomerID, OrderID): null usually indicates a data quality issue; keep null and flag it (see the sketch after this list).
- Measures (Quantity, Amount): null might mean missing; sometimes you want to treat missing as 0, but only if the business meaning is “none” rather than “unknown.”
- Dimensions (Region, Category): null might be replaced with "Unknown" for reporting, but keep a separate flag if needed.
Step-by-step: standardize “missing markers” into null
Many sources use placeholders like "N/A", "NA", "null", "-", "(blank)". Convert these to true nulls early so type conversions behave.
```
let
    Source = ...,
    // Values are uppercased before comparison, so list the markers in uppercase
    MissingMarkers = {"", "-", "N/A", "NA", "NULL", "(BLANK)"},
    ToNull = Table.TransformColumns(
        Source,
        {
            {
                "Status",
                (v) =>
                    let
                        t = if v = null then null else Text.Upper(Text.Trim(Text.From(v)))
                    in
                        if t = null then null
                        else if List.Contains(MissingMarkers, t) then null
                        else Text.From(v),
                type text
            }
        }
    )
in
    ToNull
```

Tip: apply marker-to-null conversion before changing data types for dates and numbers. It reduces conversion errors.
Replacing nulls with defaults (only when appropriate)
When a default is required for reporting, do it explicitly and document the rule in the step name. Examples:
- Replace null Region with "Unknown"
- Replace null Quantity with 0 if missing means "no units"
```
let
    Source = ...,
    RegionDefault = Table.TransformColumns(Source, {{"Region", each if _ = null then "Unknown" else _, type text}}),
    QtyDefault = Table.TransformColumns(RegionDefault, {{"Quantity", each if _ = null then 0 else _, Int64.Type}})
in
    QtyDefault
```

Forward fill and backward fill (Fill Down/Up) with care
Fill Down is useful when a source uses a “header row” pattern where a category appears once and subsequent rows are blank until the next category. This is not the same as imputing missing values in transactional data.
UI approach:
- Select the column to fill.
- Use Transform > Fill > Down (or Up).
Best practice: sort first to ensure the fill direction matches the intended grouping. If the source order can change, add an index column before sorting so you can restore original order after the fill.
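A minimal sketch of that order-preserving pattern, assuming Category is the column with the header-row blanks:

```
let
    Source = ...,
    // Capture the original row order so it can be restored later
    Indexed = Table.AddIndexColumn(Source, "OriginalOrder", 0, 1),
    // Fill Down relies on row order: each null inherits the value above it
    Filled = Table.FillDown(Indexed, {"Category"}),
    // If intermediate steps reorder rows, restore the source order afterwards
    Restored = Table.Sort(Filled, {{"OriginalOrder", Order.Ascending}}),
    Cleanup = Table.RemoveColumns(Restored, {"OriginalOrder"})
in
    Cleanup
```

Note that Table.FillDown only fills true nulls, so run the marker-to-null conversion first if blanks arrive as empty strings.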
Detecting and isolating errors vs nulls
Errors are different from nulls. Errors stop refresh if not handled. A robust cleaning workflow often includes:
- Convert known missing markers to null
- Use try ... otherwise null for risky conversions
- Add a quality flag column to track rows that failed conversion
```
let
    Source = ...,
    AddParsedDate = Table.AddColumn(
        Source,
        "OrderDate_Parsed",
        each try Date.FromText([OrderDateText], "en-GB") otherwise null,
        type date
    ),
    AddDateFlag = Table.AddColumn(
        AddParsedDate,
        "OrderDate_ParseFailed",
        each [OrderDateText] <> null and Text.Trim(Text.From([OrderDateText])) <> "" and [OrderDate_Parsed] = null,
        type logical
    )
in
    AddDateFlag
```

This keeps the dataset refreshable while still exposing data quality issues for remediation.
Putting it together: a repeatable “standardize columns” pattern
Design a column-by-column standardization checklist
For each important column, define:
- Target data type (text/date/datetime/number/whole number)
- Allowed values or format (e.g., ISO date, Proper Case, uppercase codes)
- Missing-value rule (keep null, replace with default, or flag)
- Conversion method and culture (especially for dates and numbers)
Then implement transformations in a consistent order:
- Normalize missing markers to null
- Clean/trim text
- Parse dates and numbers with explicit culture
- Set final data types
- Add flags for failed parsing or unexpected values
Example: standardize multiple columns in one step
This example shows a compact approach using Table.TransformColumns for several fields. Adapt the cultures and rules to your data.
```
let
    Source = ...,
    Standardized = Table.TransformColumns(
        Source,
        {
            {"CustomerName", (v) =>
                let t = if v = null then null else Text.Clean(Text.Trim(Text.Replace(Text.From(v), Character.FromNumber(160), " ")))
                in if t = "" then null else Text.Proper(t),
                type text},
            {"OrderDate", (v) =>
                if v = null then null
                else if Value.Is(v, type date) then v
                else try Date.FromText(Text.From(v), "en-GB") otherwise null,
                type date},
            {"Amount", (v) =>
                let t = if v = null then null else Text.Trim(Text.From(v))
                in if t = null or t = "" or t = "-" then null
                   else try Number.FromText(Text.Replace(Text.Replace(t, "$", ""), ",", ""), "en-US") otherwise null,
                type number},
            {"Region", (v) =>
                let t = if v = null then null else Text.Trim(Text.From(v))
                in if t = null or t = "" then "Unknown" else t,
                type text}
        }
    )
in
    Standardized
```

Even if you later break this into smaller steps for readability, this pattern demonstrates the key idea: each column has a deterministic transformation function, explicit type, and explicit parsing behavior.
Practical checks after standardization
After cleaning, validate quickly inside Power Query:
- Use column profiling (when available) to spot unexpected distributions and null spikes.
- Filter for parse-failed flags (logical columns you added) to see problematic rows.
- Group by standardized columns (like Region or Country) to confirm mapping worked (see the sketch after this list).
- Sort and scan for outliers (very large amounts, dates far in the past/future).
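A minimal sketch of that group-by check, assuming the Country_Final column from the mapping example earlier:

```
let
    Source = ...,
    // Count rows per standardized value; unexpected buckets stand out quickly
    ByCountry = Table.Group(Source, {"Country_Final"}, {{"Rows", each Table.RowCount(_), Int64.Type}}),
    Sorted = Table.Sort(ByCountry, {{"Rows", Order.Descending}})
in
    Sorted
```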
These checks are part of building a repeatable pipeline: you are not just transforming; you are making the transformation trustworthy under refresh.