Why cleaning and standardizing matters in Power Query
In Power Query, “cleaning” means removing inconsistencies and errors so the data behaves predictably. “Standardizing” means enforcing one agreed format for each field (text casing, date type, numeric type, missing-value rules, and consistent codes). The goal is not to make the data look nice in a worksheet; it is to make downstream steps (joins, grouping, measures, and refreshes) reliable.
Most issues fall into four buckets: text inconsistencies (extra spaces, mixed casing, hidden characters), date inconsistencies (text dates, mixed locales, invalid dates), numeric inconsistencies (numbers stored as text, thousand separators, currency symbols, negative formats), and null handling (blanks, “N/A”, “-”, and other placeholders that should become null or a default).
Power Query gives you two complementary approaches: (1) UI transformations (good for discoverability and quick iteration) and (2) explicit M code (good for precision, reuse, and avoiding ambiguous locale behavior). In practice, you often start with UI steps and then refine the generated M to make it deterministic.
Text cleaning and standardization
Common text problems you should expect
- Leading/trailing spaces: " Acme" vs "Acme"
- Multiple internal spaces: "Acme  Corp"
- Non-printing characters: tabs, line breaks, non-breaking spaces
- Inconsistent casing: "john SMITH", "John Smith"
- Inconsistent punctuation: "O'Reilly" vs "OReilly"
- Mixed codes: "us", "USA", "United States"
- Empty strings vs nulls: "" is not the same as null
Step-by-step: normalize a text column (trim, clean, case, and null rules)
Scenario: You have a CustomerName column coming from multiple sources. You want to remove extra spaces, remove non-printing characters, standardize to Proper Case, and convert empty strings to null.
In the Power Query Editor:
- Select CustomerName.
- Use Transform > Format > Trim to remove leading/trailing spaces.
- Use Transform > Format > Clean to remove non-printing characters.
- Optionally use Transform > Format > Capitalize Each Word (Proper Case) if that matches your standard.
- Convert empty strings to null using Transform > Replace Values: replace "" with null. If the UI does not allow an empty search value easily, do it in M.
Refine in M for repeatability and to handle edge cases (like non-breaking spaces). Example M pattern:
```
let
    Source = ...,
    NormalizeCustomer = Table.TransformColumns(
        Source,
        {
            {
                "CustomerName",
                (t) =>
                    let
                        t0 = if t = null then null else Text.From(t),
                        t1 = if t0 = null then null else Text.Replace(t0, Character.FromNumber(160), " "),
                        t2 = if t1 = null then null else Text.Clean(Text.Trim(t1)),
                        t3 = if t2 = "" then null else Text.Proper(t2)
                    in
                        t3,
                type text
            }
        }
    )
in
    NormalizeCustomer
```

Notes:
- Character.FromNumber(160) targets the common non-breaking space that Trim may not remove.
- Text.From ensures non-text values don't break the transformation.
- Setting type text inside Table.TransformColumns helps keep types stable.
Standardizing codes with mapping tables (recommended)
When you need to standardize values like country codes, product categories, or department names, avoid long chains of “Replace Values” steps. A mapping table is easier to maintain and audit.
Approach:
- Create a small mapping table with columns like RawValue and StandardValue.
- Merge your main table to the mapping table on the raw column.
- Use the mapped StandardValue when available; otherwise keep the original (or set to null).
Example M pattern (assuming you already have a mapping query named MapCountry):
```
let
    Source = ...,
    Merged = Table.NestedJoin(Source, {"Country"}, MapCountry, {"RawValue"}, "Map", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Map", {"StandardValue"}, {"Country_Standard"}),
    Final = Table.AddColumn(
        Expanded,
        "Country_Final",
        each if [Country_Standard] <> null then [Country_Standard] else [Country],
        type text
    ),
    Cleanup = Table.RemoveColumns(Final, {"Country_Standard"})
in
    Cleanup
```

This makes standardization explicit and maintainable. You can also add an "Unmapped" check by filtering where Country_Standard is null to see which raw values need mapping, as sketched below.
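A minimal sketch of that "Unmapped" check, reusing the same merge and expand steps as above (the query name DistinctUnmapped is illustrative):

```
let
    Source = ...,
    Merged = Table.NestedJoin(Source, {"Country"}, MapCountry, {"RawValue"}, "Map", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Map", {"StandardValue"}, {"Country_Standard"}),
    // Rows whose raw Country value found no match in the mapping table
    Unmapped = Table.SelectRows(Expanded, each [Country_Standard] = null and [Country] <> null),
    // Distinct raw values that still need a row in MapCountry
    DistinctUnmapped = Table.Distinct(Table.SelectColumns(Unmapped, {"Country"}))
in
    DistinctUnmapped
```

Loading this as its own query gives you a living to-do list for the mapping table.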
Splitting, extracting, and recombining text safely
Text often contains embedded structure: “Last, First”, “City - State”, “SKU-Color-Size”. Power Query has UI tools like Split Column and Extract. The key is to decide what to do when the delimiter is missing or appears multiple times.
Example: Split “Last, First” into two columns, but handle missing commas.
```
let
    Source = ...,
    AddLast = Table.AddColumn(
        Source,
        "LastName",
        each if [FullName] = null then null
             else if Text.Contains([FullName], ",") then Text.BeforeDelimiter([FullName], ",")
             else null,
        type text
    ),
    AddFirst = Table.AddColumn(
        AddLast,
        "FirstName",
        each if [FullName] = null then null
             else if Text.Contains([FullName], ",") then Text.Trim(Text.AfterDelimiter([FullName], ","))
             else Text.Trim([FullName]),
        type text
    )
in
    AddFirst
```

This avoids errors and makes your rule clear: if there is no comma, treat the full value as the first name and leave the last name null; if FullName itself is null, both stay null.
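When the delimiter can appear more than once, as in "SKU-Color-Size", split into a list and pick parts positionally. A minimal sketch, assuming a column named ProductCode (illustrative); the `{i}?` optional item access returns null instead of an error when a part is missing:

```
let
    Source = ...,
    // Text.Split returns a list of parts; an empty list stands in for null codes
    AddParts = Table.AddColumn(Source, "Parts", each if [ProductCode] = null then {} else Text.Split([ProductCode], "-")),
    // {i}? is optional item access: missing positions yield null, not an error
    AddSku = Table.AddColumn(AddParts, "SKU", each [Parts]{0}?, type text),
    AddColor = Table.AddColumn(AddSku, "Color", each [Parts]{1}?, type text),
    AddSize = Table.AddColumn(AddColor, "Size", each [Parts]{2}?, type text),
    Cleanup = Table.RemoveColumns(AddSize, {"Parts"})
in
    Cleanup
```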
Date cleaning and standardization
The two big date risks: type and locale
Dates cause problems when they arrive as text, or when the same text can be interpreted differently by locale. For example, "03/04/2025" could be March 4 or April 3 depending on culture settings. A refresh on another machine or service can silently change results if you rely on implicit conversion.
Best practice: parse dates explicitly using a known culture (or a known format), and then set the column type to date or datetime.
Step-by-step: convert a text date column with a specific locale
Scenario: You receive OrderDateText as "dd/MM/yyyy" text (e.g., "31/01/2025").
UI approach:
- Select OrderDateText.
- Go to Transform > Data Type > Using Locale...
- Choose Date and select a locale that matches the format (for dd/MM, often English (United Kingdom)).
M approach (explicit and portable):
```
let
    Source = ...,
    Parsed = Table.TransformColumns(
        Source,
        {{"OrderDateText", each try Date.FromText(_, "en-GB") otherwise null, type date}}
    ),
    Renamed = Table.RenameColumns(Parsed, {{"OrderDateText", "OrderDate"}})
in
    Renamed
```

Using try ... otherwise null prevents refresh failures when a row contains an invalid date (like "31/02/2025").
Handling mixed date inputs (date, datetime, and text)
Sometimes a column contains a mix: some rows are true dates, some are datetimes, some are text. A robust pattern is to normalize everything to a datetime first, then derive date if needed.
```
let
    Source = ...,
    ToDateTime = Table.TransformColumns(
        Source,
        {
            {
                "EventTime",
                (v) =>
                    if v = null then null
                    else if Value.Is(v, type datetime) then v
                    else if Value.Is(v, type date) then DateTime.From(v)
                    else try DateTime.FromText(Text.From(v), "en-US") otherwise null,
                type datetime
            }
        }
    ),
    AddEventDate = Table.AddColumn(ToDateTime, "EventDate", each if [EventTime] = null then null else Date.From([EventTime]), type date)
in
    AddEventDate
```

Adjust the culture in DateTime.FromText to match your source. If the source is ISO-like (2025-01-31 or 2025-01-31T10:15:00), parsing is typically safer across locales, but explicit is still better.
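A minimal sketch of pinning an exact format, assuming your build of Power Query supports the options-record form of DateTime.FromText and that the input is ISO-like text:

```
let
    Source = ...,
    ParsedIso = Table.TransformColumns(
        Source,
        {
            {
                "EventTime",
                // Format pins the exact pattern (the literal T is quoted);
                // Culture covers any remaining culture-sensitive parsing
                each try DateTime.FromText(_, [Format = "yyyy-MM-dd'T'HH:mm:ss", Culture = "en-US"]) otherwise null,
                type datetime
            }
        }
    )
in
    ParsedIso
```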
Fixing “Excel serial dates” and other numeric date encodings
Some systems export dates as numbers (e.g., Excel's 1900 date system, where serial 1 displays as 1900-01-01 but, because of the system's fictitious 1900-02-29, the reliable conversion base for modern dates is 1899-12-30). If you see values like 45292 that should be a date, convert using a base date.
```
let
    Source = ...,
    FromSerial = Table.TransformColumns(
        Source,
        {{"ShipDateSerial", each if _ = null then null else Date.AddDays(#date(1899, 12, 30), Number.From(_)), type date}}
    )
in
    FromSerial
```

The base #date(1899, 12, 30) matches Excel's common serial system behavior.
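If the serials carry a time fraction (e.g., 45292.5 for noon), a minimal sketch that relies on M's numeric-to-datetime conversion, which uses the same 1899-12-30 (OLE Automation date) base:

```
let
    Source = ...,
    FromSerialDt = Table.TransformColumns(
        Source,
        // DateTime.From(number) interprets the number as an OLE Automation date,
        // so 45292.5 becomes 2024-01-01 12:00:00
        {{"ShipDateSerial", each if _ = null then null else DateTime.From(Number.From(_)), type datetime}}
    )
in
    FromSerialDt
```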
Number cleaning and standardization
Typical numeric issues
- Numbers stored as text: "1,234", "$99.95", "(120.50)"
- Mixed decimal separators: "1.234,56" vs "1,234.56"
- Percent strings: "12%"
- Dash placeholders: "-" meaning missing, not negative
- Spaces as thousand separators: "1 234"
Step-by-step: convert currency-like text to a number
Scenario: Column Amount contains values like "$1,234.50", " 99 ", "(120.00)", and "-".
Use a controlled conversion that:
- Turns placeholders into null
- Removes currency symbols and separators
- Interprets parentheses as negative
- Returns a numeric type even when conversion fails (null instead of error)
```
let
    Source = ...,
    NormalizeAmount = Table.TransformColumns(
        Source,
        {
            {
                "Amount",
                (v) =>
                    let
                        t0 = if v = null then null else Text.Trim(Text.From(v)),
                        t1 = if t0 = null or t0 = "" or t0 = "-" or Text.Upper(t0) = "N/A" then null else t0,
                        isParenNeg = if t1 = null then false else Text.StartsWith(t1, "(") and Text.EndsWith(t1, ")"),
                        t2 = if t1 = null then null else if isParenNeg then Text.Middle(t1, 1, Text.Length(t1) - 2) else t1,
                        // Text.Remove strips every occurrence of the listed currency characters
                        t3 = if t2 = null then null else Text.Remove(t2, {"$", "£", "€"}),
                        t4 = if t3 = null then null else Text.Replace(t3, ",", ""),
                        n = if t4 = null then null else try Number.FromText(t4, "en-US") otherwise null,
                        n2 = if n = null then null else if isParenNeg then -n else n
                    in
                        n2,
                type number
            }
        }
    )
in
    NormalizeAmount
```

If your source uses European formatting (comma decimal), change the culture and separator logic. For example, for "1.234,56", you might remove dots as thousand separators and parse with "de-DE", as sketched below.
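A minimal sketch of that European variant, assuming values like "1.234,56" (dot as thousands separator, comma as decimal):

```
let
    Source = ...,
    NormalizeAmountDe = Table.TransformColumns(
        Source,
        {
            {
                "Amount",
                (v) =>
                    let
                        t = if v = null then null else Text.Trim(Text.From(v)),
                        t1 = if t = null or t = "" or t = "-" then null else t,
                        // Drop the dot thousands separators; keep the comma decimal for de-DE parsing
                        t2 = if t1 = null then null else Text.Replace(t1, ".", "")
                    in
                        if t2 = null then null else try Number.FromText(t2, "de-DE") otherwise null,
                type number
            }
        }
    )
in
    NormalizeAmountDe
```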
Parsing percentages into decimals
Decide your standard: store percentages as 0.12 (decimal fraction) or 12 (percentage points). Power BI and Excel often expect 0.12 with a percentage format applied later.
```
let
    Source = ...,
    NormalizePct = Table.TransformColumns(
        Source,
        {
            {
                "DiscountPct",
                (v) =>
                    let
                        t = if v = null then null else Text.Trim(Text.From(v)),
                        t2 = if t = null or t = "" then null else t,
                        hasPct = if t2 = null then false else Text.EndsWith(t2, "%"),
                        numText = if t2 = null then null else if hasPct then Text.BeforeDelimiter(t2, "%") else t2,
                        n = if numText = null then null else try Number.FromText(numText, "en-US") otherwise null,
                        result = if n = null then null else if hasPct then n / 100 else n
                    in
                        result,
                type number
            }
        }
    )
in
    NormalizePct
```

Rounding and numeric consistency
Binary floating-point can introduce tiny rounding artifacts (e.g., 0.30000000000000004). If you need stable results for grouping or comparisons, round to a defined precision after conversion.
```
let
    Source = ...,
    Rounded = Table.TransformColumns(Source, {{"Amount", each if _ = null then null else Number.Round(_, 2), type number}})
in
    Rounded
```

Use rounding intentionally; do not round identifiers or values where precision matters (like scientific measurements) unless the business rule says so. Note that Number.Round uses banker's rounding (round half to even) by default; pass RoundingMode.AwayFromZero as the third argument if your standard expects conventional half-up behavior.
Null handling and missing-value strategy
Understand the difference: null vs empty vs zero
null means “unknown or missing.” An empty string "" is a known value (a blank text). Zero is a numeric value. Mixing these causes incorrect aggregations and misleading counts.
Before you replace nulls, decide the rule per column:
- Identifiers (CustomerID, OrderID): null usually indicates a data quality issue; keep null and flag it (see the sketch after this list).
- Measures (Quantity, Amount): null might mean missing; sometimes you want to treat missing as 0, but only if the business meaning is “none” rather than “unknown.”
- Dimensions (Region, Category): null might be replaced with "Unknown" for reporting, but keep a separate flag if needed.
Step-by-step: standardize “missing markers” into null
Many sources use placeholders like "N/A", "NA", "null", "-", "(blank)". Convert these to true nulls early so type conversions behave.
```
let
    Source = ...,
    // Values are uppercased before comparison, so list the markers in uppercase
    MissingMarkers = {"", "-", "N/A", "NA", "NULL", "(BLANK)"},
    ToNull = Table.TransformColumns(
        Source,
        {
            {
                "Status",
                (v) =>
                    let
                        t = if v = null then null else Text.Upper(Text.Trim(Text.From(v)))
                    in
                        if t = null then null
                        else if List.Contains(MissingMarkers, t) then null
                        else Text.From(v),
                type text
            }
        }
    )
in
    ToNull
```

Tip: apply marker-to-null conversion before changing data types for dates and numbers. It reduces conversion errors.
Replacing nulls with defaults (only when appropriate)
When a default is required for reporting, do it explicitly and document the rule in the step name. Examples:
- Replace null Region with "Unknown"
- Replace null Quantity with 0 if missing means "no units"
```
let
    Source = ...,
    RegionDefault = Table.TransformColumns(Source, {{"Region", each if _ = null then "Unknown" else _, type text}}),
    QtyDefault = Table.TransformColumns(RegionDefault, {{"Quantity", each if _ = null then 0 else _, Int64.Type}})
in
    QtyDefault
```

Forward fill and backward fill (Fill Down/Up) with care
Fill Down is useful when a source uses a “header row” pattern where a category appears once and subsequent rows are blank until the next category. This is not the same as imputing missing values in transactional data.
UI approach:
- Select the column to fill.
- Use Transform > Fill > Down (or Up).
Best practice: sort first to ensure the fill direction matches the intended grouping. If the source order can change, add an index column before sorting so you can restore original order after the fill.
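A minimal sketch of that order-preserving pattern, assuming Category is the column with the header-row blanks:

```
let
    Source = ...,
    // Capture the original row order so it can be restored later
    Indexed = Table.AddIndexColumn(Source, "OriginalOrder", 0, 1),
    // Fill Down relies on row order: each null inherits the value above it
    Filled = Table.FillDown(Indexed, {"Category"}),
    // If intermediate steps reorder rows, restore the source order afterwards
    Restored = Table.Sort(Filled, {{"OriginalOrder", Order.Ascending}}),
    Cleanup = Table.RemoveColumns(Restored, {"OriginalOrder"})
in
    Cleanup
```

Note that Table.FillDown only fills true nulls, so run the marker-to-null conversion first if blanks arrive as empty strings.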
Detecting and isolating errors vs nulls
Errors are different from nulls. Errors stop refresh if not handled. A robust cleaning workflow often includes:
- Convert known missing markers to null
- Use try ... otherwise null for risky conversions
- Add a quality flag column to track rows that failed conversion
```
let
    Source = ...,
    AddParsedDate = Table.AddColumn(
        Source,
        "OrderDate_Parsed",
        each try Date.FromText([OrderDateText], "en-GB") otherwise null,
        type date
    ),
    AddDateFlag = Table.AddColumn(
        AddParsedDate,
        "OrderDate_ParseFailed",
        each [OrderDateText] <> null and Text.Trim(Text.From([OrderDateText])) <> "" and [OrderDate_Parsed] = null,
        type logical
    )
in
    AddDateFlag
```

This keeps the dataset refreshable while still exposing data quality issues for remediation.
Putting it together: a repeatable “standardize columns” pattern
Design a column-by-column standardization checklist
For each important column, define:
- Target data type (text/date/datetime/number/whole number)
- Allowed values or format (e.g., ISO date, Proper Case, uppercase codes)
- Missing-value rule (keep null, replace with default, or flag)
- Conversion method and culture (especially for dates and numbers)
Then implement transformations in a consistent order:
- Normalize missing markers to null
- Clean/trim text
- Parse dates and numbers with explicit culture
- Set final data types
- Add flags for failed parsing or unexpected values
Example: standardize multiple columns in one step
This example shows a compact approach using Table.TransformColumns for several fields. Adapt the cultures and rules to your data.
```
let
    Source = ...,
    Standardized = Table.TransformColumns(
        Source,
        {
            {"CustomerName", (v) =>
                let t = if v = null then null else Text.Clean(Text.Trim(Text.Replace(Text.From(v), Character.FromNumber(160), " ")))
                in if t = "" then null else Text.Proper(t),
                type text},
            {"OrderDate", (v) =>
                if v = null then null
                else if Value.Is(v, type date) then v
                else try Date.FromText(Text.From(v), "en-GB") otherwise null,
                type date},
            {"Amount", (v) =>
                let t = if v = null then null else Text.Trim(Text.From(v))
                in if t = null or t = "" or t = "-" then null
                   else try Number.FromText(Text.Replace(Text.Replace(t, "$", ""), ",", ""), "en-US") otherwise null,
                type number},
            {"Region", (v) =>
                let t = if v = null then null else Text.Trim(Text.From(v))
                in if t = null or t = "" then "Unknown" else t,
                type text}
        }
    )
in
    Standardized
```

Even if you later break this into smaller steps for readability, this pattern demonstrates the key idea: each column has a deterministic transformation function, explicit type, and explicit parsing behavior.
Practical checks after standardization
After cleaning, validate quickly inside Power Query:
- Use column profiling (when available) to spot unexpected distributions and null spikes.
- Filter for parse-failed flags (logical columns you added) to see problematic rows.
- Group by standardized columns (like Region or Country) to confirm mapping worked (see the sketch after this list).
- Sort and scan for outliers (very large amounts, dates far in the past/future).
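A minimal sketch of that group-by check, assuming the Country_Final column from the mapping example earlier:

```
let
    Source = ...,
    // Count rows per standardized value; unexpected buckets stand out quickly
    ByCountry = Table.Group(Source, {"Country_Final"}, {{"Rows", each Table.RowCount(_), Int64.Type}}),
    Sorted = Table.Sort(ByCountry, {{"Rows", Order.Descending}})
in
    Sorted
```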
These checks are part of building a repeatable pipeline: you are not just transforming; you are making the transformation trustworthy under refresh.