Why “combine files” is more than just appending
When you receive the same type of report every day or every month, the goal is rarely to “stack everything as-is.” The goal is to produce one standardized table that downstream steps (pivot tables, measures, dashboards) can rely on. In practice, files that are supposed to be identical often drift: column names change slightly, extra columns appear, a header row shifts, data types vary, or a new sheet name is introduced. Combining multiple files into a single standardized table means you design a repeatable transformation that (1) ingests each file, (2) normalizes it to a target schema, and (3) appends all normalized outputs into one table.
This chapter focuses on the standardization part: how to build a robust “per-file” transformation and then combine the results safely, even when the incoming files are imperfect. You will work with a target schema (the columns you want, with consistent names and data types) and a set of defensive steps that handle common variations.
Define the target schema before you touch Power Query
Standardization is easiest when you decide what “correct” looks like. Create a short specification for your final table:
Column list: the exact columns you want in the final table (e.g., Date, CustomerID, CustomerName, Product, Quantity, UnitPrice, Currency, SourceFile).
Data types: date, whole number, decimal, text, etc.
Optional vs required: which columns must exist in every file and which can be missing.
Business rules: e.g., Quantity must be positive; UnitPrice can be blank; Currency defaults to “USD” if missing.
Power Query works best when you implement this schema explicitly. If you rely on “whatever columns happen to be there,” your model will break the first time a file changes.
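One way to keep that specification actionable is to hold it in a small M value that later steps can reference. The sketch below is one possible shape, using the example columns above; the names are illustrative, not a required convention.

let
    // Each entry is {column name, required type}. This list can be passed directly
    // to Table.TransformColumnTypes later, and the derived name list can drive the
    // add-missing-columns and select steps described in this chapter.
    TargetSchema = {
        {"Date",         type date},
        {"CustomerID",   type text},
        {"CustomerName", type text},
        {"Product",      type text},
        {"Quantity",     type number},
        {"UnitPrice",    type number},
        {"Currency",     type text}
    },
    TargetColumns = List.Transform(TargetSchema, each _{0})
in
    [Schema = TargetSchema, Columns = TargetColumns]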
Pattern: build a per-file transformation function
The most reliable way to combine many files is to create a transformation that takes a single file’s binary content (or a table extracted from it) and returns a standardized table. Then you apply that transformation to every file and append the results.
Even if you use the built-in Combine Files experience, the underlying idea is the same: a sample file is used to generate a “Transform Sample File” query. You can improve that query so it enforces your schema and handles variations. The key is to treat that query as a reusable “function-like” transformation.
What the per-file transformation should do
Extract the relevant table from the file (sheet/table/CSV content).
Clean structural issues (promote headers, remove top rows, filter out totals rows).
Normalize columns (rename to target names, add missing columns, remove extras).
Set data types consistently.
Add metadata like SourceFile, SourceFolder, or ReportMonth.
Step-by-step: standardize columns across drifting files
The steps below assume you already have a query that produces a list of files (typically from a folder) and that you can access each file’s content. The focus here is on the transformation logic that makes the combined output stable.
Step 1: Extract the table from each file consistently
For Excel files, you typically choose between extracting from a named table, a sheet, or a range. For standardization, a named table is the most stable because it survives row insertions and usually keeps headers consistent. If you must use a sheet, be prepared to handle extra header rows and blank rows.
In M, extraction often looks like:
let
    Source = Excel.Workbook(FileContents, true),
    Data = Source{[Item="Sales",Kind="Table"]}[Data]
in
    Data

If the "Sales" table might be missing or renamed, you can add a fallback strategy (for example, pick the first table, or pick a sheet by name). Keep fallback logic conservative: it's better to fail fast than silently combine the wrong data.
Step 2: Promote headers and remove non-data rows defensively
Common issues:
One file has an extra title row above the headers.
Headers are already promoted in some files but not others.
There are blank rows or “Totals” rows at the bottom.
A practical approach is to standardize by detecting the header row. If your header row contains known column names (like “Date” or “Customer”), you can search for the row index where those appear, then skip rows above it. If that’s too complex, use a simpler rule: remove top rows until the first row has no nulls in key columns, then promote headers.
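A minimal sketch of that detection, assuming the raw rows are in a table called Data and that "Date" is a label that appears only in the header row:

let
    // Find the index of the first row whose values include the known header label.
    HeaderIndex = List.PositionOf(
        List.Transform(Table.ToRows(Data), each List.Contains(_, "Date")),
        true
    ),
    // Skip everything above the detected header row, then promote it.
    Skipped = if HeaderIndex < 0
              then error "Header row not found"
              else Table.Skip(Data, HeaderIndex),
    Promoted = Table.PromoteHeaders(Skipped, [PromoteAllScalars=true])
in
    Promoted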
Example pattern (simplified) to remove completely blank rows and then promote headers:
let
    CleanRows = Table.SelectRows(Data, each List.NonNullCount(Record.FieldValues(_)) > 0),
    Promoted = Table.PromoteHeaders(CleanRows, [PromoteAllScalars=true])
in
    Promoted

After promoting headers, filter out obvious totals rows (e.g., where Date is null and CustomerName equals "Total").
Step 3: Normalize column names using a mapping table
Column drift often shows up as small naming differences: “Cust ID” vs “CustomerID”, “Unit Price” vs “Price”, “Qty” vs “Quantity”. Hard-coding rename steps works until the next variation appears. A more scalable approach is to maintain a mapping table (in Excel or in Power Query) that lists:
IncomingName (as found in files)
StandardName (your target schema name)
Then apply that mapping to rename columns. In Power Query, you can turn the mapping into a list of rename pairs and apply Table.RenameColumns.
let
    // Mapping table with columns [IncomingName], [StandardName]
    RenamePairs = List.Transform(Table.ToRows(Mapping), each { _{0}, _{1} }),
    Renamed = Table.RenameColumns(Promoted, RenamePairs, MissingField.Ignore)
in
    Renamed

MissingField.Ignore prevents errors when a particular incoming name isn't present in a given file.
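The Mapping query itself can live in an Excel table that users maintain, or it can be defined directly in M. A minimal inline sketch with hypothetical name pairs:

let
    Mapping = #table(
        {"IncomingName", "StandardName"},
        {
            {"Cust ID",    "CustomerID"},
            {"Customer",   "CustomerName"},
            {"Qty",        "Quantity"},
            {"Unit Price", "UnitPrice"},
            {"Price",      "UnitPrice"}
        }
    )
in
    Mapping

Keeping the mapping in an Excel table instead means new synonyms can be added without editing any M code.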
Step 4: Add missing columns (and set defaults)
To guarantee a stable schema, add any missing columns with nulls or default values. This is crucial because appending tables with different column sets leads to unpredictable results or extra columns you didn’t plan for.
Define your target column list and ensure each exists:
let
    TargetColumns = {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"},
    WithAllColumns = List.Accumulate(
        TargetColumns,
        Renamed,
        (state, col) =>
            if Table.HasColumns(state, col)
            then state
            else Table.AddColumn(state, col, each null)
    )
in
    WithAllColumns

If you want defaults (for example Currency = "USD" when missing), add the column with that default, or fill nulls after adding:
let
    AddedCurrency = if Table.HasColumns(WithAllColumns, "Currency")
                    then WithAllColumns
                    else Table.AddColumn(WithAllColumns, "Currency", each "USD"),
    CurrencyFilled = Table.ReplaceValue(AddedCurrency, null, "USD", Replacer.ReplaceValue, {"Currency"})
in
    CurrencyFilled

Step 5: Remove extra columns to lock the schema
Files often include extra columns like “Notes”, “Last Updated”, or “Region (old)”. If you allow them through, your final table becomes unstable. After renaming and adding missing columns, explicitly select only the target columns (plus any metadata columns you add).
let
    Selected = Table.SelectColumns(CurrencyFilled, TargetColumns, MissingField.UseNull)
in
    Selected

MissingField.UseNull ensures that if a column is still missing for some reason, it will be created as null rather than erroring.
Step 6: Enforce data types carefully (and avoid type errors)
Type enforcement is necessary, but it’s also a common source of refresh failures when one file contains unexpected values (like “N/A” in a numeric column). A robust approach is:
Convert using locale-aware parsing if needed.
Replace known non-numeric placeholders with null before changing type (a sketch follows this list).
Use try ... otherwise for conversions when data is messy.
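For the placeholder replacement mentioned above, here is a minimal sketch, assuming "N/A" and "-" are the placeholders you have actually seen in your files (ClearPlaceholders is a hypothetical helper name):

let
    // Hypothetical placeholder strings observed in numeric columns.
    Placeholders = {"N/A", "n/a", "-"},
    // Return null for known placeholders, otherwise pass the value through unchanged.
    ClearPlaceholders = (value as any) as any =>
        if value is text and List.Contains(Placeholders, Text.Trim(value)) then null else value,
    Cleared = Table.TransformColumns(Selected, {
        {"Quantity",  ClearPlaceholders},
        {"UnitPrice", ClearPlaceholders}
    })
in
    Cleared

In the running example, you would then feed Cleared (instead of Selected) into the conversion step shown next.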
Example: safely convert Quantity and UnitPrice to numbers:
let
    CleanQty = Table.TransformColumns(Selected, {
        {"Quantity",  each try Number.From(_) otherwise null, type number},
        {"UnitPrice", each try Number.From(_) otherwise null, type number}
    }),
    Typed = Table.TransformColumnTypes(CleanQty, {
        {"Date", type date},
        {"CustomerID", type text},
        {"CustomerName", type text},
        {"Product", type text},
        {"Currency", type text}
    })
in
    Typed

Note that setting a column's type in the same step as a try-based transform helps keep the schema consistent even when values are invalid.
Step 7: Add SourceFile metadata for traceability
When combining many files, you need a way to trace bad rows back to their origin. Add a column like SourceFile (and optionally SourceFolder or ModifiedDate). This metadata usually comes from the file list query, not from inside the file itself.
When you apply your per-file transformation to each file, include the file name as an argument or add it afterward. Example pattern after you have a table of files with columns like [Name] and [Content]: create a custom column that returns the standardized table, then expand it, and keep [Name] as SourceFile.
In M, the idea looks like:
let
    Added = Table.AddColumn(Files, "Data", each TransformFile([Content])),
    Expanded = Table.ExpandTableColumn(Added, "Data",
        {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"}),
    WithSource = Table.RenameColumns(Expanded, {{"Name","SourceFile"}})
in
    WithSource

This makes troubleshooting dramatically easier: filter SourceFile to isolate the problematic report.
Handling common real-world variations
Variation: different sheet or table names
If some files use a different table name (e.g., “SalesData” instead of “Sales”), you can implement a small selection routine: try preferred names in order, otherwise error. Keep the list short and explicit so you don’t accidentally pick the wrong object.
let
    Source = Excel.Workbook(FileContents, true),
    Candidates = {"Sales","SalesData","Report"},
    // Pass null as the default so an empty candidate list yields null rather than an error.
    Pick = List.First(
        List.RemoveNulls(
            List.Transform(Candidates, (n) => try Source{[Item=n,Kind="Table"]}[Data] otherwise null)
        ),
        null
    ),
    Data = if Pick = null then error "No expected table found" else Pick
in
    Data

Variation: columns present but with different casing or trailing spaces
Normalize column names early by trimming and standardizing case. You can transform the column names themselves:
let
    NormalizedNames = Table.TransformColumnNames(Promoted, each Text.Upper(Text.Trim(_)))
in
    NormalizedNames

Then your mapping table should use the same normalization (e.g., uppercase incoming names) so renames match reliably.
Variation: one file includes a subtotal section
Subtotal rows often have blanks in key fields or contain labels like “Subtotal” in a product column. Filter them out using rules tied to required fields. For example, if Date and CustomerID must exist for a valid row:
let
    Filtered = Table.SelectRows(Typed, each [Date] <> null and [CustomerID] <> null)
in
    Filtered

If subtotals still pass that test, add a second rule (e.g., Quantity not null, or Product not equal to "Subtotal").
Variation: mixed decimal separators and locales
If some files use commas as decimal separators, Number.From may misinterpret values depending on your system locale. In those cases, parse using Number.FromText with a specified culture. Example for German-style numbers:
let
    Parsed = Table.TransformColumns(Selected, {
        {"UnitPrice", each try Number.FromText(Text.From(_), "de-DE") otherwise null, type number}
    })
in
    Parsed

Apply this only when you know the source locale; otherwise, you can implement a conditional parse (for example, detect whether the text contains a comma and a dot pattern).
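A sketch of that conditional parse, under the assumption that whichever separator appears last in the text is the decimal separator (ParseFlexible is a hypothetical helper, and "de-DE"/"en-US" stand in for the two cultures you expect):

let
    // Decide per value whether the comma or the dot acts as the decimal separator.
    ParseFlexible = (value as any) as nullable number =>
        let
            txt = Text.From(value),
            lastComma = if txt = null then -1 else Text.PositionOf(txt, ",", Occurrence.Last),
            lastDot   = if txt = null then -1 else Text.PositionOf(txt, ".", Occurrence.Last),
            result =
                if txt = null then null
                else if lastComma > lastDot
                then try Number.FromText(txt, "de-DE") otherwise null   // "1.234,56" style
                else try Number.FromText(txt, "en-US") otherwise null   // "1,234.56" style
        in
            result,
    Parsed = Table.TransformColumns(Selected, {{"UnitPrice", ParseFlexible, type number}})
in
    Parsed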
Practical workflow: build, test, then scale
Use a small test set of files
Before you combine hundreds of files, test your standardization logic on a handful that represent the range of variations: an “old format” file, a “new format” file, one with missing columns, and one with messy values. Your goal is to make the per-file transformation return the same column set and types for all of them.
Validate the schema after combining
After you expand and append all files, add quick checks:
Count rows per SourceFile to spot zero-row files.
Check for nulls in required fields (Date, CustomerID, etc.).
Check for unexpected columns (there should be none if you used explicit selection); a quick check is sketched after the audit example below.
One simple technique is to create a separate “audit” query that groups by SourceFile and counts rows and nulls. This doesn’t change your main query; it gives you a monitoring view.
let
    Audit = Table.Group(Combined, {"SourceFile"}, {
        {"Rows", each Table.RowCount(_), Int64.Type},
        {"NullCustomerID", each List.Count(List.Select([CustomerID], each _ = null)), Int64.Type}
    })
in
    Audit

This kind of audit helps you detect when a new file format arrives and breaks assumptions, without manually inspecting the combined table.
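The unexpected-columns check can be another tiny query of its own. A minimal sketch, assuming the Combined query above, the TargetColumns list defined earlier, and the SourceFile metadata column:

let
    // Any column not in the target schema (plus SourceFile) signals schema drift.
    Unexpected = List.Difference(Table.ColumnNames(Combined), TargetColumns & {"SourceFile"}),
    Result = if List.IsEmpty(Unexpected)
             then "Schema OK"
             else "Unexpected columns: " & Text.Combine(Unexpected, ", ")
in
    Result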
Design choices that keep the combined table stable
Prefer “select then type” over “type then select”
If you change types before you lock the schema, you may end up typing columns that later get renamed or removed. A stable order is: rename columns, add missing columns, select target columns, then set types. This ensures the final table always has the same structure before type enforcement.
Keep transformation steps resilient to missing fields
Use options like MissingField.Ignore and MissingField.UseNull where appropriate. This prevents refresh failures when a column is absent in one file, while still enforcing your final schema by adding the column back as null.
Separate “extraction” from “standardization”
Extraction is the part that reads the file and finds the right sheet/table. Standardization is the part that renames, adds/removes columns, and types. Keeping these as distinct sections (or even separate queries/functions) makes maintenance easier: if the sheet name changes, you adjust extraction; if a new column appears, you adjust the schema logic.
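A minimal sketch of that split, reusing the "Sales" table and target columns from this chapter; the standardization body is shortened to a single select step so the shape stays visible (the full version is in the template below):

let
    // Extraction: read the file and find the right object.
    ExtractSales = (FileContents as binary) as table =>
        Excel.Workbook(FileContents, true){[Item="Sales",Kind="Table"]}[Data],

    // Standardization: enforce the target schema (shortened here).
    StandardizeSales = (Raw as table) as table =>
        Table.SelectColumns(
            Raw,
            {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"},
            MissingField.UseNull
        ),

    // The per-file transformation is just the composition of the two parts.
    TransformFile = (FileContents as binary) as table =>
        StandardizeSales(ExtractSales(FileContents))
in
    TransformFile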
Example: a complete per-file standardization function (template)
The following is a template you can adapt. It assumes you pass in the file binary and return a standardized table. Replace the extraction logic (table name, sheet name, CSV parsing) to match your files.
let
    TransformFile = (FileContents as binary) as table =>
        let
            Source = Excel.Workbook(FileContents, true),
            // A named table already exposes its header row as column names, so no
            // PromoteHeaders step is needed here. If you extract from a sheet instead,
            // add the blank-row removal and header promotion pattern from Step 2.
            Raw = Source{[Item="Sales",Kind="Table"]}[Data],
            NonBlank = Table.SelectRows(Raw, each List.NonNullCount(Record.FieldValues(_)) > 0),
            // Normalize column name text
            NormNames = Table.TransformColumnNames(NonBlank, each Text.Trim(_)),
            // Rename using a mapping table query named ColumnMap with [IncomingName],[StandardName]
            RenamePairs = List.Transform(Table.ToRows(ColumnMap), each { _{0}, _{1} }),
            Renamed = Table.RenameColumns(NormNames, RenamePairs, MissingField.Ignore),
            TargetColumns = {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"},
            WithAll = List.Accumulate(
                TargetColumns,
                Renamed,
                (state, col) =>
                    if Table.HasColumns(state, col)
                    then state
                    else Table.AddColumn(state, col, each null)
            ),
            Selected = Table.SelectColumns(WithAll, TargetColumns, MissingField.UseNull),
            CleanNumbers = Table.TransformColumns(Selected, {
                {"Quantity",  each try Number.From(_) otherwise null, type number},
                {"UnitPrice", each try Number.From(_) otherwise null, type number}
            }),
            Typed = Table.TransformColumnTypes(CleanNumbers, {
                {"Date", type date},
                {"CustomerID", type text},
                {"CustomerName", type text},
                {"Product", type text},
                {"Currency", type text}
            })
        in
            Typed
in
    TransformFile

Once you have this function, the combine process becomes straightforward: apply it to each file's content, expand, and add SourceFile. The strength is that every file is forced through the same schema gate.
Practical checklist for a reliable combined table
Write down the target schema (columns + types) and enforce it explicitly.
Rename columns via a mapping table to handle synonyms and drift.
Add missing columns before appending; remove extra columns after renaming.
Use try ... otherwise for numeric/date conversions when files contain placeholders.
Add SourceFile metadata so you can trace issues quickly.
Create an audit query (row counts, null checks) to detect format changes early.