
Excel Power Query Playbook: Repeatable Data Prep Without VBA


Combining Multiple Files into a Single Standardized Table

Chapter 3

Estimated reading time: 12 minutes


Why “combine files” is more than just appending

When you receive the same type of report every day or every month, the goal is rarely to “stack everything as-is.” The goal is to produce one standardized table that downstream steps (pivot tables, measures, dashboards) can rely on. In practice, files that are supposed to be identical often drift: column names change slightly, extra columns appear, a header row shifts, data types vary, or a new sheet name is introduced. Combining multiple files into a single standardized table means you design a repeatable transformation that (1) ingests each file, (2) normalizes it to a target schema, and (3) appends all normalized outputs into one table.

This chapter focuses on the standardization part: how to build a robust “per-file” transformation and then combine the results safely, even when the incoming files are imperfect. You will work with a target schema (the columns you want, with consistent names and data types) and a set of defensive steps that handle common variations.

Define the target schema before you touch Power Query

Standardization is easiest when you decide what “correct” looks like. Create a short specification for your final table:

  • Column list: the exact columns you want in the final table (e.g., Date, CustomerID, CustomerName, Product, Quantity, UnitPrice, Currency, SourceFile).

  • Data types: date, whole number, decimal, text, etc.

  • Optional vs required: which columns must exist in every file and which can be missing.

  • Business rules: e.g., Quantity must be positive; UnitPrice can be blank; Currency defaults to “USD” if missing.

Power Query works best when you implement this schema explicitly. If you rely on “whatever columns happen to be there,” your model will break the first time a file changes.
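
One way to make the specification concrete is to keep it in a single query that later steps reference. The sketch below uses the example columns above and returns the schema as a record; adjust the names and types to your own target table.

let
    // Target schema kept in one place so later steps can reference it
    TargetColumns = {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency","SourceFile"},
    TargetTypes = {
        {"Date", type date},
        {"CustomerID", type text},
        {"CustomerName", type text},
        {"Product", type text},
        {"Quantity", type number},
        {"UnitPrice", type number},
        {"Currency", type text},
        {"SourceFile", type text}
    }
in
    [Columns = TargetColumns, Types = TargetTypes]

Keeping the schema in one query means that adding a column later is a one-line change instead of a hunt through every transformation step.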

Pattern: build a per-file transformation function

The most reliable way to combine many files is to create a transformation that takes a single file’s binary content (or a table extracted from it) and returns a standardized table. Then you apply that transformation to every file and append the results.

Even if you use the built-in Combine Files experience, the underlying idea is the same: a sample file is used to generate a “Transform Sample File” query. You can improve that query so it enforces your schema and handles variations. The key is to treat that query as a reusable “function-like” transformation.

What the per-file transformation should do

  • Extract the relevant table from the file (sheet/table/CSV content).

  • Clean structural issues (promote headers, remove top rows, filter out totals rows).

  • Normalize columns (rename to target names, add missing columns, remove extras).

  • Set data types consistently.

  • Add metadata like SourceFile, SourceFolder, or ReportMonth.

Step-by-step: standardize columns across drifting files

The steps below assume you already have a query that produces a list of files (typically from a folder) and that you can access each file’s content. The focus here is on the transformation logic that makes the combined output stable.

Step 1: Extract the table from each file consistently

For Excel files, you typically choose between extracting from a named table, a sheet, or a range. For standardization, a named table is the most stable because it survives row insertions and usually keeps headers consistent. If you must use a sheet, be prepared to handle extra header rows and blank rows.

In M, extraction often looks like:

let
    Source = Excel.Workbook(FileContents, true),
    Data = Source{[Item="Sales",Kind="Table"]}[Data]
in
    Data

If the “Sales” table might be missing or renamed, you can add a fallback strategy (for example, pick the first table, or pick a sheet by name). Keep fallback logic conservative: it’s better to fail fast than silently combine the wrong data.
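
For example, a conservative fallback that tries the "Sales" table first, then a sheet with the same name, and otherwise fails with a clear message could look like this sketch (adjust the object names to your workbooks):

let
    Source = Excel.Workbook(FileContents, true),
    // Try the named table first, then a sheet of the same name, otherwise fail fast
    FromTable = try Source{[Item="Sales",Kind="Table"]}[Data] otherwise null,
    FromSheet = try Source{[Item="Sales",Kind="Sheet"]}[Data] otherwise null,
    Data = if FromTable <> null then FromTable
           else if FromSheet <> null then FromSheet
           else error "Expected a 'Sales' table or sheet but found neither"
in
    Data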

Step 2: Promote headers and remove non-data rows defensively

Common issues:

  • One file has an extra title row above the headers.

  • Headers are already promoted in some files but not others.

  • There are blank rows or “Totals” rows at the bottom.

A practical approach is to standardize by detecting the header row. If your header row contains known column names (like “Date” or “Customer”), you can search for the row index where those appear, then skip rows above it. If that’s too complex, use a simpler rule: remove top rows until the first row has no nulls in key columns, then promote headers.
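
If you go the detection route, a minimal sketch (assuming one of your known header labels is the literal text "Date") could be:

let
    // Find the index of the first row that contains the known header label "Date"
    RowsAsLists = Table.ToRows(Data),
    HeaderIndex = List.PositionOf(
        List.Transform(RowsAsLists, each List.Contains(_, "Date")),
        true
    ),
    Skipped = if HeaderIndex < 0
              then error "Header row containing 'Date' not found"
              else Table.Skip(Data, HeaderIndex),
    Promoted = Table.PromoteHeaders(Skipped, [PromoteAllScalars=true])
in
    Promoted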

Example pattern (simplified) to remove completely blank rows and then promote headers:

let
    CleanRows = Table.SelectRows(Data, each List.NonNullCount(Record.FieldValues(_)) > 0),
    Promoted = Table.PromoteHeaders(CleanRows, [PromoteAllScalars=true])
in
    Promoted

After promoting headers, filter out obvious totals rows (e.g., where Date is null and CustomerName equals “Total”).
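
Using that example rule, the filter could look like the sketch below (it assumes the header names already match the target names; if they don't, apply the filter after the renaming in Step 3):

let
    NoTotals = Table.SelectRows(Promoted, each not ([Date] = null and [CustomerName] = "Total"))
in
    NoTotals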

Step 3: Normalize column names using a mapping table

Column drift often shows up as small naming differences: “Cust ID” vs “CustomerID”, “Unit Price” vs “Price”, “Qty” vs “Quantity”. Hard-coding rename steps works until the next variation appears. A more scalable approach is to maintain a mapping table (in Excel or in Power Query) that lists:

  • IncomingName (as found in files)

  • StandardName (your target schema name)

Then apply that mapping to rename columns. In Power Query, you can turn the mapping into a list of rename pairs and apply Table.RenameColumns.

let
    // Mapping table with columns [IncomingName], [StandardName]
    RenamePairs = List.Transform(Table.ToRows(Mapping), each { _{0}, _{1} }),
    Renamed = Table.RenameColumns(Promoted, RenamePairs, MissingField.Ignore)
in
    Renamed

MissingField.Ignore prevents errors when a particular incoming name isn’t present in a given file.
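
If you want to prototype before building the Excel-based mapping table, you can define Mapping inline with #table; the name pairs below are hypothetical examples:

let
    // Hypothetical inline mapping; in practice, load this from an Excel table so users can maintain it
    Mapping = #table(
        {"IncomingName", "StandardName"},
        {
            {"Cust ID", "CustomerID"},
            {"Qty", "Quantity"},
            {"Unit Price", "UnitPrice"}
        }
    )
in
    Mapping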

Step 4: Add missing columns (and set defaults)

To guarantee a stable schema, add any missing columns with nulls or default values. This is crucial because appending tables with different column sets leads to unpredictable results or extra columns you didn’t plan for.

Define your target column list and ensure each exists:

let
    TargetColumns = {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"},
    WithAllColumns = List.Accumulate(
        TargetColumns,
        Renamed,
        (state, col) => if Table.HasColumns(state, col) then state else Table.AddColumn(state, col, each null)
    )
in
    WithAllColumns

If you want defaults (for example Currency = “USD” when missing), add the column with that default, or fill nulls after adding:

let
    AddedCurrency = if Table.HasColumns(WithAllColumns, "Currency")
                    then WithAllColumns
                    else Table.AddColumn(WithAllColumns, "Currency", each "USD"),
    CurrencyFilled = Table.ReplaceValue(AddedCurrency, null, "USD", Replacer.ReplaceValue, {"Currency"})
in
    CurrencyFilled

Step 5: Remove extra columns to lock the schema

Files often include extra columns like “Notes”, “Last Updated”, or “Region (old)”. If you allow them through, your final table becomes unstable. After renaming and adding missing columns, explicitly select only the target columns (plus any metadata columns you add).

let
    Selected = Table.SelectColumns(CurrencyFilled, TargetColumns, MissingField.UseNull)
in
    Selected

MissingField.UseNull ensures that if a column is still missing for some reason, it will be created as null rather than erroring.

Step 6: Enforce data types carefully (and avoid type errors)

Type enforcement is necessary, but it’s also a common source of refresh failures when one file contains unexpected values (like “N/A” in a numeric column). A robust approach is:

  • Convert using locale-aware parsing if needed.

  • Replace known non-numeric placeholders with null before changing type.

  • Use try ... otherwise for conversions when data is messy.

Example: safely convert Quantity and UnitPrice to numbers:

let
    CleanQty = Table.TransformColumns(Selected, {
        {"Quantity", each try Number.From(_) otherwise null, type number},
        {"UnitPrice", each try Number.From(_) otherwise null, type number}
    }),
    Typed = Table.TransformColumnTypes(CleanQty, {
        {"Date", type date},
        {"CustomerID", type text},
        {"CustomerName", type text},
        {"Product", type text},
        {"Currency", type text}
    })
in
    Typed

Note that setting a column’s type in the same step as a try-based transform helps keep the schema consistent even when values are invalid.

Step 7: Add SourceFile metadata for traceability

When combining many files, you need a way to trace bad rows back to their origin. Add a column like SourceFile (and optionally SourceFolder or ModifiedDate). This metadata usually comes from the file list query, not from inside the file itself.

When you apply your per-file transformation to each file, include the file name as an argument or add it afterward. Example pattern after you have a table of files with columns like [Name] and [Content]: create a custom column that returns the standardized table, then expand it, and keep [Name] as SourceFile.

In M, the idea looks like:

let
    Added = Table.AddColumn(Files, "Data", each TransformFile([Content])),
    Expanded = Table.ExpandTableColumn(Added, "Data",
        {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"}),
    WithSource = Table.RenameColumns(Expanded, {{"Name","SourceFile"}})
in
    WithSource

This makes troubleshooting dramatically easier: filter SourceFile to isolate the problematic report.

Handling common real-world variations

Variation: different sheet or table names

If some files use a different table name (e.g., “SalesData” instead of “Sales”), you can implement a small selection routine: try preferred names in order, otherwise error. Keep the list short and explicit so you don’t accidentally pick the wrong object.

let
    Source = Excel.Workbook(FileContents, true),
    Candidates = {"Sales","SalesData","Report"},
    // Pass null as the default so an empty list yields null instead of an error
    Pick = List.First(
        List.RemoveNulls(
            List.Transform(Candidates, (n) => try Source{[Item=n,Kind="Table"]}[Data] otherwise null)
        ),
        null
    ),
    Data = if Pick = null then error "No expected table found" else Pick
in
    Data

Variation: columns present but with different casing or trailing spaces

Normalize column names early by trimming and standardizing case. You can transform the column names themselves:

let
    NormalizedNames = Table.TransformColumnNames(Promoted, each Text.Upper(Text.Trim(_)))
in
    NormalizedNames

Then your mapping table should use the same normalization (e.g., uppercase incoming names) so renames match reliably.
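
For example, you can apply the same expression to the mapping's IncomingName column so that both sides match:

let
    // Normalize the mapping's incoming names with the same trim + uppercase rule
    MappingNormalized = Table.TransformColumns(
        Mapping,
        {{"IncomingName", each Text.Upper(Text.Trim(_)), type text}}
    )
in
    MappingNormalized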

Variation: one file includes a subtotal section

Subtotal rows often have blanks in key fields or contain labels like “Subtotal” in a product column. Filter them out using rules tied to required fields. For example, if Date and CustomerID must exist for a valid row:

let
    Filtered = Table.SelectRows(Typed, each [Date] <> null and [CustomerID] <> null)
in
    Filtered

If subtotals still pass that test, add a second rule (e.g., Quantity not null, or Product not equal to “Subtotal”).
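
Combining both suggested rules might look like this sketch; tune the conditions to how your subtotals are actually labeled:

let
    Filtered2 = Table.SelectRows(Filtered, each [Quantity] <> null and [Product] <> "Subtotal")
in
    Filtered2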

Variation: mixed decimal separators and locales

If some files use commas as decimal separators, Number.From may misinterpret values depending on your system locale. In those cases, parse using Number.FromText with a specified culture. Example for German-style numbers:

let
    Parsed = Table.TransformColumns(Selected, {
        {"UnitPrice", each try Number.FromText(Text.From(_), "de-DE") otherwise null, type number}
    })
in
    Parsed

Apply this only when you know the source locale; otherwise, you can implement a conditional parse (for example, detect whether the text contains a comma and a dot pattern).
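
A possible conditional parse is sketched below. The heuristic "a comma with no dot means a decimal comma" is an assumption; confirm it against your actual sources before relying on it.

let
    // Hypothetical heuristic: comma but no dot suggests a decimal comma (e.g., "1234,56")
    ParseFlexible = (value as any) as nullable number =>
        let
            txt = Text.From(value),
            result =
                if txt = null then
                    null
                else if Text.Contains(txt, ",") and not Text.Contains(txt, ".") then
                    try Number.FromText(txt, "de-DE") otherwise null
                else
                    try Number.FromText(txt, "en-US") otherwise null
        in
            result,
    Parsed = Table.TransformColumns(Selected, {{"UnitPrice", ParseFlexible, type number}})
in
    Parsed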

Practical workflow: build, test, then scale

Use a small test set of files

Before you combine hundreds of files, test your standardization logic on a handful that represent the range of variations: an “old format” file, a “new format” file, one with missing columns, and one with messy values. Your goal is to make the per-file transformation return the same column set and types for all of them.

Validate the schema after combining

After you expand and append all files, add quick checks:

  • Count rows per SourceFile to spot zero-row files.

  • Check for nulls in required fields (Date, CustomerID, etc.).

  • Check for unexpected columns (there should be none if you used explicit selection).

One simple technique is to create a separate “audit” query that groups by SourceFile and counts rows and nulls. This doesn’t change your main query; it gives you a monitoring view.

let
    Audit = Table.Group(Combined, {"SourceFile"}, {
        {"Rows", each Table.RowCount(_), Int64.Type},
        {"NullCustomerID", each List.Count(List.Select([CustomerID], each _ = null)), Int64.Type}
    })
in
    Audit

This kind of audit helps you detect when a new file format arrives and breaks assumptions, without manually inspecting the combined table.

Design choices that keep the combined table stable

Prefer “select then type” over “type then select”

If you change types before you lock the schema, you may end up typing columns that later get renamed or removed. A stable order is: rename columns, add missing columns, select target columns, then set types. This ensures the final table always has the same structure before type enforcement.

Keep transformation steps resilient to missing fields

Use options like MissingField.Ignore and MissingField.UseNull where appropriate. This prevents refresh failures when a column is absent in one file, while still enforcing your final schema by adding the column back as null.

Separate “extraction” from “standardization”

Extraction is the part that reads the file and finds the right sheet/table. Standardization is the part that renames, adds/removes columns, and types. Keeping these as distinct sections (or even separate queries/functions) makes maintenance easier: if the sheet name changes, you adjust extraction; if a new column appears, you adjust the schema logic.

Example: a complete per-file standardization function (template)

The following is a template you can adapt. It assumes you pass in the file binary and return a standardized table. Replace the extraction logic (table name, sheet name, CSV parsing) to match your files.

let
    TransformFile = (FileContents as binary) as table =>
        let
            Source = Excel.Workbook(FileContents, true),
            Raw = Source{[Item="Sales",Kind="Table"]}[Data],
            NonBlank = Table.SelectRows(Raw, each List.NonNullCount(Record.FieldValues(_)) > 0),
            Promoted = Table.PromoteHeaders(NonBlank, [PromoteAllScalars=true]),
            // Normalize column name text
            NormNames = Table.TransformColumnNames(Promoted, each Text.Trim(_)),
            // Rename using a mapping table query named ColumnMap with [IncomingName],[StandardName]
            RenamePairs = List.Transform(Table.ToRows(ColumnMap), each { _{0}, _{1} }),
            Renamed = Table.RenameColumns(NormNames, RenamePairs, MissingField.Ignore),
            TargetColumns = {"Date","CustomerID","CustomerName","Product","Quantity","UnitPrice","Currency"},
            WithAll = List.Accumulate(
                TargetColumns,
                Renamed,
                (state, col) => if Table.HasColumns(state, col) then state else Table.AddColumn(state, col, each null)
            ),
            Selected = Table.SelectColumns(WithAll, TargetColumns, MissingField.UseNull),
            CleanNumbers = Table.TransformColumns(Selected, {
                {"Quantity", each try Number.From(_) otherwise null, type number},
                {"UnitPrice", each try Number.From(_) otherwise null, type number}
            }),
            Typed = Table.TransformColumnTypes(CleanNumbers, {
                {"Date", type date},
                {"CustomerID", type text},
                {"CustomerName", type text},
                {"Product", type text},
                {"Currency", type text}
            })
        in
            Typed
in
    TransformFile

Once you have this function, the combine process becomes straightforward: apply it to each file’s content, expand, and add SourceFile. The strength is that every file is forced through the same schema gate.

Practical checklist for a reliable combined table

  • Write down the target schema (columns + types) and enforce it explicitly.

  • Rename columns via a mapping table to handle synonyms and drift.

  • Add missing columns before appending; remove extra columns after renaming.

  • Use try ... otherwise for numeric/date conversions when files contain placeholders.

  • Add SourceFile metadata so you can trace issues quickly.

  • Create an audit query (row counts, null checks) to detect format changes early.

Now answer the exercise about the content:

When combining many similar reports into one table, what approach best keeps the final output stable even when individual files drift (missing columns, renamed headers, extra fields)?


A stable combined table comes from standardizing each file to a defined target schema (rename, add missing, remove extra, type safely) before appending, so downstream steps always see the same columns and types.

Next chapter

Structuring Data with Unpivot, Pivot, and Column Shaping
