Free Ebook cover Excel Power Query Playbook: Repeatable Data Prep Without VBA

Excel Power Query Playbook: Repeatable Data Prep Without VBA

New course

10 pages

Merging Queries and Appending Tables for Multi-Source Models

Capítulo 6

Estimated reading time: 14 minutes

+ Exercise

Why merge and append matter in multi-source models

In Power Query, most multi-source models boil down to two repeatable patterns: append when you want to stack tables with the same structure (more rows), and merge when you want to enrich a table by looking up related attributes from another table (more columns). These two operations are the backbone of building a model that can be refreshed without manual copy/paste, even when data comes from different systems.

Think of append as creating a single fact-like table from multiple feeds (for example, sales transactions from two ERPs). Think of merge as creating relationships at query time (for example, adding customer segment, region, or product category to those transactions). In many projects you will do both: append multiple transaction sources into one table, then merge in dimensions (customers, products, calendar, exchange rates) to produce a dataset ready for reporting.

Append queries: stacking tables into one

When to append

Append is appropriate when the tables represent the same type of entity and should live in one list._toggle examples: monthly extracts, multiple business units, multiple systems that produce the same transaction grain, or separate “current” and “history” tables.

  • Good fit: SalesOrders_US and SalesOrders_EU with the same columns (OrderID, OrderDate, CustomerID, Amount, Currency).
  • Not a good fit: SalesOrders and Customers (different entities). That is a merge or a relationship in the data model, not an append.

Append basics: two tables

To append two queries in Power Query:

  • In the Power Query Editor, select the query you want as the base (often the one you consider the “main” table).
  • Go to Home > Append Queries (or Append Queries as New if you want to keep the originals untouched and create a new combined query).
  • Choose Two tables, select the second table, and confirm.

Power Query will create a step that uses Table.Combine under the hood. If both tables have identical column names and types, the result is straightforward: rows are stacked.

Continue in our app.

You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.

Or continue reading below...
Download App

Download the app

Append many tables: multiple sources

When you have more than two sources (for example, one query per region), use Append Queries with Three or more tables. This creates a combined table from a list of tables. The key operational advantage is that you can keep each source query focused on connecting and selecting the relevant columns, then centralize the stacking logic in one place.

In M, the pattern looks like this:

let    Combined = Table.Combine({Sales_US, Sales_EU, Sales_APAC})in    Combined

This is also a useful pattern when you want to programmatically build the list of tables (for example, based on parameters), but even without advanced automation, it clarifies intent: “these are all the same kind of data; stack them.”

Handling mismatched columns during append

Real-world appends often fail silently in the sense that they succeed but produce unexpected nulls. Power Query appends by column name. If one table has a column the other does not, the combined table will include the union of columns, and missing values will be null for the table that lacks that column.

Common causes and fixes:

  • Column name differences: One source uses CustomerId and another uses CustomerID. Fix by renaming columns in each source query before appending so the names match exactly.
  • Extra columns in one source: One system includes SalesRep but another doesn’t. Decide whether you want to keep it (accept nulls) or remove it from the richer source for consistency.
  • Different data types: One source has Amount as text, another as number. If you append first, you may end up with type conflicts later. Prefer to align types in each source query before appending, or apply a consistent type step immediately after append.

A practical approach is to define a “target schema” (the set of columns you want in the final table) and ensure each source query outputs exactly those columns in the same order and with the same names. Then append becomes predictable.

Adding a source identifier column before appending

When you append multiple sources, you often need to know where each row came from (for debugging, auditing, or reporting). Add a column like SourceSystem or BusinessUnit in each source query before appending. This is especially important when keys overlap across systems (for example, OrderID 1001 exists in both ERPs).

Example M snippet for a source query:

let    AddSource = Table.AddColumn(PreviousStep, "SourceSystem", each "ERP_A", type text)in    AddSource

Repeat for each source with a different label, then append. This creates a stable lineage marker that survives refreshes.

Merge queries: enriching a table with related data

What a merge does (and what it is not)

A merge in Power Query is a join operation: it matches rows between two tables using one or more key columns and returns a nested table column that you can expand to bring in additional fields. It is not the same as appending. It is also not the same as creating a relationship in the Excel Data Model, though the intent can be similar. The difference is that merge physically adds columns to the output table at refresh time.

Use merge when you need to:

  • Add descriptive attributes (CustomerName, Segment, Region) to a transaction table.
  • Map codes to labels (StatusCode to StatusText).
  • Bring in rates or factors (Currency to FXRate by date).
  • Match across systems using a crosswalk table (LegacyCustomerID to MasterCustomerID).

Join types and how to choose

Power Query offers several join kinds. Choosing the right one prevents missing data or accidental row loss:

  • Left Outer (all from first, matching from second): Most common for enrichment. Keeps all rows in the main table even if the lookup is missing.
  • Inner (only matching rows): Useful when you want to filter the main table to only valid keys (for example, only customers present in an approved list).
  • Right Outer: Less common; similar logic but keeps all rows from the second table.
  • Full Outer: Good for reconciliation to see mismatches on both sides.
  • Left Anti / Right Anti: Returns rows that do not match. Excellent for data quality checks (for example, transactions with CustomerID not found in Customers).

In a repeatable model, you often use Left Outer for the production dataset and keep separate “audit” queries using Left Anti to monitor missing keys.

Step-by-step: merge a fact table with a dimension table

Scenario: You have an appended transactions table Sales_All with columns: OrderID, OrderDate, CustomerID, Amount, SourceSystem. You also have a customer table Customers with CustomerID, CustomerName, Segment, Region.

Steps:

  • Select Sales_All in the Queries pane.
  • Go to Home > Merge Queries (or Merge Queries as New to create a new query).
  • In the merge dialog, choose Customers as the second table.
  • Click CustomerID in Sales_All and click CustomerID in Customers to define the join keys.
  • Choose Left Outer as the join kind.
  • Click OK. A new column appears (often named Customers) containing nested tables.
  • Click the expand icon on that column, select the fields to bring in (CustomerName, Segment, Region), and uncheck “Use original column name as prefix” if you want cleaner names.

After expansion, you have a single table with both transaction metrics and customer attributes. This is often the dataset you load to the worksheet or data model.

Understanding the nested table column

The nested table column is Power Query’s way of representing a one-to-many or many-to-many potential match. If your lookup table has duplicate keys, the nested table can contain multiple rows for a single transaction row. When you expand, you may accidentally duplicate transaction rows, inflating totals.

Practical rule: before merging, ensure the lookup table is unique on the join key(s) if you expect a one-to-one enrichment. If uniqueness is not guaranteed, decide how to resolve duplicates (for example, keep the latest record, keep a specific status, or aggregate).

Step-by-step: detect and prevent row duplication from non-unique keys

Scenario: Customers contains multiple rows per CustomerID (for example, multiple addresses). You only want one row per CustomerID for segmentation.

One repeatable approach is to create a dedicated lookup query Customers_Lookup that enforces uniqueness:

  • Reference the original Customers query (so you don’t duplicate connection logic).
  • Keep only the columns needed for the merge (CustomerID, Segment, Region, CustomerName).
  • Remove duplicates on CustomerID, or group by CustomerID and choose a rule for which row to keep.

Then merge Sales_All with Customers_Lookup instead of the raw Customers table.

In M, removing duplicates looks like:

let    KeepCols = Table.SelectColumns(Customers, {"CustomerID","CustomerName","Segment","Region"}),    Unique = Table.Distinct(KeepCols, {"CustomerID"})in    Unique

If “distinct” is not logically correct (because duplicates differ), use a grouping rule. For example, choose the max of a date column (not shown here) and then merge back, or use sorting plus removing duplicates so the “first” record is deterministic.

Merging on multiple keys (compound joins)

Many business lookups require more than one key. A common example is exchange rates: you need both Currency and Date to find the right rate. Another is pricing: ProductID plus EffectiveDate range (range joins require a different technique), or Plant plus Material.

Power Query supports selecting multiple columns in the merge dialog. The keys must align in order: if you select Currency then Date in the first table, select Currency then Date in the second table.

Example: merge Sales_All with FXRates on Currency and OrderDate (assuming FXRates has RateDate):

  • Select Currency and OrderDate in Sales_All.
  • Select Currency and RateDate in FXRates.
  • Join kind: Left Outer.
  • Expand the Rate column.

After expansion, you can compute a normalized amount (for example, AmountUSD) using the merged rate. Keep the rate column as well for auditability.

Choosing between merge vs relationships in the data model

In Excel solutions that load to the Data Model, you can either merge attributes into the fact table in Power Query or keep separate tables and relate them in the model. The choice affects refresh time, file size, and flexibility.

  • Prefer merge when: you need a single flat table for downstream consumers, you are exporting a prepared dataset, or you want to lock in a specific mapping at refresh time (for example, a crosswalk that must be applied consistently).
  • Prefer relationships when: dimensions are reused across multiple fact tables, you want to avoid repeating attributes in multiple tables, or you want more flexible slicing without widening the fact table.

Even if you plan to use relationships, merges are still valuable for data quality checks (for example, anti-joins to find missing keys) and for creating bridge/crosswalk outputs.

Practical pattern: append first, then merge

A common multi-source pattern is:

  • Build one query per source system that outputs the same standardized transaction schema.
  • Append those standardized queries into Fact_Transactions.
  • Build standardized dimension queries (Customers, Products, Calendar, FXRates), ideally with unique keys.
  • Merge dimensions into the fact only when needed (or load dimensions separately if using the Data Model).

This ordering matters. If you merge before appending, you repeat the same merge logic in multiple places and increase maintenance. By appending first, you apply enrichment once to the combined dataset.

Practical pattern: merge to a crosswalk for master data alignment

When multiple systems use different identifiers, you often need a crosswalk (mapping) table. Example: ERP_A uses CustomerID like A-100, ERP_B uses numeric IDs, and the business wants a MasterCustomerID.

Approach:

  • Ensure the appended fact table includes SourceSystem and the native CustomerID.
  • Create a crosswalk table with columns: SourceSystem, NativeCustomerID, MasterCustomerID.
  • Merge the fact table to the crosswalk on (SourceSystem, CustomerID) to bring in MasterCustomerID.
  • Then merge to a Master Customers dimension on MasterCustomerID to bring in attributes.

This two-step merge prevents collisions where the same native ID exists in multiple systems and makes the mapping explicit and auditable.

Audit queries with anti-joins (data quality that refreshes)

Anti-joins are one of the most practical uses of merge for repeatable data prep because they turn “missing lookup” problems into a refreshable report.

Step-by-step: find transactions with missing customers

  • Duplicate or reference Sales_All into a new query named Audit_MissingCustomers.
  • Merge Audit_MissingCustomers with Customers_Lookup on CustomerID.
  • Choose join kind Left Anti (rows only in first table).
  • Result: a table of transactions whose CustomerID does not exist in the lookup.

Load this audit table to a worksheet (or keep as connection only) so stakeholders can fix master data issues. Because it is query-driven, it updates on refresh and becomes part of your operational process.

Step-by-step: reconcile two systems with full outer join

Scenario: You want to compare totals or record presence between System A and System B for the same business key (for example, InvoiceNumber).

  • Create two standardized tables: Invoices_A and Invoices_B.
  • Merge them using Full Outer on InvoiceNumber.
  • Expand key fields from both sides (or keep nested tables and add indicators).
  • Add a custom column that flags: “Only in A”, “Only in B”, “In both”.

This produces a reconciliation dataset that can be filtered and reviewed without manual VLOOKUP work.

Performance and reliability tips for merge and append

Reduce columns before merging

Merges are faster and more stable when the lookup table contains only the columns you need. Create a slim lookup query that selects only key + required attributes. This reduces memory and speeds up matching.

Ensure consistent key formats

Even if you have already standardized types in earlier steps, multi-source models often introduce subtle key mismatches: leading zeros, trailing spaces, mixed case, or numeric vs text keys. If one table stores CustomerID as text “00123” and another as number 123, they will not match. Make the key format explicit in both tables before merging (for example, both as text with the same padding rule) and keep that logic close to the source queries so it remains stable.

Be deliberate about null keys

If the key column contains nulls, merges will not match those rows. Decide whether null keys should be filtered out into an audit query, replaced with a placeholder, or retained as “unknown.” For production facts, it is often better to keep the row (Left Outer) but also generate an audit table of null-key records for remediation.

Prefer “Merge as New” for reusable building blocks

When you merge directly inside a query, you bake enrichment into that query. When you use “Merge as New,” you can keep the original queries as reusable building blocks and create a separate enriched output query. This is helpful when multiple outputs need the same base table but different enrichment rules.

Watch for query folding boundaries

When sources are databases, merges and appends may be pushed back to the source (query folding) or executed locally depending on transformations. If performance becomes an issue, test whether the merge is folding by checking the applied steps for “View Native Query” availability. If folding breaks, consider moving certain transformations after the merge, or staging queries so the heavy join happens in the database when possible.

Common multi-source modeling examples

Example 1: Append regional sales, then merge product categories

Regional sales tables are appended into Sales_All. A separate Products table contains ProductID and Category. Merge Sales_All with Products on ProductID using Left Outer, then expand Category. Add an audit query using Left Anti to find ProductIDs in sales that do not exist in Products.

Example 2: Append current and historical, then merge calendar attributes

Append Transactions_Current and Transactions_History. Merge the combined table to a Calendar table on Date to bring in Month, Quarter, FiscalYear. This avoids recalculating date attributes row-by-row with custom logic and keeps date definitions centralized.

Example 3: Merge to a mapping table to standardize statuses

Different systems may use different status codes. Create a mapping table with columns: SourceSystem, StatusCode, StandardStatus. Merge on (SourceSystem, StatusCode) to bring in StandardStatus. This allows consistent reporting across systems without rewriting logic in formulas.

Now answer the exercise about the content:

In a multi-source Power Query model, what is the most maintainable way to combine and enrich transactions from multiple systems?

You are right! Congratulations, now go to the next page

You missed! Try again.

Append stacks same-structure transaction tables into one fact table, then a single merge can add related attributes (more columns). Doing append first avoids repeating the same merge logic across multiple source queries and reduces maintenance.

Next chapter

Using Parameters and Dynamic Inputs for Reusable Queries

Arrow Right Icon
Download the app to earn free Certification and listen to the courses in the background, even with the screen off.