
Excel Power Query Playbook: Repeatable Data Prep Without VBA

Refresh Strategies, Privacy Levels, and Performance-Safe Design

Chapter 9

Estimated reading time: 13 minutes

Why refresh strategy matters in Power Query

In a repeatable Power Query solution, “refresh” is not a single action—it is a design constraint. Your queries must refresh reliably (no prompts, no broken credentials), predictably (same results given the same inputs), and efficiently (fast enough for daily use). This chapter focuses on three areas that most often determine whether a Power Query model scales: refresh strategies (how and when data updates), privacy levels (how Power Query isolates sources), and performance-safe design (how to keep transformations fast and stable).

These topics are tightly connected. A refresh strategy that pulls from multiple sources can trigger privacy “firewalls.” Privacy settings can disable query folding and force Power Query to process more locally, slowing refresh. Performance-safe design often means shaping queries so folding is preserved and expensive steps are minimized.

Refresh strategies: choosing how data updates

Strategy 1: Manual refresh with predictable inputs

Manual refresh is appropriate when the workbook is used by a single analyst or when the data sources are not always available. The goal is to make manual refresh boring: no dialogs, no credential prompts, no “this step references other queries” surprises.

  • Use stable file paths and stable connection strings.
  • Ensure credentials are stored and tested on the machine(s) that will refresh.
  • Keep query dependencies clear: staging queries have load disabled; final queries load to a table or the Data Model.

Strategy 2: Scheduled refresh (workbook as a repeatable artifact)

Scheduled refresh is common when the workbook is used as a reporting artifact refreshed daily/weekly. Even if you are not using a server-based scheduler, you can design as if refresh will be unattended. That means: no interactive prompts, no “choose a file” steps, and no reliance on local-only resources unless you control the refresh environment.

  • Prefer sources that are reachable from the refresh environment (network share, SharePoint/OneDrive, database).
  • Avoid dependencies on user-specific paths (like C:\Users\Name\Downloads).
  • Keep refresh time bounded by reducing row/column volume early and preserving folding where possible.

Strategy 3: Incremental-like patterns in Excel (when full refresh is too slow)

Excel Power Query does not provide the same incremental refresh feature set as the Power BI service, but you can still use incremental-like patterns when a full refresh becomes expensive. The idea is to reduce how much data is pulled and processed on each refresh.

Common patterns include:

  • Date-window filtering at the source: Only import the last N days/months. This works best when the source supports folding (SQL, some OData endpoints). You filter on a date column as early as possible so the source does the work (see the sketch after this list).

  • Partitioned queries: Build separate queries for “historical” (rarely changes) and “recent” (changes often), then append them. Refresh “recent” frequently and “historical” rarely. In Excel, you can control this by disabling refresh for the historical query in connection properties and refreshing it manually when needed.

  • Snapshot tables: Keep a loaded table of prior results and append new results. This is more complex and can be fragile because Power Query is designed for deterministic refresh, not stateful accumulation. Use only when you can guarantee a reliable key and a clear snapshot policy.
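For example, here is a minimal sketch of the date-window pattern, assuming a SQL source with a hypothetical dbo.Sales table and OrderDate column; the filter sits immediately after the source step so the predicate can fold:

let
    Source = Sql.Database("server", "database"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Rolling three-month window; the cutoff intentionally moves with each refresh
    Cutoff = Date.AddMonths(Date.From(DateTime.LocalNow()), -3),
    // Filtering right after the source step lets the comparison fold to SQL
    Recent = Table.SelectRows(Sales, each [OrderDate] >= Cutoff)
in
    Recent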

Controlling refresh behavior in Excel

Workbook connection properties that matter

In Excel, each loaded query creates a connection. You can set refresh behavior via Data > Queries & Connections > right-click a query > Properties (or via the connection properties dialog). Key settings include:

  • Refresh data when opening the file: Useful for reports, risky for heavy models. Enable only when refresh time is acceptable and sources are always reachable.
  • Refresh every N minutes: Rarely appropriate for Power Query models unless the data source is lightweight and stable. Frequent refresh can lock the workbook and frustrate users.
  • Enable background refresh: Can keep Excel responsive, but may cause confusion if users interact with partially refreshed outputs. For critical outputs, consider disabling background refresh so refresh is atomic from the user’s perspective.
  • Refresh this connection on Refresh All: Use this to exclude expensive or rarely needed queries from routine refresh.

Step-by-step: set up a “fast daily refresh” vs “full refresh” mode

This pattern helps when you have one expensive query (for example, historical transactions) and one small query (recent transactions) that must update daily.

Goal: Daily users click Refresh All and get updated results quickly. Occasionally, an analyst runs a full rebuild.

Steps:

  • 1) Split the model into two final queries: one for Historical, one for Recent. Both should output the same schema (same column names and types).
  • 2) Create a third query that appends them: Final = Table.Combine({Historical, Recent}). Load only Final (and optionally load Recent for auditing). A sketch of this append query follows these steps.
  • 3) Load settings: Disable load for Historical and Recent staging queries if they are only intermediates. Load Final to a table or the Data Model.
  • 4) Connection refresh settings: In connection properties, uncheck “Refresh this connection on Refresh All” for Historical. Keep it checked for Recent and Final.
  • 5) Full refresh procedure: When needed, manually refresh Historical first, then Refresh All. Document this as an operational step for the workbook owner.
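A minimal sketch of the append query from step 2, assuming Historical and Recent are staging queries that already share the same schema (the TransactionID key used for the optional duplicate guard is hypothetical):

let
    // Both inputs must expose identical column names and types
    Combined = Table.Combine({Historical, Recent}),
    // Optional: guard against overlap at the historical/recent boundary
    Final = Table.Distinct(Combined, {"TransactionID"})
in
    Final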

This approach is simple, transparent, and avoids complicated stateful logic.

Privacy levels: what they are and why they affect performance

The core concept: data isolation and the “formula firewall”

Power Query’s privacy levels exist to prevent unintended data leakage between sources. If you combine data from multiple sources, Power Query may restrict how it evaluates queries to ensure that data from a “private” source is not sent to a “public” source (or vice versa). This is enforced through a mechanism often referred to as the formula firewall.

Practically, privacy levels can change the evaluation plan. In some cases, Power Query stops folding operations back to the source and instead pulls more data locally to combine it safely. That can dramatically slow refresh.

Common privacy levels and typical use

  • Public: Data that can be shared freely. Example: a public web dataset.
  • Organizational: Data safe within your organization. Example: internal databases, SharePoint sites.
  • Private: Sensitive data that should not be combined with other sources in a way that could leak it. Example: personal files, HR data.

The key is consistency. If you combine sources with different privacy levels, Power Query may isolate them and block folding or force local evaluation.

Step-by-step: set privacy levels intentionally for a workbook

Goal: Avoid accidental slowdowns and unexpected firewall errors by explicitly setting privacy levels for each source.

  • 1) Open Data Source Settings: In Power Query Editor, go to File > Options and settings > Data source settings.
  • 2) Review each source: Select a source and choose Edit Permissions.
  • 3) Set the privacy level: Choose Public, Organizational, or Private based on your governance rules.
  • 4) Confirm credential method: While you are there, confirm the authentication method is correct and stable (Windows, Basic, OAuth, etc.).
  • 5) Re-test refresh: Close and load, then run Refresh All to confirm no prompts and acceptable performance.

If your organization has policies, follow them. If you are building a workbook for broad distribution, align sources to the same privacy level when appropriate and permitted (often Organizational for internal systems).

When privacy levels cause “it used to fold, now it doesn’t”

A common symptom is a query that was fast suddenly becoming slow after adding a second source (for example, merging a SQL table with an Excel mapping file). Even if the merge is logically simple, privacy isolation can prevent the SQL source from receiving a filtered query, forcing Power Query to download more rows and do the merge locally.

The performance-safe response is not to disable privacy checks by default. Instead, first try to design the model so that cross-source combination happens only after each source has been reduced to the minimum necessary shape and size.

Performance-safe design: keep refresh fast and stable

Principle 1: Reduce rows and columns as early as possible

Every extra column carried through steps increases memory and CPU cost. Every extra row increases join and grouping cost. The safest performance habit is to remove what you do not need immediately after the source step.

  • Keep only required columns early (remove wide text columns if not needed).
  • Filter early (especially on date ranges or status flags).
  • Apply data types early, but be mindful: type conversion on huge datasets can be expensive if done repeatedly.
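A minimal sketch of this "reduce early" shape, assuming a workbook table named RawSales with hypothetical columns; column removal, filtering, and typing all happen immediately after the source step:

let
    Source = Excel.CurrentWorkbook(){[Name = "RawSales"]}[Content],
    // Keep only the columns the model actually needs
    Narrow = Table.SelectColumns(Source, {"OrderID", "OrderDate", "CustomerID", "Amount"}),
    // Filter rows early (date ranges, status flags)
    Filtered = Table.SelectRows(Narrow, each [Amount] <> null),
    // Set types once, early
    Typed = Table.TransformColumnTypes(Filtered, {
        {"OrderID", Int64.Type}, {"OrderDate", type date},
        {"CustomerID", type text}, {"Amount", type number}})
in
    Typed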

Principle 2: Preserve query folding whenever possible

Query folding means Power Query translates your steps into a source-native query (often SQL) so the source does the heavy lifting. Folding is one of the biggest determinants of refresh time for database-backed models.

Practical implications:

  • Prefer transformations that fold (filters, selecting columns, simple merges) before transformations that often break folding (custom functions over rows, complex text parsing, adding index columns in some contexts).
  • Do expensive operations after you have filtered down to the smallest dataset.
  • When combining sources, expect folding to break; plan to reduce each source first.

In Power Query Editor, you can often check folding by right-clicking a step and looking for “View Native Query” (availability depends on connector and step). If it disappears after a certain step, that step likely broke folding.
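For example, a minimal sketch of folding-friendly step order against a hypothetical dbo.Orders table; the filter and column selection fold, and the custom column, which commonly stops folding, runs last on the reduced set:

let
    Source = Sql.Database("server", "database"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // These steps fold: the source receives a filtered, narrowed query
    Filtered = Table.SelectRows(Orders, each [Status] = "Shipped"),
    Narrow = Table.SelectColumns(Filtered, {"OrderID", "OrderDate", "Freight"}),
    // Custom column logic often stops folding, so it comes after the reduction
    Tagged = Table.AddColumn(Narrow, "FreightBand",
        each if [Freight] > 100 then "High" else "Low", type text)
in
    Tagged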

Principle 3: Avoid row-by-row custom logic on large tables

Custom columns that call complex logic per row (especially with text parsing, web calls, or lookups into other tables) can be a refresh killer. Prefer set-based operations:

  • Use merges (joins) instead of per-row lookups.
  • Use grouping and aggregation instead of iterative calculations.
  • Precompute reference tables (small dimensions) and merge once.
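For instance, a single Table.Group call (over a hypothetical Fact query with CustomerID and Amount columns) replaces per-row lookups or running totals:

let
    // One set-based aggregation instead of row-by-row logic
    Totals = Table.Group(Fact, {"CustomerID"}, {
        {"OrderCount", each Table.RowCount(_), Int64.Type},
        {"TotalAmount", each List.Sum([Amount]), type number}})
in
    Totals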

Principle 4: Buffering—use sparingly and only with a reason

Table.Buffer can improve performance in specific cases by forcing Power Query to materialize a table in memory, preventing repeated evaluation. It can also make performance worse by disabling folding and increasing memory usage.

Use buffering when you have evidence of repeated evaluation (for example, a query that references the same expensive step multiple times) or when a non-folding source is being re-read. Avoid buffering large tables from foldable sources like SQL unless you are certain it helps.

Example pattern where buffering can help (small dimension used repeatedly):

let
    Dim = Table.Buffer(DimCustomer),
    Merged1 = Table.NestedJoin(Fact, {"CustomerID"}, Dim, {"CustomerID"}, "Dim", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged1, "Dim", {"Segment"}, {"Segment"})
in
    Expanded

Here, buffering a small dimension can prevent multiple reads if the dimension is referenced in several downstream steps or queries.

Principle 5: Design joins to be cheap

Merges are often the most expensive step. Make them cheaper by:

  • Ensuring join keys have consistent data types on both sides (text vs number mismatches can cause slow comparisons and unexpected non-matches).
  • Reducing the right-side table to only the key and needed attributes before merging.
  • Removing duplicates on the right-side key when you expect a one-to-many relationship (or at least validating uniqueness). Duplicate keys can multiply rows and explode memory usage.

Step-by-step: a “merge safely” checklist

  • 1) On both tables: set the key column data type explicitly.
  • 2) On the lookup table: keep only key + required columns.
  • 3) If appropriate: remove duplicates on the key.
  • 4) Merge: use Left Outer for enrichment, Inner only when you intentionally want to filter to matches.
  • 5) Expand: expand only the needed columns.
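A minimal sketch of the checklist, assuming a hypothetical Sales fact query and Products lookup query joined on ProductID:

let
    // 1) Explicit key types on both sides
    SalesTyped = Table.TransformColumnTypes(Sales, {{"ProductID", type text}}),
    ProductsTyped = Table.TransformColumnTypes(Products, {{"ProductID", type text}}),
    // 2) Keep only key + required columns on the lookup side
    LookupSlim = Table.SelectColumns(ProductsTyped, {"ProductID", "Category"}),
    // 3) Remove duplicate keys so the merge cannot multiply rows
    LookupUnique = Table.Distinct(LookupSlim, {"ProductID"}),
    // 4) Left Outer for enrichment
    Merged = Table.NestedJoin(SalesTyped, {"ProductID"}, LookupUnique, {"ProductID"},
        "Lookup", JoinKind.LeftOuter),
    // 5) Expand only the needed columns
    Enriched = Table.ExpandTableColumn(Merged, "Lookup", {"Category"}, {"Category"})
in
    Enriched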

Principle 6: Be careful with sorting and distinct on large datasets

Sorting and removing duplicates can force full scans and large memory operations. If you only need “latest record per key,” consider filtering at the source (folding) or grouping with an aggregation that selects the max date, then merging back to get the row details. If the source is SQL, a view or native query may be more appropriate for heavy deduplication logic.
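A minimal sketch of the "latest record per key" pattern via grouping and a merge back, assuming a hypothetical Orders query with CustomerID and OrderDate columns:

let
    // Cheap aggregation: the latest date per key
    Latest = Table.Group(Orders, {"CustomerID"},
        {{"MaxDate", each List.Max([OrderDate]), type date}}),
    // Merge back on key + date to recover the full row
    Joined = Table.NestedJoin(Orders, {"CustomerID", "OrderDate"},
        Latest, {"CustomerID", "MaxDate"}, "Latest", JoinKind.Inner),
    // Note: ties on the max date keep multiple rows per key
    Result = Table.RemoveColumns(Joined, {"Latest"})
in
    Result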

Refresh reliability: credentials, prompts, and deterministic behavior

Credential stability and avoiding refresh prompts

Unattended refresh fails when Power Query cannot authenticate or when a connector prompts for user input. Practical steps:

  • Use Data source settings to confirm each source has a stored credential.
  • Prefer organizational authentication methods that do not expire unexpectedly for shared workbooks (where possible).
  • Avoid connectors that require interactive approval each refresh unless you control the refresh environment.

Deterministic outputs: avoid “it changes every refresh” logic

Some transformations can produce different results on each refresh even if the source data is unchanged, which makes auditing difficult.

  • Avoid using current timestamp logic unless you truly need it (for example, adding a “RefreshedAt” column). If you do add it, keep it in a separate query or clearly label it as metadata.
  • Be cautious with operations that depend on row order (like adding an index and then using it as a key) unless the ordering is explicitly defined and stable.
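If you do need a refresh timestamp, a minimal sketch keeps it in a separate one-row metadata query so the data queries themselves stay deterministic:

let
    // One-row metadata table, clearly separate from the data outputs
    Meta = #table(type table [RefreshedAt = datetimezone], {{DateTimeZone.UtcNow()}})
in
    Meta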

Designing for multi-source models without performance surprises

Pattern: stage each source, then combine

When you must combine multiple sources (for example, SQL facts + Excel mapping + SharePoint reference list), a performance-safe pattern is:

  • Stage queries per source: Each source is cleaned and reduced independently (filter rows, select columns, set types).
  • Combine late: Merge/append only after each staged table is as small as possible.
  • Keep staging queries not loaded: They exist for structure and debugging, not as outputs.

This reduces the risk that privacy isolation forces Power Query to pull entire datasets locally.
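Structurally, the output query then only combines already-reduced staging queries; in this sketch, StagedSql, StagedMapping, and the join columns are all hypothetical:

// Final: the only loaded query; staging queries are load-disabled
let
    Merged = Table.NestedJoin(StagedSql, {"ProductID"}, StagedMapping, {"ProductID"},
        "Map", JoinKind.LeftOuter),
    Final = Table.ExpandTableColumn(Merged, "Map", {"Category"}, {"Category"})
in
    Final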

Pattern: isolate non-folding work into small tables

If you have a step that breaks folding (complex text parsing, custom function, fuzzy matching), try to apply it to a smaller table:

  • Filter and select columns first (folding).
  • Aggregate or deduplicate to reduce row count.
  • Then apply the non-folding transformation.

Example: Instead of parsing a long description field for 5 million rows, first filter to the relevant product category and keep only the needed columns, then parse.
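A minimal sketch of that example against a hypothetical dbo.Products table; the folding steps reduce the data first, and the non-folding text parsing runs only on what remains:

let
    Source = Sql.Database("server", "database"),
    Products = Source{[Schema = "dbo", Item = "Products"]}[Data],
    // Folds: only the relevant category and columns leave the source
    Filtered = Table.SelectRows(Products, each [Category] = "Electronics"),
    Narrow = Table.SelectColumns(Filtered, {"ProductID", "Description"}),
    // Non-folding parsing now touches only the reduced table
    Parsed = Table.AddColumn(Narrow, "FirstToken",
        each Text.BeforeDelimiter([Description], " "), type text)
in
    Parsed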

Monitoring and troubleshooting refresh performance

Use query diagnostics to find bottlenecks

Power Query includes diagnostic tools that help identify slow steps. In Power Query Editor, you can use Tools > Diagnose (options vary by Excel version) to capture step timings and evaluate which operations are expensive.

Practical workflow:

  • Run diagnostics on a refresh.
  • Identify the step where time spikes.
  • Check whether folding was lost at or before that step.
  • Refactor: move filters earlier, reduce columns earlier, or redesign merges.

Validate folding loss with “View Native Query” where available

If “View Native Query” is available on early steps but disappears later, the first step where it disappears is a strong candidate for refactoring. Sometimes you can reorder steps to keep folding longer (for example, filter before adding a custom column).

Stress-test with realistic data volume

Many queries look fine on a sample but fail in production volume. Before distributing a workbook:

  • Test refresh on a full-size dataset (or a close approximation).
  • Measure refresh time and memory behavior.
  • Test on the least powerful machine expected to refresh the file.
  • Test with the same credentials and network conditions as the intended refresh environment.

Operational playbook: a practical refresh checklist

Before you hand off a workbook

  • All sources have explicit privacy levels set according to policy.
  • All sources have stored credentials; Refresh All runs without prompts.
  • Staging queries are not loaded; only final outputs are loaded.
  • Expensive queries are excluded from Refresh All if they are not needed daily.
  • Filters and column selection happen early; merges expand only needed fields.
  • Folding is preserved as long as possible for database sources.
  • Refresh time is measured and documented (expected duration, known bottlenecks).

When refresh becomes slow after a change

  • Check whether a new source was added (privacy interaction).
  • Find the step where folding stops.
  • Look for new merges, sorts, distincts, or row-by-row custom columns.
  • Confirm key column types match for joins.
  • Consider splitting historical vs recent and excluding historical from Refresh All.

Now answer the exercise about the content:

Which design choice best supports a fast daily refresh while still allowing an occasional full rebuild in an Excel Power Query workbook?

Answer: A fast daily refresh can be achieved by refreshing only the small Recent query and the appended Final output, while excluding the expensive Historical query from Refresh All and refreshing it manually when a full rebuild is needed.

Next chapter

Troubleshooting Refresh Errors and Data Type Mismatches
