
Excel Power Query Playbook: Repeatable Data Prep Without VBA


Importing Data from Folders and Automating File Intake

Chapter 2

Estimated reading time: 13 minutes


Why folder-based imports matter

When data arrives as multiple files over time (daily exports, weekly reports, monthly statements), importing “from a folder” is one of the most repeatable patterns in Power Query. Instead of rebuilding an import each time a new file arrives, you point Power Query at a folder and let it ingest every file that matches your rules. As new files are dropped into that folder, a refresh pulls them in automatically.

Folder imports are especially useful when:

  • You receive the same report split into multiple files (one per store, region, or day).
  • You get incremental deliveries (new files each week) and want a single consolidated table.
  • You want to standardize intake from a shared location (SharePoint/OneDrive synced folder, network share, or local directory).
  • You need to keep a historical archive of raw files while still producing a clean, current dataset.

How Power Query’s folder connector works

The Folder connector returns a table of files and metadata; you then choose which file contents to combine. Under the hood, the process typically has three layers:

  • File listing layer: A query that reads the folder and returns one row per file (name, extension, dates, folder path, and a binary “Content” column).
  • Transform sample file layer: A query that defines how to transform a single representative file (the “sample”).
  • Combine layer: A query that applies the sample transformation to every file and appends the results into one table.

This architecture is powerful because you only define the transformation once (on the sample), and Power Query applies it to all files. Your main job is to make sure the transformation is robust: it should tolerate small variations and avoid hard-coding file-specific details.
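
A condensed sketch of those three layers in M (step names are illustrative; the queries Power Query actually generates split this across a sample file and a helper function rather than a single expression) might look like:

// File listing layer: one row per file in the folder
Source = Folder.Files("C:\Data\Intake\Sales"),
// Transform layer: how to parse one file's binary Content
TransformFile = (content as binary) => Table.PromoteHeaders(Csv.Document(content), [PromoteAllScalars=true]),
// Combine layer: apply the transform to every file and append the results
Combined = Table.Combine(List.Transform(Source[Content], TransformFile))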

Prerequisites for a smooth folder intake

Folder ingestion works best when the incoming files follow a predictable structure. Before building the query, confirm these basics:


  • Consistent layout: Same column headers, same sheet/table name (for Excel), same delimiter and column order (for CSV), similar data types.
  • Stable header row: Headers should be in the same row across files. If some files have extra title rows, you’ll need logic to skip them.
  • File naming convention: Helpful for filtering and for extracting metadata (e.g., “Sales_2026-01-05_RegionEast.csv”).
  • Archive strategy: Decide whether you keep all historical files in the same folder or move processed files to an archive folder.

Step-by-step: Combine CSV files from a folder

1) Create a landing folder

Create a folder such as C:\Data\Intake\Sales. Drop a few representative CSV files into it. Keep at least two files so you can confirm the append behavior.

2) Connect to the folder

In Excel: Data > Get Data > From File > From Folder. Choose the folder and click Transform Data (you can click Combine right away, but Transform Data lets you inspect the file list first; either path works).

You’ll see a table with columns like Name, Extension, Date accessed, Date modified, Folder Path, and Content (binary).

3) Filter to only the files you want

Folder queries often pick up unwanted files: temporary files, old formats, or unrelated exports. Apply filters early:

  • Filter Extension to “.csv”.
  • Filter out hidden/system files if present (some connectors expose an “Attributes” record; you can filter on [Attributes][Hidden] = false).
  • Optionally filter by Name (e.g., starts with “Sales_”).

Filtering early reduces refresh time and avoids transformation errors from unexpected files.
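
Combined into a single step, the filters above might look like this (a sketch; it assumes the folder connector exposes a Hidden flag inside the Attributes record):

// Keep only CSV files named "Sales_*" and drop hidden files
= Table.SelectRows(#"Previous Step", each Text.Lower([Extension]) = ".csv"
    and Text.StartsWith([Name], "Sales_")
    and not Record.FieldOrDefault([Attributes], "Hidden", false))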

4) Combine files

Click the Combine button (often shown as two arrows) in the Content column header, or use Home > Combine > Combine Files. Power Query will prompt you to choose how to interpret the files (CSV delimiter, encoding, etc.).

Power Query will generate several helper queries. The main output query is typically named something like “Sales (2)” or “Combine Files”. Rename it to something meaningful, such as factSales_RawCombined.

5) Validate the “Transform Sample File” logic

Open the generated query, usually named Transform Sample File. This is where the CSV parsing and header promotion happen. Confirm:

  • The delimiter is correct (comma, semicolon, tab).
  • Headers are promoted exactly once.
  • Data types are not overly aggressive (for example, IDs should often remain text).

Common issue: Power Query may auto-detect types differently depending on the sample file. If your sample has a blank value in a numeric column, it may infer text. Consider explicitly setting types later in the main query after combining, or use a stable type mapping step.

6) Add file-level metadata to each row

One of the biggest advantages of folder imports is that you can carry file metadata into the combined dataset. This helps with auditing, incremental logic, and troubleshooting.

In the main combined query, you can keep columns like Name and Date modified from the folder listing and pass them through the combine process. If the default combine removed them, you can re-merge by adjusting the combine step or by adding metadata inside the custom function call.

A practical approach is to extract a date from the file name and store it as a column:

// Example: file name like Sales_2026-01-05_RegionEast.csv
// Create a column IntakeDate from Name
= Table.AddColumn(#"Previous Step", "IntakeDate", each Date.FromText(Text.BetweenDelimiters([Name], "Sales_", "_")), type date)

If your naming pattern differs, adjust the parsing logic (Text.AfterDelimiter, Text.BeforeDelimiter, Text.Middle, Splitter.SplitTextByDelimiter, etc.).

Step-by-step: Combine Excel workbooks from a folder

Excel files add an extra layer: each workbook can contain multiple sheets and tables. The combine process must consistently pick the right object (e.g., a table named “tblSales” or a sheet named “Data”).

1) Standardize the source workbooks

Before you build the query, ensure each workbook has:

  • A consistent table name (recommended) such as tblSales, or
  • A consistent sheet name such as Sales with a consistent header row.

Using an Excel Table is usually more robust than relying on a sheet range, because the table carries headers and expands automatically.

2) Connect to the folder and filter

Data > Get Data > From File > From Folder. Filter Extension to “.xlsx” (and “.xlsm” if needed). Exclude temporary files such as those starting with “~$”.

3) Combine and select the correct object

Use Combine Files. When prompted, Power Query will show a navigator-like preview of the sample workbook’s objects. Choose the table or sheet that represents the dataset you want.

Open the generated Transform Sample File query and confirm it selects the correct object. If it relies on a default sheet reference (e.g., “Item=Sheet1”) or on positional indexing, change it to target the named table (e.g., “Item=tblSales”) so it doesn’t break if sheet names or the workbook’s internal order change.

Example pattern inside the sample transform (conceptual):

// Get workbook contents
Source = Excel.Workbook(Parameter1, null, true),
// Select the table by name
tbl = Source{[Item="tblSales",Kind="Table"]}[Data]

This approach is resilient because it targets the named table explicitly.

4) Handle extra header/title rows

Some exports include a title row above the headers, or a few blank rows. If that happens, you may need to remove top rows before promoting headers. Do this in the sample transform so it applies to all files:

= Table.PromoteHeaders(Table.Skip(#"Previous Step", 2), [PromoteAllScalars=true])

Adjust the number of skipped rows based on the actual layout. If the number of title rows varies, you may need a rule-based skip (for example, find the row that contains a known header label). That’s more advanced, but it’s often worth it when dealing with inconsistent exports.
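
One way to implement that rule-based skip, assuming the raw columns are still named Column1, Column2, ... before headers are promoted and that “OrderID” is a label you know appears in the header row, is:

// Find the row whose first column holds the known header label, skip everything above it,
// then promote headers exactly once
HeaderRow = List.PositionOf(Table.Column(#"Previous Step", "Column1"), "OrderID"),
Promoted = Table.PromoteHeaders(Table.Skip(#"Previous Step", HeaderRow), [PromoteAllScalars=true])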

Designing a robust file intake pipeline

Use a “landing zone” and an “archive”

A practical pattern is to separate where files arrive from where they are stored long-term:

  • Landing: Only files that should be picked up on the next refresh.
  • Archive: All historical files, optionally organized by year/month.

Power Query itself does not move files. If you want files to automatically move from Landing to Archive after refresh, you typically use an external mechanism (Windows scheduled task + script, Power Automate, or a manual process). Even without automatic movement, you can still keep everything in one folder and filter by Date modified or by a naming convention.

Filter out partial and temporary files

Common sources of refresh failures are incomplete files (still being written) or temporary lock files. Defensive filters help:

  • Exclude names starting with “~$” (Excel temp files).
  • Exclude files with size = 0.
  • Exclude files modified in the last N minutes (to avoid picking up a file mid-write). This is easiest if you control the delivery process; otherwise, consider a “drop complete” convention (e.g., deliver as .tmp then rename to .csv when complete).

Example filter to exclude temp files:

= Table.SelectRows(#"Previous Step", each not Text.StartsWith([Name], "~$"))

Normalize columns across files

Even “consistent” exports drift: a new column appears, a column is renamed, or order changes. If you simply append tables, Power Query will align by column name, but missing columns become nulls. That may be acceptable, but you should decide intentionally.

Two practical strategies:

  • Strict schema: Keep only a defined set of columns and ignore extras. This stabilizes downstream models.
  • Flexible schema: Allow new columns to flow through, but monitor changes and update downstream logic accordingly.

For a strict schema, select columns explicitly after combining:

= Table.SelectColumns(#"Previous Step", {"OrderID","OrderDate","Customer","Amount","Region"})

If a required column might be missing in some files, use a safer approach: add missing columns first, then select:

// Ensure required columns exist
Required = {"OrderID","OrderDate","Customer","Amount","Region"},
WithMissing = List.Accumulate(Required, #"Previous Step", (state, col) => if Table.HasColumns(state, col) then state else Table.AddColumn(state, col, each null)),
Result = Table.SelectColumns(WithMissing, Required)

Control data types after append

Type detection can vary across files, especially for columns with blanks or mixed formats. A stable approach is:

  • Combine first.
  • Clean and standardize values.
  • Apply data types once at the end.

This reduces the chance that one odd file forces a column into the wrong type.
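
A single explicit typing step at the end of the combined query might look like this (column names are illustrative; the optional culture argument makes locale-sensitive parsing deterministic):

// Apply one stable type map after all files are appended
= Table.TransformColumnTypes(#"Previous Step",
    {{"OrderID", type text}, {"OrderDate", type date}, {"Amount", type number}, {"Region", type text}},
    "en-US")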

Automating intake: making refresh pick up new files reliably

Refresh behavior and what “automatic” really means

In Excel, folder-based intake becomes automatic in the sense that the query always reads the folder contents at refresh time. When you click Refresh (or use Refresh All), any new files in the folder that match your filters will be included.

To make this hands-off, you can combine Power Query with:

  • Workbook refresh on open: Set queries to refresh when the file opens (Query Properties).
  • Scheduled refresh: If the workbook is hosted in an environment that supports scheduled refresh, configure it there. (Capabilities depend on your organization’s setup.)
  • Operational discipline: A consistent process for dropping files into the landing folder before the refresh time.

Parameterize the folder path

Hard-coding a folder path makes the solution brittle when moved to another machine or when the folder location changes. A common approach is to store the folder path in a named cell in Excel and read it as a parameter.

Example workflow:

  • Create a named range called pFolderPath (a single cell on a sheet) containing the folder path text.
  • Load that cell into Power Query as a small table.
  • Convert it to a single value and use it in Folder.Files().

Conceptual M pattern:

// Read parameter table (one cell) and extract value
p = Excel.CurrentWorkbook(){[Name="pFolderPath"]}[Content]{0}[Column1],
Source = Folder.Files(p)

This makes it easy to repoint the intake without editing the query steps.

Incremental-style filtering in Excel scenarios

Even without a formal incremental refresh feature, you can reduce processing by filtering the folder listing before combining. For example, only include files modified after a certain date stored in a parameter cell (e.g., last successful load date). This can speed up refresh when the archive grows large.

Example concept:

Cutoff = DateTime.From(Excel.CurrentWorkbook(){[Name="pCutoff"]}[Content]{0}[Column1]),
Filtered = Table.SelectRows(Source, each [Date modified] >= Cutoff)

Be careful: if a file is corrected and re-sent with the same name but an older modified date (or if timestamps are unreliable on a network share), you might miss it. In those cases, filter by name pattern (e.g., include the last 60 days based on date in filename) rather than modified date.
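
A sketch of that name-based alternative, assuming file names like Sales_2026-01-05_RegionEast.csv, parses the date out of the name and keeps roughly the last 60 days:

// Derive a date from the file name and filter the folder listing on it instead of Date modified
WithFileDate = Table.AddColumn(Source, "FileDate", each Date.FromText(Text.BetweenDelimiters([Name], "Sales_", "_")), type date),
Recent = Table.SelectRows(WithFileDate, each [FileDate] >= Date.AddDays(Date.From(DateTime.LocalNow()), -60))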

Common pitfalls and how to avoid them

1) One “bad” file breaks the whole refresh

If one file has a different structure (extra columns, missing headers, corrupted content), the combine step can error out. Mitigations:

  • Filter more strictly (by name pattern, extension, or subfolder).
  • Keep a quarantine folder for problematic files.
  • Use error handling patterns in the custom function (advanced) to return an empty table for files that fail, while logging the file name for review; a minimal sketch follows this list.
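
A minimal version of that error-handling pattern, assuming the generated helper function is named Transform File (the name Power Query usually assigns), wraps the call in try/otherwise:

// A file that errors yields an empty table instead of breaking the whole combine;
// keeping [Name] alongside makes it easy to spot files that produced no rows
= Table.AddColumn(#"Previous Step", "Data", each try #"Transform File"([Content]) otherwise #table({}, {}))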

2) Hidden files and system artifacts

Some environments add hidden files (Thumbs.db, desktop.ini) or temporary files. Always filter by extension and exclude known patterns.

3) Inconsistent headers across files

If a column is renamed in one file (e.g., “CustName” vs “Customer”), you’ll end up with two columns after append. Decide on a canonical name and standardize it in the sample transform (or in the combined query) using Rename Columns.
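
A rename step with MissingField.Ignore handles files on either side of the change without erroring (the column names here are just the example above):

// Map the drifting header name to the canonical one; files already using "Customer" are left alone
= Table.RenameColumns(#"Previous Step", {{"CustName", "Customer"}}, MissingField.Ignore)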

4) Mixed date and number formats

CSV exports can contain dates in different formats depending on regional settings. If one file uses “01/02/2026” and another uses “2026-02-01”, parsing can become inconsistent. Prefer:

  • Keeping raw columns as text initially.
  • Parsing with explicit locale when converting types (where available).
  • Using Date.FromText with a known format only when you control the export format (a sketch follows this list).
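
For example, if you know every file writes dates as day/month/year text, a sketch using the options-record form of Date.FromText (available in current Power Query builds; the column name is illustrative) is:

// Parse a text date with an explicit format and culture instead of relying on auto-detection
= Table.AddColumn(#"Previous Step", "OrderDateParsed",
    each Date.FromText([OrderDate], [Format = "dd/MM/yyyy", Culture = "en-GB"]), type date)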

5) Performance issues as the folder grows

As you accumulate hundreds or thousands of files, refresh can slow down. Practical tactics:

  • Archive older files into subfolders and point the query only at the active period, or use a separate query for history.
  • Filter aggressively before combining.
  • Keep transformations efficient: remove unnecessary columns early, avoid row-by-row custom functions when a built-in transformation exists.

Practical example: building a repeatable “monthly intake”

Scenario: Each month, you receive multiple CSV files, one per branch, named like AR_2026-01_Branch001.csv. You want a single table with all branches and an added Month column.

Step-by-step outline

  • Create folder: C:\Data\Intake\AR
  • Drop files for at least two branches and two months.
  • Connect: From Folder, filter Extension = .csv, filter Name starts with “AR_”.
  • Combine files.
  • Add Month from file name using Text.BetweenDelimiters to capture “2026-01”.
  • Add Branch from file name using Text.BetweenDelimiters (from the second underscore to “.csv”) to capture “Branch001”.
  • Select and type columns after append.

Example parsing logic:

// Month like 2026-01 between AR_ and _Branch
= Table.AddColumn(#"Previous Step", "Month", each Text.BetweenDelimiters([Name], "AR_", "_Branch"), type text)
// Branch like Branch001 between the second "_" and ".csv" (startIndex 1 = second occurrence of the delimiter)
= Table.AddColumn(#"Previous Step", "Branch", each Text.BetweenDelimiters([Name], "_", ".csv", 1), type text)

With these two columns, you can validate completeness (did every branch deliver a file for the month?) and you can build pivot tables or models that slice by Month and Branch.
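
A quick completeness check, assuming the Month and Branch columns added above, is to count distinct branches per month and compare the result with the expected number of branches:

// Count how many distinct branches delivered a file for each month
= Table.Group(#"Previous Step", {"Month"}, {{"BranchCount", each List.Count(List.Distinct([Branch])), Int64.Type}})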

Checklist for a dependable folder intake query

  • Folder path is parameterized (or at least easy to change).
  • Early filters: extension, name pattern, exclude temp/hidden files.
  • Sample transform selects the correct object (for Excel) by name, not position.
  • Headers are promoted correctly and only once.
  • Schema strategy is explicit (strict vs flexible).
  • File metadata (name, modified date, derived period) is captured for auditing.
  • Data types are applied after combining, not inconsistently per file.
  • Performance is considered (filter before combine, archive strategy).

Now answer the exercise about the content:

In a folder-based Power Query import, what is the key benefit of defining transformations in the Transform Sample File query?

Answer: The sample transform defines the parsing and cleanup steps for one representative file, and the combine process applies those same steps to all files, creating a repeatable intake.

Next chapter

Combining Multiple Files into a Single Standardized Table
