What “Connecting to a Data Source” Means in Power BI
In Power BI Desktop, connecting to a data source is the process of selecting a connector (Excel, Text/CSV, Folder, SQL Server, etc.), choosing what to bring in, and deciding how Power BI will access it: either by importing a copy of the data into the model or by querying it on demand (DirectQuery) for certain sources. The connector you choose affects how reliably you can refresh, how you scale ingestion, and what modeling features are available.
Key decisions you make during connection
- Connector choice: pick the connector that matches the source and your refresh/scaling needs (e.g., Folder for many files).
- Data selection: select sheets/tables/files and confirm the preview matches expectations.
- Access mode: Import vs. DirectQuery (when available) changes refresh behavior and modeling flexibility.
- Data source documentation: record where the data came from and how it’s expected to refresh so the report remains maintainable.
Exercise 1: Connect to an Excel Workbook (Beginner-Friendly and Common)
Excel is one of the most common starting points. Power BI can read tables, named ranges, and worksheets. A best practice is to convert ranges to Excel Tables before connecting, because tables are more stable than arbitrary cell ranges.
Step-by-step: Connect and preview
- In Power BI Desktop, choose Get data > Excel.
- Select the workbook file and open it.
- In the Navigator, select the relevant Table or Sheet. Prefer tables (they usually have a table icon).
- Use the preview pane to confirm: column names look correct, data types appear reasonable, and there are no unexpected blank rows.
- Choose Transform Data if you need to adjust headers, data types, or remove extra rows; choose Load if it’s already clean enough to model.
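If you later open the Advanced Editor, an Excel connection typically looks something like the sketch below. The path and table name are hypothetical; Excel.Workbook returns one row per sheet, table, and named range, and you select the item you want.

    // Minimal sketch: load one Excel Table (hypothetical path and table name)
    let
        // One row per sheet, table, and named range in the workbook
        Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
        // Select the Excel Table by name; tables are more stable than sheet ranges
        SalesTable = Source{[Item="SalesTable", Kind="Table"]}[Data]
    in
        SalesTable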
Practical checks in the preview
- Headers: Are the first row values actually headers (e.g., “Date”, “Amount”), or is row 1 data?
- Multiple header rows: Some exports have title lines above the header row; those should be removed in Power Query.
- Data types: Dates should be Date/DateTime, amounts should be Decimal Number, IDs often should be Text (to preserve leading zeros).
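Continuing the same hypothetical workbook, explicit typing is a single Table.TransformColumnTypes step; the column names below are illustrative.

    // Declare types explicitly; auto-detection can misread dates or drop leading zeros
    let
        Source = Excel.Workbook(File.Contents("C:\Data\Sales.xlsx"), null, true),
        SalesTable = Source{[Item="SalesTable", Kind="Table"]}[Data],
        Typed = Table.TransformColumnTypes(SalesTable, {
            {"Date", type date},          // Date, not text
            {"Amount", type number},      // Decimal Number
            {"CustomerID", type text}     // Text preserves IDs like "00042"
        })
    in
        Typed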
When Excel is a good fit
- Small to medium datasets that fit comfortably in memory.
- Stable structure (same columns each refresh).
- Teams that maintain data in spreadsheets but can follow a consistent template.
Exercise 2: Connect to a CSV File (Headers, Delimiters, and Encoding)
CSV is simple and widely used, but it’s also where formatting issues show up: wrong delimiter, incorrect encoding, and inconsistent headers. Power BI’s Text/CSV connector lets you preview and adjust these settings before loading.
Step-by-step: Connect and validate delimiter/header handling
- Choose Get data > Text/CSV.
- Select the .csv file.
- In the preview dialog, verify the Delimiter (commonly comma, semicolon, tab). If columns appear merged into one, the delimiter is likely wrong.
- Check Data type detection. If it mis-detects (e.g., product codes become numbers), plan to set types explicitly in Power Query.
- Select Transform Data to correct issues before loading.
Common CSV problems and what to do
- Wrong delimiter: If you see one wide column, change the delimiter (comma vs. semicolon is a frequent regional difference; see the sketch after this list).
- Headers not recognized: If the first row is data, you’ll need to promote headers; if there are extra title rows, remove them first.
- Encoding issues: If accented characters look wrong, adjust file encoding in the connector/Power Query settings (UTF-8 is common).
- Quoted text with commas: Proper CSVs use quotes around text fields; if parsing looks broken, inspect the raw file for inconsistent quoting.
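Most of these problems come down to passing explicit options to Csv.Document instead of relying on detection. A minimal sketch, assuming a hypothetical semicolon-delimited file saved in Windows-1252:

    // Explicit delimiter, encoding, and quoting (hypothetical file)
    let
        Source = Csv.Document(
            File.Contents("C:\Data\Export.csv"),
            [Delimiter = ";",               // comma vs. semicolon is the usual regional difference
             Encoding = 1252,               // Windows-1252; use 65001 for UTF-8
             QuoteStyle = QuoteStyle.Csv]), // honor quotes around text that contains delimiters
        Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
    in
        Promoted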
Hands-on: Header and delimiter fixes in Power Query
In Power Query, you typically address these with a small set of actions. The exact UI clicks may vary, but the concepts are consistent:
- Promote headers: Use when row 1 contains column names.
- Remove top rows: Use when the file includes report titles or notes above the header row.
- Split column by delimiter: Use if the file imported as a single column due to delimiter mismatch.
    // Conceptual example of promoting headers after removing extra lines (Power Query M pattern)
    let
        Source = Csv.Document(File.Contents("C:\Data\Sales.csv"),
            [Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
        RemovedTopRows = Table.Skip(Source, 2),
        PromotedHeaders = Table.PromoteHeaders(RemovedTopRows, [PromoteAllScalars=true])
    in
        PromotedHeaders

Exercise 3: Use the Folder Connector for Scalable Ingestion
The Folder connector is the right choice when you receive many files over time (daily exports, monthly snapshots, per-store files) and want Power BI to ingest them as one dataset. Instead of manually re-pointing to a new file each period, you drop the new file into the folder and refresh.
When to use Folder (and when not to)
- Use Folder when: you have recurring files with the same structure; you want to append them automatically; you expect the number of files to grow.
- Avoid Folder when: files have inconsistent columns; file formats vary unpredictably; you only ever have one file (Excel/CSV connector may be simpler).
Step-by-step: Connect to a folder and combine files
- Choose Get data > Folder.
- Select the folder path that will hold all current and future files.
- In the preview, review the file list (name, extension, date modified). Confirm only the intended files are present.
- Select Combine (or Combine & Transform) to generate an automated pattern that imports each file and appends them.
- In the combine experience, choose a sample file. Power BI uses it to infer the transformation steps that will be applied to all files.
- Open Power Query and inspect the generated queries (often a main query plus helper queries/functions). Validate that the transformations work for multiple files, not just the sample.
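The generated helper queries can look intimidating, but the underlying pattern is compact. A simplified sketch of what they do, assuming a hypothetical folder of same-schema CSV files:

    // Simplified folder-combine pattern (hypothetical folder of same-schema CSVs)
    let
        Source = Folder.Files("C:\Data\SalesFiles"),
        // Filter early: only .csv files, so temp or stray files don't break refresh
        CsvOnly = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
        // Parse each file's binary content and promote its header row
        Parsed = Table.AddColumn(CsvOnly, "Data", each
            Table.PromoteHeaders(
                Csv.Document([Content], [Delimiter=",", Encoding=65001]),
                [PromoteAllScalars=true])),
        // Append every parsed file into one table
        Combined = Table.Combine(Parsed[Data])
    in
        Combined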
Practical guidance for maintainable folder ingestion
- Standardize file structure: enforce consistent column names, order, and data types across all files.
- Filter early: in the folder file list, filter by extension (e.g., only .csv) and optionally by filename pattern to exclude temporary files.
- Handle schema drift: if occasional extra columns appear, decide whether to ignore them or incorporate them; test refresh with both older and newer files (see the sketch after this list).
- Keep a “landing zone” folder: separate raw incoming files from curated files if you need validation before ingestion.
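For the schema-drift point above, one defensive option is to select a fixed column list from each file with Table.SelectColumns and MissingField.UseNull: extra columns are dropped and missing ones arrive as nulls instead of failing refresh. A sketch of the per-file step, where ParsedFile stands for the table parsed from one file in the earlier pattern and the column names are hypothetical:

    // Inside the per-file transformation: enforce a fixed schema
    // Extra columns are dropped; missing ones become null instead of an error
    Normalized = Table.SelectColumns(
        ParsedFile,
        {"OrderID", "OrderDate", "Amount", "StoreID"},
        MissingField.UseNull)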
Example: Folder naming conventions that help refresh
- Good: Sales_2025-01.csv, Sales_2025-02.csv (predictable pattern)
- Risky: sales latest.csv (overwritten file can break auditability)
- Risky: random exports with different headers per month
Exercise 4: Connect to a Simple Database (SQL Server Example)
Databases are common in business settings and often provide more reliable refresh than files. With a database connector, you can select tables or write a query, and you may also be able to choose between Import and DirectQuery.
Step-by-step: Connect to SQL Server and preview tables
- Choose Get data > SQL Server (or another database connector you have access to).
- Enter the Server name and optionally the Database name.
- Choose Data connectivity mode: Import or DirectQuery (if available).
- Select OK, then authenticate using the appropriate method (Windows/Database/Microsoft account depending on environment).
- In Navigator, select the tables or views you need. Prefer views if your organization provides curated ones.
- Use the preview to confirm row/column shape and check for unexpected duplicates or missing keys.
- Choose Transform Data if you need shaping; otherwise Load.
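Behind the Navigator, an import-mode SQL Server connection generates M along these lines; the server, database, and view names here are hypothetical:

    // Import-mode connection to one curated view (hypothetical names)
    let
        Source = Sql.Database("sqlserver01", "SalesDW"),
        // Navigate by schema and object name; curated views are often the best target
        SalesView = Source{[Schema="dbo", Item="vw_Sales"]}[Data]
    in
        SalesView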
Table vs. view vs. custom SQL (beginner guidance)
- Tables: simplest, but may include operational columns you don’t need.
- Views: often best for reporting because they can encapsulate business logic and stable schemas.
- Custom SQL: powerful but can reduce maintainability if not documented; use when you need filtering/joins that aren’t available as a view.
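If you do use custom SQL, it goes into the Query option of Sql.Database; the names below are hypothetical. Be aware that a hand-written native query can limit query folding for later steps, which is one more reason to document why it exists.

    // Custom SQL via the Query option (hypothetical server, database, and query)
    let
        Source = Sql.Database("sqlserver01", "SalesDW",
            [Query = "SELECT OrderID, OrderDate, Amount FROM dbo.Orders WHERE OrderDate >= '2025-01-01'"])
    in
        Source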
Import vs. DirectQuery: Conceptual Differences That Affect Modeling and Refresh
Import and DirectQuery are two ways Power BI can access data. Understanding the trade-offs helps you choose the right mode early, because it influences performance, refresh strategy, and some modeling capabilities.
Import mode (most common for beginners)
- How it works: data is copied into Power BI’s in-memory model during refresh.
- Refresh behavior: visuals query the in-memory model; data updates only when you refresh.
- Modeling flexibility: typically the most flexible for calculated tables/columns and complex transformations.
- Performance: often very fast for report interactions because queries hit the local model.
- Considerations: dataset size limits and refresh duration; you must schedule refresh to keep data current.
DirectQuery mode (when you need near-real-time or very large data)
- How it works: Power BI sends queries to the source database when users interact with visuals.
- Refresh behavior: there is no full data import; results depend on the current state of the source at query time (some metadata and caching still apply).
- Modeling impact: some transformations and DAX features may be limited; you often need to push logic into the database (views, SQL).
- Performance: depends heavily on the database performance, indexing, and network latency; poorly designed visuals can generate many queries.
- Considerations: requires a reliable, performant source; governance and permissions matter more because users are effectively querying the source.
Practical selection guidance
- Choose Import when: data fits comfortably, you can refresh on a schedule, and you want maximum modeling flexibility.
- Choose DirectQuery when: data is too large to import, you need near-real-time results, or policy requires data to remain in the database.
- If unsure: start with Import for learning and prototyping; revisit DirectQuery when requirements demand it.
Hands-on Checklist: Choosing the Right Connector
Use this checklist before you click “Get data” to avoid rework later.
File-based sources
- Single Excel workbook: Excel connector; prefer tables; ensure stable sheet/table names.
- Single CSV export: Text/CSV connector; confirm delimiter, encoding, and headers.
- Many recurring files: Folder connector; standardize schema; filter file list; combine/append.
Database sources
- Curated reporting tables/views exist: connect to those first.
- Need near-real-time: evaluate DirectQuery and test performance with realistic visuals.
- Need complex shaping: consider Import or push transformations into a view.
Exercise 5: Document Data Source Details for Maintainability
A report becomes hard to maintain when nobody knows where the data came from, which files are expected, or what credentials and refresh assumptions exist. Documenting data sources is a practical habit that prevents broken refreshes and reduces handover time.
What to document (minimum viable documentation)
- Source type: Excel / CSV / Folder / SQL Server / other.
- Location: full file path or folder path; for databases, server and database names.
- Objects used: workbook table names, sheet names, CSV filenames/patterns, database tables/views, or custom SQL name/purpose.
- Refresh method: Import (scheduled refresh frequency) or DirectQuery (expected query-time behavior).
- Assumptions: expected delimiter, header row position, encoding, required columns, and file naming convention.
- Owner/contact: who produces the files or manages the database objects.
Practical template you can copy into a project note
Data Source Record

    Dataset: Sales Reporting
    Source Type: Folder (CSV)
    Location: \\FinanceShare\Exports\Sales\DailyFiles
    File Pattern: Sales_YYYY-MM-DD.csv
    Delimiter/Encoding: Comma, UTF-8
    Headers: Row 1 is header
    Required Columns: OrderID (Text), OrderDate (Date), Amount (Decimal), StoreID (Text)
    Refresh Mode: Import
    Refresh Schedule: Daily 06:00
    Assumptions: All files share identical schema; no blank header rows
    Owner: Finance Ops (financeops@company.com)

Where to keep documentation
- In a shared team location (project wiki, shared document, or ticketing system).
- Alongside the PBIX in a repository folder as a simple text/markdown file.
- In Power Query query names and descriptions (use clear naming like Sales_Folder_Raw, Sales_Clean).