Why “reliable SELECT” matters in BI
In Business Intelligence, a SELECT statement is often the last step before data becomes a chart, KPI tile, or exported report. A reliable SELECT produces a clean, dashboard-ready column set: only the fields you need, consistently named, and shaped with business rules so downstream tools don’t have to guess.
This chapter focuses on building correct SELECT queries step by step: start with a clear skeleton, select only needed columns, name columns for reporting, add derived columns and CASE-based business logic, then validate the output with a small sample using LIMIT.
1) Start with a query skeleton
Begin with a minimal structure you can expand safely. Keep it readable and predictable: one clause per line, and a clear table alias.
SELECT -- columns go here
FROM sales.orders AS o; -- start from the primary table for the questionAt this stage, you are not trying to be “complete.” You are creating a stable place to add columns and logic without losing track of what the query returns.
Use a consistent naming convention early
Decide how you will name output columns for dashboards. Common, practical conventions include:
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
- snake_case for output columns (e.g., order_id, order_date, customer_segment)
- prefixes for measures (e.g., revenue, gross_margin) and flags (e.g., is_returned)
- date suffixes for clarity (e.g., created_at, shipped_at)
2) Select only needed columns
Dashboards work best when the dataset is narrow and intentional. Avoid SELECT * in BI extracts because it:
- pulls unnecessary data (slower queries, larger extracts)
- creates unstable schemas (new columns appear and break reports)
- makes it harder to review correctness
Instead, list the exact columns required for the metric or visualization.
SELECT
o.order_id,
o.order_date,
o.customer_id,
o.status,
o.total_amount
FROM sales.orders AS o;Practical approach: write down what the dashboard needs (dimensions and measures), then map each item to a column or derived expression. If a column is not used by a chart, filter, or drill-through, don’t include it.
Step-by-step checklist for column selection
- Identify the grain (one row per order, per customer, per day, etc.).
- List required dimensions (e.g., order_date, status, customer_id).
- List required measures (e.g., total_amount).
- Remove “nice-to-have” columns until a stakeholder asks for them.
3) Name columns for reporting (aliases)
BI outputs should be self-explanatory. Use aliases to make column names consistent and readable, especially when source systems use cryptic names or mixed casing.
SELECT
o.order_id AS order_id,
o.order_date AS order_date,
o.customer_id AS customer_id,
o.status AS order_status,
o.total_amount AS order_total
FROM sales.orders AS o;Even if the alias matches the original name, being explicit can help when you later refactor (joins, derived columns, or renamed source fields). The key is consistency: the same concept should have the same output name across datasets.
Derived columns with expressions
Derived columns turn raw fields into analysis-ready values. Common examples include arithmetic, date transformations, and safe ratios. Keep expressions simple and name them clearly.
SELECT
o.order_id AS order_id,
o.order_date AS order_date,
o.customer_id AS customer_id,
o.total_amount AS order_total,
o.tax_amount AS tax_amount,
(o.total_amount - o.tax_amount) AS net_revenue
FROM sales.orders AS o;If your BI tool expects a specific set of measures, derived columns are often preferable to pushing calculations into the dashboard layer, because they become reusable and testable.
Use CASE for business rules
CASE is the standard way to encode business logic directly in SQL output. This is especially useful for:
- bucketizing customers (e.g., by lifetime value)
- labeling statuses into business-friendly groups
- creating flags used for filtering (e.g., is_completed)
Example: bucketize customers by spend
Suppose you have a customer-level table with lifetime spend. You can create a segment column that dashboards can use as a dimension.
SELECT
c.customer_id AS customer_id,
c.lifetime_spend AS lifetime_spend,
CASE
WHEN c.lifetime_spend >= 5000 THEN 'vip'
WHEN c.lifetime_spend >= 1000 THEN 'high_value'
WHEN c.lifetime_spend >= 200 THEN 'mid_value'
ELSE 'low_value'
END AS customer_segment
FROM sales.customers AS c;Guidelines for reliable CASE logic:
- Order conditions from most specific to least specific.
- Ensure ranges don’t overlap unintentionally.
- Always include an ELSE to avoid unexpected NULL categories.
- Use stable labels (avoid changing spelling/casing across queries).
Example: label order status for reporting
Operational statuses can be too granular for BI. Create a reporting status that groups them into business-friendly categories.
SELECT
o.order_id AS order_id,
o.order_date AS order_date,
o.status AS order_status_raw,
CASE
WHEN o.status IN ('paid', 'shipped', 'delivered') THEN 'completed'
WHEN o.status IN ('pending', 'processing') THEN 'in_progress'
WHEN o.status IN ('canceled', 'refunded') THEN 'canceled_or_refunded'
ELSE 'other'
END AS order_status
FROM sales.orders AS o;This produces a clean dimension (order_status) that is stable for dashboards, while still keeping the raw status available if needed (order_status_raw). If you only want one status column in the final dataset, keep the reporting version and drop the raw one.
Example: dashboard-ready order extract (clean column set)
Combine the techniques: select only needed fields, alias consistently, add derived measures, and encode business rules with CASE.
SELECT
o.order_id AS order_id,
o.order_date AS order_date,
o.customer_id AS customer_id,
CASE
WHEN o.status IN ('paid', 'shipped', 'delivered') THEN 'completed'
WHEN o.status IN ('pending', 'processing') THEN 'in_progress'
WHEN o.status IN ('canceled', 'refunded') THEN 'canceled_or_refunded'
ELSE 'other'
END AS order_status,
o.total_amount AS order_total,
o.tax_amount AS tax_amount,
(o.total_amount - o.tax_amount) AS net_revenue
FROM sales.orders AS o;Notice the output is “dashboard-shaped”: stable names, business-friendly categories, and measures ready to aggregate.
4) Validate output with a small LIMIT sample
Before you hand a dataset to a dashboard (or build on it with joins and filters), validate the output quickly. LIMIT helps you inspect real rows without scanning the full table.
SELECT
o.order_id AS order_id,
o.order_date AS order_date,
o.customer_id AS customer_id,
CASE
WHEN o.status IN ('paid', 'shipped', 'delivered') THEN 'completed'
WHEN o.status IN ('pending', 'processing') THEN 'in_progress'
WHEN o.status IN ('canceled', 'refunded') THEN 'canceled_or_refunded'
ELSE 'other'
END AS order_status,
o.total_amount AS order_total,
o.tax_amount AS tax_amount,
(o.total_amount - o.tax_amount) AS net_revenue
FROM sales.orders AS o
LIMIT 25;What to check in the sample
- Column set: Are there any extra columns you don’t need? Are any required columns missing?
- Naming: Do column names match your reporting convention (snake_case, consistent prefixes)?
- Business rules: Do CASE labels look correct? Any unexpected NULLs or “other” values that should be categorized?
- Derived values: Do calculations (net_revenue) look plausible for a few known orders?
- Data types: Do numeric columns look numeric, and date columns look like dates (not strings)?
Practical workflow: build reliable SELECTs in minutes
- Write the skeleton: SELECT … FROM …
- Add only the columns needed for the dashboard question.
- Alias every output column into a consistent reporting schema.
- Add derived columns for reusable measures.
- Add CASE columns for stable business categories.
- Run with LIMIT and visually inspect the result before expanding.