SQL for Business Intelligence: Writing Reliable SELECT Statements

Capítulo 2

Estimated reading time: 7 minutes

+ Exercise

Why “reliable SELECT” matters in BI

In Business Intelligence, a SELECT statement is often the last step before data becomes a chart, KPI tile, or exported report. A reliable SELECT produces a clean, dashboard-ready column set: only the fields you need, consistently named, and shaped with business rules so downstream tools don’t have to guess.

This chapter focuses on building correct SELECT queries step by step: start with a clear skeleton, select only needed columns, name columns for reporting, add derived columns and CASE-based business logic, then validate the output with a small sample using LIMIT.

1) Start with a query skeleton

Begin with a minimal structure you can expand safely. Keep it readable and predictable: one clause per line, and a clear table alias.

SELECT  -- columns go here
FROM sales.orders AS o;  -- start from the primary table for the question

At this stage, you are not trying to be “complete.” You are creating a stable place to add columns and logic without losing track of what the query returns.

Use a consistent naming convention early

Decide how you will name output columns for dashboards. Common, practical conventions include:

Continue in our app.
  • Listen to the audio with the screen off.
  • Earn a certificate upon completion.
  • Over 5000 courses for you to explore!
Or continue reading below...
Download App

Download the app

  • snake_case for output columns (e.g., order_id, order_date, customer_segment)
  • prefixes for measures (e.g., revenue, gross_margin) and flags (e.g., is_returned)
  • date suffixes for clarity (e.g., created_at, shipped_at)

2) Select only needed columns

Dashboards work best when the dataset is narrow and intentional. Avoid SELECT * in BI extracts because it:

  • pulls unnecessary data (slower queries, larger extracts)
  • creates unstable schemas (new columns appear and break reports)
  • makes it harder to review correctness

Instead, list the exact columns required for the metric or visualization.

SELECT
  o.order_id,
  o.order_date,
  o.customer_id,
  o.status,
  o.total_amount
FROM sales.orders AS o;

Practical approach: write down what the dashboard needs (dimensions and measures), then map each item to a column or derived expression. If a column is not used by a chart, filter, or drill-through, don’t include it.

Step-by-step checklist for column selection

  • Identify the grain (one row per order, per customer, per day, etc.).
  • List required dimensions (e.g., order_date, status, customer_id).
  • List required measures (e.g., total_amount).
  • Remove “nice-to-have” columns until a stakeholder asks for them.

3) Name columns for reporting (aliases)

BI outputs should be self-explanatory. Use aliases to make column names consistent and readable, especially when source systems use cryptic names or mixed casing.

SELECT
  o.order_id            AS order_id,
  o.order_date          AS order_date,
  o.customer_id         AS customer_id,
  o.status              AS order_status,
  o.total_amount        AS order_total
FROM sales.orders AS o;

Even if the alias matches the original name, being explicit can help when you later refactor (joins, derived columns, or renamed source fields). The key is consistency: the same concept should have the same output name across datasets.

Derived columns with expressions

Derived columns turn raw fields into analysis-ready values. Common examples include arithmetic, date transformations, and safe ratios. Keep expressions simple and name them clearly.

SELECT
  o.order_id                               AS order_id,
  o.order_date                             AS order_date,
  o.customer_id                            AS customer_id,
  o.total_amount                           AS order_total,
  o.tax_amount                             AS tax_amount,
  (o.total_amount - o.tax_amount)          AS net_revenue
FROM sales.orders AS o;

If your BI tool expects a specific set of measures, derived columns are often preferable to pushing calculations into the dashboard layer, because they become reusable and testable.

Use CASE for business rules

CASE is the standard way to encode business logic directly in SQL output. This is especially useful for:

  • bucketizing customers (e.g., by lifetime value)
  • labeling statuses into business-friendly groups
  • creating flags used for filtering (e.g., is_completed)

Example: bucketize customers by spend

Suppose you have a customer-level table with lifetime spend. You can create a segment column that dashboards can use as a dimension.

SELECT
  c.customer_id                    AS customer_id,
  c.lifetime_spend                 AS lifetime_spend,
  CASE
    WHEN c.lifetime_spend >= 5000 THEN 'vip'
    WHEN c.lifetime_spend >= 1000 THEN 'high_value'
    WHEN c.lifetime_spend >= 200  THEN 'mid_value'
    ELSE 'low_value'
  END                              AS customer_segment
FROM sales.customers AS c;

Guidelines for reliable CASE logic:

  • Order conditions from most specific to least specific.
  • Ensure ranges don’t overlap unintentionally.
  • Always include an ELSE to avoid unexpected NULL categories.
  • Use stable labels (avoid changing spelling/casing across queries).

Example: label order status for reporting

Operational statuses can be too granular for BI. Create a reporting status that groups them into business-friendly categories.

SELECT
  o.order_id                         AS order_id,
  o.order_date                       AS order_date,
  o.status                           AS order_status_raw,
  CASE
    WHEN o.status IN ('paid', 'shipped', 'delivered') THEN 'completed'
    WHEN o.status IN ('pending', 'processing')        THEN 'in_progress'
    WHEN o.status IN ('canceled', 'refunded')         THEN 'canceled_or_refunded'
    ELSE 'other'
  END                                AS order_status
FROM sales.orders AS o;

This produces a clean dimension (order_status) that is stable for dashboards, while still keeping the raw status available if needed (order_status_raw). If you only want one status column in the final dataset, keep the reporting version and drop the raw one.

Example: dashboard-ready order extract (clean column set)

Combine the techniques: select only needed fields, alias consistently, add derived measures, and encode business rules with CASE.

SELECT
  o.order_id                                         AS order_id,
  o.order_date                                       AS order_date,
  o.customer_id                                      AS customer_id,
  CASE
    WHEN o.status IN ('paid', 'shipped', 'delivered') THEN 'completed'
    WHEN o.status IN ('pending', 'processing')        THEN 'in_progress'
    WHEN o.status IN ('canceled', 'refunded')         THEN 'canceled_or_refunded'
    ELSE 'other'
  END                                                AS order_status,
  o.total_amount                                     AS order_total,
  o.tax_amount                                       AS tax_amount,
  (o.total_amount - o.tax_amount)                    AS net_revenue
FROM sales.orders AS o;

Notice the output is “dashboard-shaped”: stable names, business-friendly categories, and measures ready to aggregate.

4) Validate output with a small LIMIT sample

Before you hand a dataset to a dashboard (or build on it with joins and filters), validate the output quickly. LIMIT helps you inspect real rows without scanning the full table.

SELECT
  o.order_id                                         AS order_id,
  o.order_date                                       AS order_date,
  o.customer_id                                      AS customer_id,
  CASE
    WHEN o.status IN ('paid', 'shipped', 'delivered') THEN 'completed'
    WHEN o.status IN ('pending', 'processing')        THEN 'in_progress'
    WHEN o.status IN ('canceled', 'refunded')         THEN 'canceled_or_refunded'
    ELSE 'other'
  END                                                AS order_status,
  o.total_amount                                     AS order_total,
  o.tax_amount                                       AS tax_amount,
  (o.total_amount - o.tax_amount)                    AS net_revenue
FROM sales.orders AS o
LIMIT 25;

What to check in the sample

  • Column set: Are there any extra columns you don’t need? Are any required columns missing?
  • Naming: Do column names match your reporting convention (snake_case, consistent prefixes)?
  • Business rules: Do CASE labels look correct? Any unexpected NULLs or “other” values that should be categorized?
  • Derived values: Do calculations (net_revenue) look plausible for a few known orders?
  • Data types: Do numeric columns look numeric, and date columns look like dates (not strings)?

Practical workflow: build reliable SELECTs in minutes

  • Write the skeleton: SELECT … FROM …
  • Add only the columns needed for the dashboard question.
  • Alias every output column into a consistent reporting schema.
  • Add derived columns for reusable measures.
  • Add CASE columns for stable business categories.
  • Run with LIMIT and visually inspect the result before expanding.

Now answer the exercise about the content:

When preparing a dashboard-ready dataset, which approach best helps ensure a reliable SELECT output?

You are right! Congratulations, now go to the next page

You missed! Try again.

A reliable BI SELECT is narrow and intentional, uses consistent aliases, encodes reusable measures and business rules (e.g., CASE), and is checked quickly with LIMIT to confirm columns, labels, and calculations.

Next chapter

SQL for Business Intelligence: Filtering Data with WHERE and Safe Conditions

Arrow Right Icon
Free Ebook cover SQL for Business Intelligence: From Zero to Confident Querying
18%

SQL for Business Intelligence: From Zero to Confident Querying

New course

11 pages

Download the app to earn free Certification and listen to the courses in the background, even with the screen off.