SQL for Business Intelligence: Filtering Data with WHERE and Safe Conditions

Capítulo 3

Estimated reading time: 9 minutes

+ Exercise

Why filtering matters in BI

In Business Intelligence, most queries answer a question about a specific reporting period (e.g., last month, quarter-to-date) and a specific business segment (e.g., region, channel, customer tier). Filtering is how you narrow a dataset to the exact slice that matches the business definition. In SQL, filtering rows is primarily done with WHERE, which keeps only rows where the condition evaluates to true.

Core WHERE patterns for reporting periods and segments

Comparison operators

Use comparison operators to filter numeric, text, and date/time columns. Common operators: =, <> (not equal), >, >=, <, <=.

SELECT order_id, order_date, total_amount, region
FROM orders
WHERE region = 'West'
  AND total_amount >= 100;

BI tip: be explicit about business definitions. “High-value orders” might mean >= 100 in one dashboard and > 100 in another; that one character changes totals.

Combining conditions with AND/OR (and using parentheses)

AND requires all conditions to be true; OR requires at least one. When mixing them, use parentheses to match the intended logic.

SELECT customer_id, segment, region
FROM customers
WHERE segment = 'SMB'
  AND (region = 'West' OR region = 'South');

Without parentheses, SQL may evaluate AND before OR, producing a different result than you intended.

Continue in our app.
  • Listen to the audio with the screen off.
  • Earn a certificate upon completion.
  • Over 5000 courses for you to explore!
Or continue reading below...
Download App

Download the app

IN for business segments and lists

IN is a clean way to filter to a set of allowed values (regions, channels, product categories). It is usually clearer than a long chain of OR.

SELECT order_id, channel, order_date
FROM orders
WHERE channel IN ('Online', 'Partner', 'Retail');

Practical use: “Include only core channels” or “Exclude test stores” (with NOT IN).

SELECT store_id, store_name
FROM stores
WHERE store_id NOT IN (9999, 9998);

BETWEEN for ranges (use carefully with dates/timestamps)

BETWEEN is inclusive on both ends: BETWEEN a AND b includes a and b. This is great for numeric ranges (e.g., age bands, price bands), but can be tricky for timestamps.

SELECT product_id, price
FROM products
WHERE price BETWEEN 10 AND 25;

For dates, BETWEEN can work if your column is a date (no time component). If your column is a timestamp, BETWEEN '2025-01-01' AND '2025-01-31' may exclude records later on the end date depending on how the database interprets the literal (often midnight). A safer BI pattern is a half-open interval: start inclusive, end exclusive.

SELECT order_id, order_ts
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-02-01';

LIKE for pattern matching (and when not to use it)

LIKE matches text patterns using wildcards: % for any length, _ for a single character. Use it for “starts with”, “contains”, or standardized codes.

SELECT customer_id, email
FROM customers
WHERE email LIKE '%@company.com';
SELECT ticket_id, issue_code
FROM support_tickets
WHERE issue_code LIKE 'BILLING_%';

BI caution: LIKE '%something%' can be slow on large tables and may match more than expected (e.g., “pro” matches “product”, “improve”). Prefer exact matches or standardized dimensions when possible.

Date filtering patterns used in reporting

Filter a single day

If the column is a date:

SELECT order_id, order_date
FROM orders
WHERE order_date = '2025-01-15';

If the column is a timestamp, use a day range (half-open interval):

SELECT order_id, order_ts
FROM orders
WHERE order_ts >= '2025-01-15'
  AND order_ts <  '2025-01-16';

Month-to-date (MTD) and last full month

MTD typically means from the first day of the current month up to “today” (or up to the latest loaded date in the warehouse). The exact functions vary by database, but the logic is always: start-of-month inclusive, end boundary exclusive or inclusive depending on your definition.

SELECT order_id, order_ts
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-01-16';

Last full month: start at the first day of last month, end at the first day of this month.

SELECT order_id, order_ts
FROM orders
WHERE order_ts >= '2024-12-01'
  AND order_ts <  '2025-01-01';

BI tip: avoid filtering by formatted strings like WHERE TO_CHAR(order_ts,'YYYY-MM') = '2025-01' because it can prevent efficient filtering and may behave differently across systems. Prefer range conditions on the raw date/timestamp.

Quarter-to-date (QTD) and rolling windows

QTD is the same idea: start at quarter start, end at “today” (or latest loaded date). Rolling windows are “last 7 days”, “last 30 days”, etc. The safest mental model is always: define the start boundary and end boundary explicitly.

SELECT order_id, order_ts
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-01-16';

For a rolling 30-day window ending at a specific anchor date, translate it into boundaries.

SELECT order_id, order_ts
FROM orders
WHERE order_ts >= '2024-12-17'
  AND order_ts <  '2025-01-16';

Null-safe filtering (and common mistakes)

How NULL behaves in WHERE

NULL means “unknown / missing”. It is not equal to anything, not even another NULL. This affects filtering: comparisons like = or <> do not work the way many people expect.

Correct: IS NULL and IS NOT NULL

SELECT customer_id, churn_date
FROM customers
WHERE churn_date IS NULL;
SELECT customer_id, churn_date
FROM customers
WHERE churn_date IS NOT NULL;

Typical BI uses: “active customers” might be churn_date IS NULL, “completed orders” might be shipped_at IS NOT NULL.

Common mistake: using '=' with NULL

This returns zero rows (or unexpected results) because NULL is not comparable with =.

SELECT customer_id
FROM customers
WHERE churn_date = NULL;

Use IS NULL instead.

Common mistake: NOT IN with NULLs

NOT IN can behave unexpectedly if the list (or subquery) contains NULL. In many SQL engines, if the set contains NULL, the comparison becomes unknown for many rows, and you may get fewer rows than expected.

Safer patterns include filtering NULLs out of the set, or using a different approach. If you must use NOT IN, ensure the list has no NULLs.

SELECT order_id, customer_id
FROM orders
WHERE customer_id NOT IN (1001, 1002, 1003);

If the exclusion list comes from a table, ensure the key is non-null in that table before using it in a subquery.

Step-by-step practice pattern for safe filtering

Pattern overview

  • 1) Write the business question in plain language.
  • 2) Translate it into precise filter conditions (period + segment + status rules).
  • 3) Validate with row counts and spot-check records (boundary dates, segments, nulls).

Example 1: Revenue for a segment in a reporting month

1) Business question: “What were total paid order amounts for SMB customers in the West region during January 2025?”

2) Translate into conditions:

  • Period: January 2025 (timestamp-safe): order_ts >= '2025-01-01' and order_ts < '2025-02-01'
  • Segment: customer_segment = 'SMB'
  • Region: region = 'West'
  • Paid orders only: paid_at IS NOT NULL
SELECT SUM(total_amount) AS total_paid_amount
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-02-01'
  AND customer_segment = 'SMB'
  AND region = 'West'
  AND paid_at IS NOT NULL;

3) Validate:

  • Row count check: how many orders are included?
  • Spot-check boundary records: earliest and latest timestamps included.
  • Null check: confirm paid_at is truly populated for included rows.
SELECT COUNT(*) AS orders_in_scope
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-02-01'
  AND customer_segment = 'SMB'
  AND region = 'West'
  AND paid_at IS NOT NULL;
SELECT order_id, order_ts, total_amount, paid_at
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-02-01'
  AND customer_segment = 'SMB'
  AND region = 'West'
  AND paid_at IS NOT NULL
ORDER BY order_ts ASC
LIMIT 10;
SELECT order_id, order_ts, total_amount, paid_at
FROM orders
WHERE order_ts >= '2025-01-01'
  AND order_ts <  '2025-02-01'
  AND customer_segment = 'SMB'
  AND region = 'West'
  AND paid_at IS NOT NULL
ORDER BY order_ts DESC
LIMIT 10;

Example 2: Orders for specific channels, excluding missing customer IDs

1) Business question: “How many orders came from Online or Partner channels last week, excluding orders that are missing a customer ID?”

2) Translate into conditions:

  • Period: define last week boundaries (example dates shown)
  • Channels: channel IN ('Online','Partner')
  • Data quality rule: customer_id IS NOT NULL
SELECT COUNT(*) AS orders_count
FROM orders
WHERE order_ts >= '2025-01-06'
  AND order_ts <  '2025-01-13'
  AND channel IN ('Online', 'Partner')
  AND customer_id IS NOT NULL;

3) Validate: compare counts with and without the data quality filter to understand impact.

SELECT
  COUNT(*) AS all_orders,
  SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) AS missing_customer_id
FROM orders
WHERE order_ts >= '2025-01-06'
  AND order_ts <  '2025-01-13'
  AND channel IN ('Online', 'Partner');

Example 3: Finding records with text patterns (and verifying matches)

1) Business question: “Which customers have a company email domain, and are any of those emails missing?”

2) Translate into conditions:

  • Company domain: email LIKE '%@company.com'
  • Missing email: email IS NULL (separate check)
SELECT customer_id, email
FROM customers
WHERE email LIKE '%@company.com';

3) Validate: check how many emails are null overall, and spot-check a few matches to ensure the pattern is correct (e.g., subdomains, uppercase/lowercase behavior depends on database collation).

SELECT
  COUNT(*) AS total_customers,
  SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails
FROM customers;
SELECT customer_id, email
FROM customers
WHERE email LIKE '%@company.com'
LIMIT 20;

Checklist of common WHERE mistakes to avoid in BI queries

  • Using = NULL or <> NULL instead of IS NULL / IS NOT NULL.
  • Using BETWEEN on timestamps without considering the time component on the end boundary.
  • Mixing AND and OR without parentheses, changing the intended segment definition.
  • Filtering dates by formatting/casting the column to text, instead of using range conditions.
  • Using LIKE '%...%' when an exact match or standardized dimension value exists.
  • Forgetting to validate filters with counts and boundary spot-checks, leading to silent misreporting.

Now answer the exercise about the content:

When filtering a timestamp column for all orders in January 2025, which WHERE pattern is safest to avoid accidentally excluding records late on the last day?

You are right! Congratulations, now go to the next page

You missed! Try again.

For timestamps, BETWEEN can miss records on the end date due to time components. A safer pattern is start inclusive and end exclusive, e.g., >= '2025-01-01' and < '2025-02-01'.

Next chapter

SQL for Business Intelligence: Sorting and Limiting Results for QA and Insights

Arrow Right Icon
Free Ebook cover SQL for Business Intelligence: From Zero to Confident Querying
27%

SQL for Business Intelligence: From Zero to Confident Querying

New course

11 pages

Download the app to earn free Certification and listen to the courses in the background, even with the screen off.