SQL for Business Intelligence: Understanding Tables, Rows, and Business Entities

Chapter 1

Estimated reading time: 7 minutes


From Business Concepts to Database Structures

Business Intelligence (BI) questions are business questions first: “How many customers bought last month?”, “Which products drive revenue?”, “What is the average order value?”. In a database, those business concepts are stored in structures that let you count, filter, and connect facts reliably.

The core mapping looks like this:

  • Business entity (a “thing” you track) → Table (customers, orders, products)
  • One instance of that entity → Row (one customer, one order, one product)
  • Attributes of the entity → Columns (customer name, order date, product category)
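
A minimal table sketch makes the mapping concrete. The definition below uses the customers table from the mini schema later in this chapter; the column types and the sample row are illustrative assumptions.

-- One business entity (customers) becomes one table.
CREATE TABLE customers (
    customer_id   INTEGER PRIMARY KEY,  -- identifies exactly one customer
    customer_name VARCHAR(100),         -- attribute: descriptive detail
    segment       VARCHAR(50),          -- attribute: category used for slicing
    signup_date   DATE                  -- attribute: when the customer signed up
);

-- One instance of the entity becomes one row.
INSERT INTO customers (customer_id, customer_name, segment, signup_date)
VALUES (1, 'Acme Ltd', 'Enterprise', DATE '2024-06-15');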

What a Table Represents in BI

A table is usually a collection of the same type of business entity. Common BI entities include:

  • Customers: who buys
  • Orders: a purchase event (often the “header” of a transaction)
  • Order items / line items: what was bought (often the “details” of a transaction)
  • Products: what you sell

In BI, you’ll often hear two broad categories:

  • Dimension-like tables: descriptive context (customers, products, stores). They answer “who/what/where”.
  • Fact-like tables: measurable events (orders, order_items, payments). They answer “how many/how much/when”.
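
To see the two categories working together, here is a sketch (using the customers and orders tables from the mini schema below) in which the fact-like table supplies the measures and the dimension-like table supplies the labels:

SELECT c.segment,                       -- "who": context from the dimension-like table
       COUNT(*)           AS order_count,  -- "how many": measured from the fact-like table
       SUM(o.order_total) AS revenue       -- "how much"
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.segment;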

Rows and Columns: Reading BI Datasets Correctly

Rows: “One row equals one …”

To use a dataset safely, you must be able to finish this sentence: “One row equals one ____.” For example:

  • In customers, one row equals one customer.
  • In orders, one row equals one order.
  • In order_items, one row equals one product line on an order.

This matters because BI metrics are usually computed by aggregating rows. If you misunderstand what a row represents, you will overcount or undercount.
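
A quick way to confirm the grain is to compare the total row count with the number of distinct key values; if the two numbers differ, one row is not one of what you assumed. A sketch against the orders table:

-- If one row equals one order, these two counts match.
SELECT COUNT(*)                 AS row_count,
       COUNT(DISTINCT order_id) AS distinct_orders
FROM orders;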

Columns: identifiers, descriptors, and metrics

Columns typically fall into three practical groups:

  • Identifiers: keys that uniquely identify rows or connect tables (e.g., customer_id, order_id, product_id).
  • Descriptors: labels and categories used for slicing (e.g., customer_segment, product_category, country).
  • Metrics: numeric values you aggregate (e.g., quantity, unit_price, order_total).
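
In practice the three groups play different roles in a query: identifiers connect tables, descriptors slice, and metrics get aggregated. The sketch below simply labels the columns of the orders table by group (the role assignments are illustrative):

SELECT o.order_id,     -- identifier: unique key for the order
       o.customer_id,  -- identifier: foreign key linking to customers
       o.status,       -- descriptor: category used for slicing
       o.order_date,   -- descriptor: used for time slicing
       o.order_total   -- metric: numeric value you aggregate
FROM orders o;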

Primary Keys and Foreign Keys (Business Examples)

Primary key: the row’s unique business identity

A primary key (PK) is a column (or set of columns) whose value uniquely identifies each row in a table. In business terms: it’s the ID that lets you point to exactly one entity instance.

  • customers.customer_id uniquely identifies a customer.
  • orders.order_id uniquely identifies an order.
  • products.product_id uniquely identifies a product.

In BI, primary keys are critical because they define what “unique” means for that table. If a supposed primary key repeats, you can get duplicate joins and inflated metrics.
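
A simple duplicate check catches a supposed primary key that repeats. A sketch against the customers table; any rows returned mean customer_id is not truly unique:

SELECT customer_id, COUNT(*) AS occurrences
FROM customers
GROUP BY customer_id
HAVING COUNT(*) > 1;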

Foreign key: the business link to another entity

A foreign key (FK) is a column in one table that stores the primary key value from another table. In business terms: it’s how one entity references another.

  • orders.customer_id is a foreign key to customers.customer_id (each order belongs to a customer).
  • order_items.order_id is a foreign key to orders.order_id (each line item belongs to an order).
  • order_items.product_id is a foreign key to products.product_id (each line item refers to a product).

When you join tables in BI, you are usually matching foreign keys to primary keys.
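
Before trusting a join, it can also help to look for "orphan" foreign keys, i.e., rows whose reference has no match. A sketch that lists orders pointing at a customer_id that does not exist in customers:

SELECT o.order_id, o.customer_id
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;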

Mini schema example

customers(customer_id PK, customer_name, segment, signup_date)
orders(order_id PK, customer_id FK, order_date, status, order_total)
order_items(order_item_id PK, order_id FK, product_id FK, quantity, unit_price)
products(product_id PK, product_name, category)
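
Written as table definitions, the same schema spells out the keys as constraints. The column types below are illustrative; customers is assumed to be defined as in the earlier sketch, and products analogously:

CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers (customer_id),  -- FK: each order belongs to a customer
    order_date  DATE,
    status      VARCHAR(20),
    order_total DECIMAL(10, 2)
);

CREATE TABLE order_items (
    order_item_id INTEGER PRIMARY KEY,
    order_id      INTEGER REFERENCES orders (order_id),      -- FK: each line item belongs to an order
    product_id    INTEGER REFERENCES products (product_id),  -- FK: each line item refers to a product
    quantity      INTEGER,
    unit_price    DECIMAL(10, 2)
);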

Reading Relationships Like an ER Diagram (Plain Language)

Even without a picture, you can read an ER-style relationship description by focusing on “one” and “many”:

  • customers 1 → many orders: one customer can place many orders; each order belongs to one customer.
  • orders 1 → many order_items: one order can contain many line items; each line item belongs to one order.
  • products 1 → many order_items: one product can appear in many line items; each line item refers to one product.

These statements tell you how row counts behave when you combine tables.

Predicting row counts when combining tables

Before writing a query, predict what will happen to the number of rows after a join. This is one of the fastest ways to prevent incorrect BI results.

  • Joining “one” to “many” increases rows (or keeps them the same), because each “one” row can match multiple “many” rows.
  • Joining “many” to “one” keeps the “many” row count (assuming every “many” row finds a match), because each “many” row matches at most one “one” row.
  • Joining “many” to “many” can multiply rows and is often a warning sign unless you intend it (e.g., bridging tables, tags, multiple matches).
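
One way to confirm a join is 1→many rather than many→many is to verify that the join key is unique on the intended "one" side. A sketch for the orders/order_items join; any rows returned mean order_id repeats in orders, so the join could multiply rows unexpectedly:

SELECT order_id, COUNT(*) AS occurrences
FROM orders
GROUP BY order_id
HAVING COUNT(*) > 1;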

Step-by-step: predict the result size before you join

Use this checklist:

  • Step 1: Identify the grain of each table (one row equals one customer/order/line item/product).
  • Step 2: Identify the join keys (FK → PK).
  • Step 3: Determine relationship direction (1→many or many→1).
  • Step 4: Predict row count: will it stay the same, expand, or potentially multiply?

Example prediction:

  • orders has 10,000 rows (10,000 orders).
  • order_items has 35,000 rows (35,000 line items).
  • Joining orders to order_items on order_id produces about 35,000 rows (one per line item), because each order can match many line items.

Another prediction:

  • Joining order_items to products on product_id stays at about 35,000 rows, because each line item points to one product.
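
Both predictions can be verified directly by counting rows before and after the join. A sketch; for a clean 1→many join, joined_rows should roughly equal the row count of the "many" side:

SELECT
    (SELECT COUNT(*) FROM orders)      AS orders_rows,
    (SELECT COUNT(*) FROM order_items) AS line_item_rows,
    (SELECT COUNT(*)
     FROM orders o
     JOIN order_items oi ON o.order_id = oi.order_id) AS joined_rows;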

Common BI pitfall: duplicated metrics after a join

If you join a higher-level table to a lower-level table and then sum a higher-level metric, you can double-count.

Example: orders.order_total is at the order grain (one value per order). If you join orders to order_items, each order repeats across its line items. Summing order_total after that join will inflate revenue.

Safer pattern: aggregate at the correct grain first, or compute revenue from line items (e.g., quantity * unit_price) if you are intentionally working at the line-item grain.
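
The contrast is easiest to see side by side. The first query below is the unsafe pattern just described; the second computes revenue at the line-item grain instead (a sketch using the mini schema):

-- Unsafe: order_total repeats once per line item, inflating the sum.
SELECT SUM(o.order_total) AS inflated_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id;

-- Safer: compute revenue from the line items themselves.
SELECT SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi;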

Grain: The Level of Detail You Must Declare

Grain (also called “level of detail”) is the definition of what one row represents in your dataset or query result. In BI, you should decide the grain before adding metrics, because aggregation only makes sense relative to a grain.

Examples of grain

  • Customer grain: one row per customer (used for customer counts, segmentation, lifetime value summaries).
  • Order grain: one row per order (used for order counts, average order value, order status tracking).
  • Line-item grain: one row per product per order (used for product revenue, basket analysis, units sold).
  • Daily grain: one row per day (used for time series reporting).
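
As an illustration, a daily grain is produced by grouping on the order date, so one output row equals one day (a sketch against the orders table):

SELECT o.order_date,
       COUNT(*)           AS orders_placed,
       SUM(o.order_total) AS daily_revenue
FROM orders o
GROUP BY o.order_date
ORDER BY o.order_date;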

Why BI queries must declare grain before adding metrics

Metrics like SUM(revenue), COUNT(*), and AVG(order_total) depend on what each row means. If your query accidentally changes grain (often by joining to a more detailed table), your metric changes meaning.

Practical rule: Choose the grain that matches the business question, then ensure every join and aggregation preserves that grain.

Step-by-step: set grain, then add metrics

Example business question: “Revenue by product category last month.” The intended grain is product category (one row per category in the output), but the computation typically starts at line-item grain because revenue is driven by items sold.

  • Step 1: Choose the computation grain: line items (one row per product per order).
  • Step 2: Compute the metric at that grain: quantity * unit_price per line item.
  • Step 3: Join only what you need for grouping: join products to get category.
  • Step 4: Aggregate to the reporting grain: sum revenue grouped by category.
SELECT p.category,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.order_date >= DATE '2025-01-01'
  AND o.order_date < DATE '2025-02-01'
GROUP BY p.category;

Notice how the query’s grain is controlled: the calculation starts at line items, then rolls up to category. If instead you summed orders.order_total after joining to order_items, you would likely overcount.

Quick grain checks you can do before trusting a result

  • Check uniqueness: does the supposed primary key appear once in your result at the intended grain?
  • Check join direction: did you join from a higher grain to a lower grain without re-aggregating?
  • Check metric definition: is the metric stored at the same grain as your current dataset?
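
The uniqueness check can be run directly on a query result by wrapping it in a subquery. A sketch against the category report from the previous section; if the intended grain is one row per category, this returns no rows:

SELECT category, COUNT(*) AS occurrences
FROM (
    SELECT p.category,
           SUM(oi.quantity * oi.unit_price) AS revenue
    FROM order_items oi
    JOIN products p ON oi.product_id = p.product_id
    GROUP BY p.category
) AS report
GROUP BY category
HAVING COUNT(*) > 1;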

Now answer the exercise about the content:

Why can summing orders.order_total after joining orders to order_items lead to inflated revenue in a BI query?

Joining a higher-grain table (orders) to a lower-grain table (order_items) repeats each order across its line items. Summing an order-level metric like order_total after the join double-counts revenue unless you aggregate at the correct grain.

Next chapter

SQL for Business Intelligence: Writing Reliable SELECT Statements
