From Business Concepts to Database Structures
Business Intelligence (BI) questions are business questions first: “How many customers bought last month?”, “Which products drive revenue?”, “What is the average order value?”. In a database, those business concepts are stored in structures that let you count, filter, and connect facts reliably.
The core mapping looks like this:
- Business entity (a “thing” you track) → Table (customers, orders, products)
- One instance of that entity → Row (one customer, one order, one product)
- Attributes of the entity → Columns (customer name, order date, product category)
What a Table Represents in BI
A table is usually a collection of the same type of business entity. Common BI entities include:
- Customers: who buys
- Orders: a purchase event (often the “header” of a transaction)
- Order items / line items: what was bought (often the “details” of a transaction)
- Products: what you sell
In BI, you’ll often hear two broad categories:
- Dimension-like tables: descriptive context (customers, products, stores). They answer “who/what/where”.
- Fact-like tables: measurable events (orders, order_items, payments). They answer “how many/how much/when”.
Rows and Columns: Reading BI Datasets Correctly
Rows: “One row equals one …”
To use a dataset safely, you must be able to finish this sentence: “One row equals one ____.” For example:
- Listen to the audio with the screen off.
- Earn a certificate upon completion.
- Over 5000 courses for you to explore!
Download the app
- In
customers, one row equals one customer. - In
orders, one row equals one order. - In
order_items, one row equals one product line on an order.
This matters because BI metrics are usually computed by aggregating rows. If you misunderstand what a row represents, you will overcount or undercount.
Columns: identifiers, descriptors, and metrics
Columns typically fall into three practical groups:
- Identifiers: keys that uniquely identify rows or connect tables (e.g.,
customer_id,order_id,product_id). - Descriptors: labels and categories used for slicing (e.g.,
customer_segment,product_category,country). - Metrics: numeric values you aggregate (e.g.,
quantity,unit_price,order_total).
Primary Keys and Foreign Keys (Business Examples)
Primary key: the row’s unique business identity
A primary key (PK) is a column (or set of columns) whose value uniquely identifies each row in a table. In business terms: it’s the ID that lets you point to exactly one entity instance.
customers.customer_iduniquely identifies a customer.orders.order_iduniquely identifies an order.products.product_iduniquely identifies a product.
In BI, primary keys are critical because they define what “unique” means for that table. If a supposed primary key repeats, you can get duplicate joins and inflated metrics.
Foreign key: the business link to another entity
A foreign key (FK) is a column in one table that stores the primary key value from another table. In business terms: it’s how one entity references another.
orders.customer_idis a foreign key tocustomers.customer_id(each order belongs to a customer).order_items.order_idis a foreign key toorders.order_id(each line item belongs to an order).order_items.product_idis a foreign key toproducts.product_id(each line item refers to a product).
When you join tables in BI, you are usually matching foreign keys to primary keys.
Mini schema example
customers(customer_id PK, customer_name, segment, signup_date) orders(order_id PK, customer_id FK, order_date, status, order_total) order_items(order_item_id PK, order_id FK, product_id FK, quantity, unit_price) products(product_id PK, product_name, category)Reading Relationships Like an ER Diagram (Plain Language)
Even without a picture, you can read an ER-style relationship description by focusing on “one” and “many”:
- customers 1 → many orders: one customer can place many orders; each order belongs to one customer.
- orders 1 → many order_items: one order can contain many line items; each line item belongs to one order.
- products 1 → many order_items: one product can appear in many line items; each line item refers to one product.
These statements tell you how row counts behave when you combine tables.
Predicting row counts when combining tables
Before writing a query, predict what will happen to the number of rows after a join. This is one of the fastest ways to prevent incorrect BI results.
- Joining “one” to “many” increases rows (or keeps them the same), because each “one” row can match multiple “many” rows.
- Joining “many” to “one” keeps the “many” row count (assuming every “many” row finds a match), because each “many” row matches at most one “one” row.
- Joining “many” to “many” can multiply rows and is often a warning sign unless you intend it (e.g., bridging tables, tags, multiple matches).
Step-by-step: predict the result size before you join
Use this checklist:
- Step 1: Identify the grain of each table (one row equals one customer/order/line item/product).
- Step 2: Identify the join keys (FK → PK).
- Step 3: Determine relationship direction (1→many or many→1).
- Step 4: Predict row count: will it stay the same, expand, or potentially multiply?
Example prediction:
ordershas 10,000 rows (10,000 orders).order_itemshas 35,000 rows (35,000 line items).- Joining
orderstoorder_itemsonorder_idproduces about 35,000 rows (one per line item), because each order can match many line items.
Another prediction:
- Joining
order_itemstoproductsonproduct_idstays at about 35,000 rows, because each line item points to one product.
Common BI pitfall: duplicated metrics after a join
If you join a higher-level table to a lower-level table and then sum a higher-level metric, you can double-count.
Example: orders.order_total is at the order grain (one value per order). If you join orders to order_items, each order repeats across its line items. Summing order_total after that join will inflate revenue.
Safer pattern: aggregate at the correct grain first, or compute revenue from line items (e.g., quantity * unit_price) if you are intentionally working at the line-item grain.
Grain: The Level of Detail You Must Declare
Grain (also called “level of detail”) is the definition of what one row represents in your dataset or query result. In BI, you should decide the grain before adding metrics, because aggregation only makes sense relative to a grain.
Examples of grain
- Customer grain: one row per customer (used for customer counts, segmentation, lifetime value summaries).
- Order grain: one row per order (used for order counts, average order value, order status tracking).
- Line-item grain: one row per product per order (used for product revenue, basket analysis, units sold).
- Daily grain: one row per day (used for time series reporting).
Why BI queries must declare grain before adding metrics
Metrics like SUM(revenue), COUNT(*), and AVG(order_total) depend on what each row means. If your query accidentally changes grain (often by joining to a more detailed table), your metric changes meaning.
Practical rule: Choose the grain that matches the business question, then ensure every join and aggregation preserves that grain.
Step-by-step: set grain, then add metrics
Example business question: “Revenue by product category last month.” The intended grain is product category (one row per category in the output), but the computation typically starts at line-item grain because revenue is driven by items sold.
- Step 1: Choose the computation grain: line items (one row per product per order).
- Step 2: Compute the metric at that grain:
quantity * unit_priceper line item. - Step 3: Join only what you need for grouping: join
productsto getcategory. - Step 4: Aggregate to the reporting grain: sum revenue grouped by category.
SELECT p.category, SUM(oi.quantity * oi.unit_price) AS revenue FROM order_items oi JOIN products p ON oi.product_id = p.product_id JOIN orders o ON oi.order_id = o.order_id WHERE o.order_date >= DATE '2025-01-01' AND o.order_date < DATE '2025-02-01' GROUP BY p.category;Notice how the query’s grain is controlled: the calculation starts at line items, then rolls up to category. If instead you summed orders.order_total after joining to order_items, you would likely overcount.
Quick grain checks you can do before trusting a result
- Check uniqueness: does the supposed primary key appear once in your result at the intended grain?
- Check join direction: did you join from a higher grain to a lower grain without re-aggregating?
- Check metric definition: is the metric stored at the same grain as your current dataset?