All courses > Technology and Programming > Data Science and Business Intelligence ::

Designing Dimension Tables: Attributes, Hierarchies, and Usable Business Context

Capítulo 5

Estimated reading time: 9 minutes

What Makes a Dimension “Good” for BI

A dimension table provides business context for measures in fact tables. A good dimension makes it easy for users to filter, group, and drill down without ambiguity. In practice, strong dimensions share three traits: rich descriptive attributes, stable keys, and intuitive hierarchies.

Rich descriptive attributes (the “language” of analysis)

Dimensions should contain the descriptive fields people naturally use in questions and report filters: names, categories, segments, statuses, flags, and derived groupings. The goal is to reduce the need for ad-hoc logic in reports by storing commonly used descriptors once, consistently.

Good: Product has brand, category, subcategory, size, color, lifecycle status, and launch date.
Weak: Product has only product_id and product_name, forcing analysts to join to multiple lookup tables or re-create groupings in every report.

Stable keys (the “anchor” for consistent joins)

Dimensions should be joined to facts using stable surrogate keys (typically integer keys generated in the warehouse). Natural keys from source systems (like customer_number or SKU) can change, be reused, or differ across systems. Surrogate keys protect reporting consistency and support change tracking in attributes over time.

Surrogate key: customer_key (warehouse-generated)
Business/natural key: customer_id (from CRM), may change or collide across sources

Intuitive hierarchies (the “paths” for slicing and drill-down)

Hierarchies are ordered levels that match how the business thinks and how users drill from summary to detail. Common patterns include:

Product: Category → Subcategory → SKU
Geography: Region → Country → State/Province → City
Organization: Division → Department → Team

Hierarchies should be explicit in the dimension (separate columns for each level) and named clearly so users can find them quickly in BI tools.

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Organizing Attributes for Usability

Group attributes into logical sections

Even though a dimension is a single table, you can design it to feel organized by grouping columns conceptually. This improves discoverability in semantic layers and reduces confusion for report authors.

Identity: business key, name, display label, codes
Classification: segment, category/subcategory, type, tier
Geography: region, country, state, city, postal code
Status and lifecycle: active flag, start/end dates, churn reason
Operational metadata (optional): source system, record status, last updated timestamp

Naming conventions that match business language

Use names that are readable and consistent across dimensions. Avoid cryptic abbreviations and overloaded terms.

Prefer customer_segment over seg_cd.
Prefer product_subcategory over subcategory (more explicit in wide models).
Use consistent suffixes: _key for surrogate keys, _id for business keys, _name for labels, _flag for booleans.

Design for “Unknown”, “Other”, and missing values

Facts often arrive with incomplete dimension references (late-arriving data, missing codes, new members not yet loaded). To keep reporting stable and avoid dropping rows in joins, include standard placeholder members in each dimension.

Unknown: the source did not provide the value (e.g., customer not captured).
Not applicable: the attribute does not apply (e.g., campaign for organic traffic).
Other: value exists but is grouped for reporting (e.g., long tail of small channels).

Common practice is to reserve surrogate keys for these rows (for example, 0 = Unknown, -1 = Not applicable), but the exact values depend on your warehouse standards.

Make hierarchies consistent and drill-friendly

To support drill-down, store each hierarchy level as its own attribute and ensure values are complete and aligned. For example, if a product has a subcategory, it must also have a category. If a city is present, state and country should also be present.

Use a single, consistent hierarchy per analysis purpose (avoid mixing marketing and merchandising hierarchies in the same set of columns unless clearly separated).
Keep level names consistent: region, country, state_province, city.
Provide both codes and names when users need them: country_code and country_name.

Step-by-Step: Designing a Dimension Table

Step 1: List the questions the dimension must answer

Start from how the business slices metrics. Write down the filters and groupings users expect.

Customer: “Revenue by customer segment and region”, “Churn by acquisition channel”
Product: “Margin by category and brand”, “Units sold by size and color”
Campaign: “Conversions by channel and objective”, “ROI by audience and creative”
Date: “Sales by fiscal period”, “Week-over-week growth”

Step 2: Define the natural key and choose a surrogate key

Identify the best business identifier(s) from source systems and store them as attributes, but join facts using a surrogate key.

Customer: natural key might be CRM customer_id; store it as customer_id, join on customer_key.
Product: natural key might be SKU; store as sku, join on product_key.

Step 3: Draft attributes and hierarchies

Include attributes that drive filtering and grouping. Add hierarchy levels explicitly and ensure they are business-approved.

Product hierarchy: product_category, product_subcategory, sku
Geography hierarchy: region, country, state_province

Step 4: Standardize values and handle exceptions

Decide how to format names (case, punctuation), how to handle blanks, and how to map rare values to “Other” when needed for reporting.

Normalize country names (e.g., “United States” vs “USA”).
Define allowed values for status fields (e.g., Active/Inactive).
Create mapping rules for channel rollups (e.g., “Paid Social” includes multiple platform codes).

Step 5: Validate with real report scenarios

Test the dimension against typical BI actions: filtering, grouping, drill-down, and cross-report consistency.

Can a user filter by customer_segment and get expected totals?
Does drilling Product Category → Subcategory → SKU produce a sensible path?
Do two reports using “Region” produce the same grouping?

Example: Customer Dimension (Filtering and Consistent Segmentation)

A Customer dimension should support segmentation, geography slicing, and lifecycle analysis. It often becomes a shared definition across many reports, so naming and standardization matter.

dim_customer (example columns)  customer_key (PK)  customer_id (business key)  customer_name  customer_type (B2B/B2C)  customer_segment (e.g., Enterprise, SMB, Consumer)  acquisition_channel (e.g., Paid Search, Partner, Organic)  region  country  state_province  city  postal_code  industry  account_owner  is_active_flag  created_date  unknown_member_flag

How it supports BI:

Filtering: customer_segment = “SMB”, is_active_flag = true
Grouping: revenue by industry, churn by acquisition_channel
Drill-down: region → country → state_province
Consistency: “Enterprise” means the same segment across all dashboards because it is defined once in the dimension.

Practical notes for Customer attributes

Prefer a single “customer_name” used in reports; keep alternate names in separate columns if needed (legal_name, trading_name).
If multiple systems define segments differently, store both (e.g., segment_crm, segment_bi_standard) and make the standard one the default for reporting.
Include “Unknown” customer row to prevent losing fact rows when customer_id is missing.

Example: Product Dimension (Category Rollups and SKU Detail)

Product analysis typically requires both rollups (category performance) and detailed SKU-level views. The dimension should make those rollups obvious and stable.

dim_product (example columns)  product_key (PK)  sku (business key)  product_name  brand_name  product_category  product_subcategory  product_line  size  color  material  package_type  lifecycle_status (Active/Discontinued)  launch_date  is_bundle_flag

How it supports BI:

Filtering: brand_name = “Acme”, lifecycle_status = “Active”
Grouping: margin by product_category, units by size
Drill-down: product_category → product_subcategory → sku

Practical notes for Product hierarchies

Keep hierarchy levels business-friendly and stable; avoid using internal merchandising codes as the primary reporting fields unless paired with readable names.
If a product can belong to multiple categories, decide on a primary reporting hierarchy or model a separate bridge; do not silently duplicate products in the dimension.
Use “Other” subcategory when long-tail subcategories clutter reports, but keep the original subcategory in a separate attribute if analysts still need it.

Example: Campaign Dimension (Marketing Context and Attribution-Friendly Slicing)

A Campaign dimension provides context for marketing performance analysis. It should align with how marketing teams plan and evaluate initiatives: channel, objective, audience, and creative.

dim_campaign (example columns)  campaign_key (PK)  campaign_id (business key)  campaign_name  channel (Email, Paid Social, Search, Display, Partner)  channel_group (Paid, Owned, Earned)  objective (Acquisition, Retention, Awareness)  audience_segment  geo_target_region  start_date  end_date  creative_theme  landing_page_type  is_evergreen_flag  status (Planned/Active/Ended)

How it supports BI:

Filtering: channel_group = “Paid”, objective = “Acquisition”
Grouping: conversions by channel, ROI by audience_segment
Drill-down: channel_group → channel → campaign_name

Practical notes for Campaign usability

Store both granular and rolled-up channel fields (channel and channel_group) to avoid repeated mapping logic in reports.
Define “Not applicable” campaign for events that are not campaign-driven (e.g., organic signups) so facts can still join cleanly.
Use consistent naming rules for campaign_name (or store a cleaned display name) so report filters are readable.

Example: Date Dimension (Time Intelligence Without Repeated Logic)

A Date dimension is a usability accelerator: it centralizes calendar and fiscal logic so every report uses the same definitions for weeks, months, quarters, and holidays.

dim_date (example columns)  date_key (PK, e.g., 20260116)  date (YYYY-MM-DD)  day_of_week_name  day_of_week_number  week_start_date  week_of_year  month_number  month_name  month_start_date  quarter_number  year_number  fiscal_month_number  fiscal_quarter_number  fiscal_year_number  is_weekend_flag  is_holiday_flag

How it supports BI:

Filtering: fiscal_year_number = 2026, is_weekend_flag = false
Grouping: sales by month_name, pipeline by fiscal_quarter_number
Drill-down: year → quarter → month → date
Consistency: “Week” and “Fiscal Quarter” mean the same thing across all dashboards because they come from one shared table.

Practical notes for Date design

Include both calendar and fiscal attributes if the business uses both; keep names explicit (fiscal_year_number vs year_number).
Use a numeric date_key for efficient joins and a date column for readability.
Precompute commonly needed flags (is_month_end_flag, is_current_month_flag) only if they are stable and well-defined; avoid volatile “current” flags if your refresh cadence makes them confusing.

Making Dimensions Work Together Across Reports

Use shared definitions and avoid “report-only” logic

Dimensions are most valuable when they enforce consistent definitions across the organization. If a segmentation rule or channel rollup is important, encode it as a dimension attribute rather than re-implementing it in multiple reports.

Standardize segment labels (e.g., Enterprise/SMB) in dim_customer.
Standardize channel_group in dim_campaign.
Standardize region/country naming in all dimensions that carry geography.

Design for discoverability in BI tools

Users browse fields. Make the “right” fields easy to find and hard to misuse.

Prefer clear, business-facing names for default fields (product_category, customer_segment).
Keep technical fields (source_system_code, raw_status_code) but separate them by naming so they are not mistaken for reporting fields.
When two attributes sound similar, rename to clarify intent (e.g., customer_region vs sales_region if they differ).

Now answer the exercise about the content:

Which design choice best prevents losing fact rows when dimension references are missing or late-arriving while keeping BI reporting stable?

You are right! Congratulations, now go to the next page

You missed! Try again.

Placeholder members (e.g., “Unknown”, “Not applicable”) allow facts with missing or late dimension values to still join, preventing dropped rows and keeping totals consistent across reports.