All courses > Technology and Programming > Databases ::

Avoiding Common Modeling Mistakes: Practical Fixes for Cleaner Schemas

Capítulo 8

Estimated reading time: 8 minutes

1) Repeating Groups and Multi-Valued Fields

What the mistake looks like: putting “many values” into one row by repeating columns (Phone1, Phone2, Phone3) or stuffing a list into a single field (comma-separated tags).

Flawed mini-schema

Customer( CustomerID PK, Name, Phone1, Phone2, Phone3 )

-- or
Customer( CustomerID PK, Name, Phones )  -- e.g., "555-0101,555-0102"

Problems it creates

Update anomalies: changing a phone number means hunting for it in multiple columns or parsing strings; duplicates are easy to introduce.
Unclear meaning: is Phone2 “secondary”, “work”, “most recent”, or just “the second one we happened to store”?
Hard queries: finding customers with a given phone requires OR across columns or string matching; indexing and uniqueness checks become unreliable.
Artificial limits: you silently cap the number of phones at 3 (or whatever you guessed).

Refactor: step-by-step to a clean design

Step 1: Identify the repeating group. “Phone” is a repeatable fact about a customer.

Step 2: Move the repeating group to its own table. Each phone becomes its own row.

Step 3: Add meaning if needed. If the business cares about type (mobile, home) or “primary”, model it explicitly.

Corrected design

Customer( CustomerID PK, Name )

CustomerPhone(
  CustomerID FK, 
  PhoneNumber,
  PhoneType,        -- optional: 'mobile','home','work'
  IsPrimary,        -- optional
  PRIMARY KEY (CustomerID, PhoneNumber)
)

Practical example query becomes straightforward:

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

-- Find the customer who has a given phone number
SELECT c.CustomerID, c.Name
FROM Customer c
JOIN CustomerPhone p ON p.CustomerID = c.CustomerID
WHERE p.PhoneNumber = '555-0102';

2) Mixing Multiple Concepts into One Table (Overloaded Entities)

What the mistake looks like: one table tries to represent multiple different “things” or lifecycle stages, often with many nullable columns and a “Type” field that changes what the row means.

Flawed mini-schema

Party(
  PartyID PK,
  PartyType,          -- 'PERSON' or 'COMPANY'
  FirstName,
  LastName,
  DateOfBirth,
  CompanyName,
  TaxID,
  ContactEmail
)

Problems it creates

Ambiguous semantics: which columns are valid depends on PartyType; the table does not communicate the rules clearly.
Lots of NULLs: most rows leave half the columns empty; it becomes hard to tell “unknown” from “not applicable”.
Weak validation: you cannot easily enforce “CompanyName required for companies” and “DateOfBirth required for people” without complex logic.
Hard queries and reporting: every query needs conditional logic; indexes and constraints are less effective.

Refactor: step-by-step to a clean design

Step 1: List the distinct concepts. Here: Person and Company are different entities with different attributes.

Step 2: Extract shared attributes. If both share contact email, keep it in a shared table or a separate ContactPoint table.

Step 3: Split into subtype tables. Keep a stable identifier in a common table if you need a single “Party” reference; otherwise, separate completely.

Corrected design (common supertype + subtypes)

Party( PartyID PK, PartyKind )  -- 'PERSON' or 'COMPANY'

Person(
  PartyID PK/FK,
  FirstName,
  LastName,
  DateOfBirth
)

Company(
  PartyID PK/FK,
  CompanyName,
  TaxID
)

PartyEmail(
  PartyID FK,
  Email,
  IsPrimary,
  PRIMARY KEY (PartyID, Email)
)

How this fixes the issues: each table has a single meaning, required fields are clearer, and queries can target the correct subtype without scanning irrelevant columns.

3) Storing Derived or Calculated Values Without a Clear Need

What the mistake looks like: storing values that can be computed from other stored facts (totals, counts, ages, balances) without a plan for keeping them correct.

Flawed mini-schema

OrderHeader(
  OrderID PK,
  CustomerID,
  OrderDate,
  TotalAmount  -- derived from line items
)

OrderLine(
  OrderID FK,
  LineNo,
  ProductID,
  Qty,
  UnitPrice,
  PRIMARY KEY (OrderID, LineNo)
)

Problems it creates

Inconsistency risk: TotalAmount can drift from the sum of lines after edits, returns, discounts, or bug fixes.
Update anomalies: any change to lines requires remembering to update the header total; missed updates create silent data corruption.
Unclear meaning: is TotalAmount before tax, after tax, after discount, rounded how, in which currency?

Refactor: step-by-step decision process

Step 1: Ask “Is it derivable?” If yes, prefer deriving it in queries or views.

Step 2: If you must store it, define why. Common valid reasons: performance at scale, legal/audit snapshot, integration requirements.

Step 3: If stored, store the components or the rule context. For example, store tax rate used, discount applied, currency, and a “snapshot” intent.

Step 4: Make the stored value verifiable. Keep enough detail to recompute and reconcile.

Corrected design (derive totals; store explicit pricing facts)

OrderHeader(
  OrderID PK,
  CustomerID,
  OrderDate,
  CurrencyCode
)

OrderLine(
  OrderID FK,
  LineNo,
  ProductID,
  Qty,
  UnitPrice,
  DiscountAmount,   -- explicit fact if applicable
  TaxAmount,        -- explicit fact if applicable
  PRIMARY KEY (OrderID, LineNo)
)

Compute total when needed:

SELECT ol.OrderID,
       SUM(ol.Qty * ol.UnitPrice - ol.DiscountAmount + ol.TaxAmount) AS TotalAmount
FROM OrderLine ol
WHERE ol.OrderID = 1001
GROUP BY ol.OrderID;

When storing a snapshot is required: rename to reflect intent (for example, TotalAmountSnapshot) and document the rule (what it includes). The key is that the column name and surrounding fields make the semantics unambiguous and auditable.

4) EAV/“Attribute Table” Misuse

What the mistake looks like: using a generic “Entity-Attribute-Value” structure for ordinary, well-known attributes, often to avoid schema changes.

Flawed mini-schema

Product( ProductID PK, Name )

ProductAttribute(
  ProductID FK,
  AttributeName,   -- e.g., 'color','weight','battery_life'
  AttributeValue   -- stored as text
)

Problems it creates

No data types: numbers, dates, and booleans become strings; comparisons and sorting become error-prone.
No constraints: you cannot easily enforce required attributes, allowed ranges, or uniqueness (e.g., SKU) in a reliable way.
Hard queries: filtering by multiple attributes requires multiple self-joins or pivot logic; indexes are less effective.
Unclear semantics: spelling variations (“Colour” vs “Color”) create duplicate meanings; units are inconsistent (“kg” vs “lbs”).

Refactor: step-by-step to a clean design

Step 1: Separate “core attributes” from “truly extensible attributes”. If an attribute is common, important, and queried often, it belongs as a typed column (or a related table) in the normal schema.

Step 2: Model structured variation explicitly. If products have categories with different sets of attributes, consider category-specific tables or a controlled attribute definition system with types.

Step 3: If you keep an extensible attribute system, add governance. Use an AttributeDefinition table with data type, allowed values, and units; store typed values rather than a single text field.

Corrected design (core columns + controlled extensible attributes)

Product(
  ProductID PK,
  Name,
  Color,
  WeightKg
)

AttributeDefinition(
  AttributeID PK,
  AttributeCode,     -- e.g., 'battery_life_hours'
  DataType,          -- 'INT','DECIMAL','TEXT','DATE'
  UnitCode           -- optional
)

ProductAttributeValue(
  ProductID FK,
  AttributeID FK,
  ValueInt,
  ValueDecimal,
  ValueText,
  ValueDate,
  PRIMARY KEY (ProductID, AttributeID)
)

Practical querying becomes predictable: core attributes are simple filters, and extensible attributes are still possible without losing typing and definitions.

5) Inconsistent Naming and Unclear Semantics

What the mistake looks like: names that don’t clearly convey meaning, mixed conventions, overloaded column names, and “mystery flags” (Status = 1/2/3 without definitions).

Flawed mini-schema

tblCust( id PK, nm, addr, status )

Orders( id PK, cust, dt, total )

OrderItems( order_id, product, qty )

Problems it creates

Hard to read and maintain: developers guess what nm/addr/dt mean; mistakes propagate into code and reports.
Inconsistent joins: cust vs customer_id vs id leads to wrong joins and subtle bugs.
Unclear semantics: status could mean active/inactive, credit hold, lifecycle stage, or something else; total could be derived or snapshot.
Poor discoverability: schema exploration and onboarding become slow; queries take longer to write correctly.

Refactor: step-by-step naming cleanup

Step 1: Choose a consistent convention. For example: singular table names, PascalCase or snake_case, and consistent key names (CustomerID, OrderID).

Step 2: Rename columns to express meaning. dt becomes OrderDate; addr becomes BillingAddressLine1 (or a separate Address table if needed).

Step 3: Replace magic codes with explicit structures. Use a lookup table (or at least a constrained set) for statuses with clear labels.

Step 4: Align foreign key names with referenced keys. CustomerID in Orders should reference Customer.CustomerID.

Corrected design

Customer(
  CustomerID PK,
  CustomerName,
  CustomerStatusCode
)

CustomerStatus(
  CustomerStatusCode PK,
  StatusName
)

OrderHeader(
  OrderID PK,
  CustomerID FK,
  OrderDate
)

OrderLine(
  OrderID FK,
  LineNo,
  ProductID,
  Quantity,
  PRIMARY KEY (OrderID, LineNo)
)

Result: the schema communicates intent directly; queries become easier to write correctly; and “what does this mean?” questions drop dramatically.

Self-Review Checklist for Any Schema Draft

Repeating groups: Do any columns look like X1/X2/X3, or contain lists (comma-separated, JSON blobs) that should be rows in a related table?
Single concept per table: Does each table represent one clear thing? If a “Type” column changes which fields apply, should you split into subtypes?
Derived values: Are you storing totals, counts, ages, or balances? If yes, is there a clear reason, clear naming (snapshot vs computed), and a way to reconcile?
EAV temptation: Are you using AttributeName/AttributeValue for attributes that are known, common, and queried? If yes, move them to typed columns or add a controlled, typed attribute definition system.
Naming consistency: Are table/column names consistent, descriptive, and aligned (CustomerID everywhere)? Avoid abbreviations that require tribal knowledge.
Semantic clarity: For every status/flag/code, can someone tell what values mean without reading application code? If not, add a lookup/definition table or clearer naming.
Query sanity check: Can you write the top 5 expected queries without awkward string parsing, many OR conditions, or multiple self-joins?

Now answer the exercise about the content:

A customer can have multiple phone numbers. Which schema change best avoids repeating columns and makes queries like “find customer by phone number” straightforward?

You are right! Congratulations, now go to the next page

You missed! Try again.

Phone numbers are a repeating group. Storing each phone as a row in a related table avoids update anomalies, removes artificial limits, and enables simple joins and reliable indexing when searching by phone.