All courses > Technology and Programming > Databases ::

Entities vs. Attributes: Choosing What Becomes a Table

Capítulo 2

Estimated reading time: 7 minutes

1) Definitions and Practical Heuristics

In database design, a common decision is whether a concept should be modeled as an entity (its own table) or as an attribute (a column on another table). The goal is to store facts in a way that stays consistent as the system grows.

Entity (table)

An entity is something you store multiple facts about, often across time, and that can be referenced independently. It typically has its own primary key and may be related to other entities.

Attribute (column)

An attribute is a single fact about an entity instance. It is stored as a column on the entity’s table and is usually not referenced on its own by other tables.

Heuristics: When it should be an entity

Has its own lifecycle: It can be created, updated, deactivated, merged, or deleted independently of the parent. If you can imagine a screen or API endpoint dedicated to managing it, it often deserves a table.
Many instances per parent: If one parent record can have many of these, it often becomes a separate table (one-to-many). This avoids repeating columns or storing lists in a single column.
Referenced independently: Other tables need to point to it. If multiple different entities need to reference the same concept, a table with a key is usually appropriate.
Needs history or auditing: If you must track changes over time (effective dates, who changed it, previous values), a separate table (or a history table) is often required.

Heuristics: When it can stay an attribute

Single-valued and stable: One value per entity, and it rarely changes (or you don’t need to track changes).
Not referenced elsewhere: No other table needs a foreign key to it.
Simple domain: A small, fixed set of values can be stored as a constrained column (for example, a CHECK constraint or an enum-like approach, depending on your database).
No additional facts: You don’t need to store extra properties about it. If you start adding properties, it is likely becoming an entity.

Step-by-step decision process

Step 1: Identify the “thing” and the “owner” (e.g., “address” belongs to a customer).
Step 2: Ask if it can occur multiple times per owner (multiple addresses? multiple contact methods?). If yes, lean toward an entity.
Step 3: Ask if it needs its own identity (do you need to refer to “that specific address” from orders, shipments, invoices?). If yes, entity.
Step 4: Ask if it has its own properties beyond a single value (e.g., contact method has type, value, verified flag, preferred flag). If yes, entity.
Step 5: Ask if you need history (status changes over time, address validity periods). If yes, entity or a related history table.
Step 6: If none apply, model as an attribute and enforce validity with constraints where possible.

2) Contrasting Examples: Entity vs. Attribute

Example A: Address

As an attribute (column approach): Works only when you are sure there is exactly one address, you don’t reference it elsewhere, and you don’t need multiple types (billing/shipping) or history.

Customer(CustomerID, FullName, AddressLine1, AddressLine2, City, Region, PostalCode, Country)

As an entity (table approach): Prefer this when customers can have multiple addresses, addresses have types, you need to mark one as primary, or orders need to reference a specific address used at purchase time.

Continue in our app.

Listen to the audio with the screen off.
Earn a certificate upon completion.
Over 5000 courses for you to explore!

Or continue reading below...

Download the app

Customer(CustomerID, FullName, ...)

Address(AddressID, CustomerID, AddressType, Line1, Line2, City, Region, PostalCode, Country, IsPrimary, ValidFrom, ValidTo)

Decision cues: multiple per customer, referenced by orders/shipments, needs validity dates → entity.

Example B: Category

As an attribute: If category is a simple label and you will never store anything else about it, you might store it directly on the product.

Product(ProductID, Name, Category)

This is fast to implement but risks inconsistent spelling and makes renaming categories harder (you must update many rows).

As an entity: Prefer this when categories are reused, managed centrally, need additional properties (display order, parent category), or are referenced by reporting and filtering.

Category(CategoryID, CategoryName, ParentCategoryID, DisplayOrder, IsActive)

Product(ProductID, Name, CategoryID)

Decision cues: referenced independently, needs management screen, has extra properties → entity.

Example C: Status

Status is tricky because it often has both a current value and a history of changes.

As an attribute (current status only): If you only care about the current status and do not need to know when it changed.

Order(OrderID, CustomerID, OrderDate, Status)

As an entity (status history): If you must track every change, who changed it, and timestamps.

Order(OrderID, CustomerID, OrderDate, CurrentStatus)

OrderStatusHistory(OrderID, Status, ChangedAt, ChangedByUserID, Note)

You may also normalize the status values themselves if they have metadata (allowed transitions, display labels):

Status(StatusCode, DisplayName, IsTerminal)

OrderStatusHistory(OrderID, StatusCode, ChangedAt, ChangedByUserID)

Decision cues: needs history/audit → separate table; status values have properties → status lookup table.

Example D: Contact Method

As attributes: Works only when you are certain there will be exactly one email and one phone number, and you do not need verification, preference, or multiple entries.

Customer(CustomerID, FullName, Email, Phone)

As an entity: Prefer this when customers can have multiple contact methods, different types, verification states, or a preferred method.

Customer(CustomerID, FullName, ...)

ContactMethod(ContactMethodID, CustomerID, Type, Value, IsPreferred, IsVerified, VerifiedAt)

Decision cues: many instances per customer, additional facts (verified/preferred), lifecycle (verify/unverify) → entity.

3) Multi-valued and Composite Attributes

Multi-valued attributes (e.g., multiple phone numbers)

A multi-valued attribute is a property where one entity can have multiple values (0..n). Storing multiple values in one column (comma-separated lists, JSON blobs, “Phone1/Phone2/Phone3”) makes searching, validation, uniqueness, and constraints harder.

Recommended modeling: separate table

When a customer can have multiple phone numbers, model phone numbers as rows in a related table.

Customer(CustomerID, FullName, ...)

CustomerPhone(PhoneID, CustomerID, PhoneNumber, PhoneType, IsPrimary)

Step-by-step guidance:

Step 1: Confirm multiplicity (can there be more than one?). If yes, do not store as a single column.
Step 2: List extra facts you need (type, primary flag, verification, extension). If any exist, a table is clearly needed.
Step 3: Choose a key strategy (PhoneID as surrogate key, or a composite key like (CustomerID, PhoneNumber) if uniqueness rules allow).
Step 4: Add constraints (e.g., one primary per customer, phone format rules where feasible).

When a multi-valued attribute can stay as an attribute

Keep it as a single column only if it is truly single-valued and you do not need to query individual values. If you ever need to filter by a specific phone number, enforce uniqueness, or join on it, it should not be stored as a list.

Composite attributes (e.g., name parts)

A composite attribute is a value that can be broken into smaller parts (for example, a person’s name into given name and family name). Whether to split depends on how you use the data.

When to split into multiple columns

You need to search/sort by parts: sorting by last name, filtering by first name.
You need consistent formatting: generating letters, labels, or legal documents.
You need validation rules per part: middle initial, suffix, title.

Person(PersonID, GivenName, MiddleName, FamilyName, Suffix)

When to keep as one column

You only display it as entered: no reliable parsing required.
Names are culturally variable: rigid splitting can be incorrect for many naming conventions.

Person(PersonID, DisplayName)

A practical compromise is to store both: a display name as entered plus optional structured parts when needed.

Person(PersonID, DisplayName, GivenName, FamilyName)

Rule of thumb for splitting

Split when you need to operate on the parts.
Do not split just because you can. Splitting creates more fields to maintain and more edge cases.

Decision Worksheet (Fill-in Template)

Use this worksheet for each candidate concept you identify during modeling. The goal is to justify the choice and make the expected stored facts explicit.

Worksheet: Entity vs. Attribute Decision

Candidate concept name: __________________________
Where does it belong (parent entity)? __________________________
Proposed modeling choice: Entity (table) / Attribute (column)
Justification (check all that apply):
- Has its own lifecycle: Yes / No (explain: ____________________)
- Many instances per parent: Yes / No (expected max count: ____)
- Referenced independently by other tables: Yes / No (by which: __________)
- Needs history/audit: Yes / No (what history: ____________________)
- Has additional properties beyond a single value: Yes / No (list: __________)
If modeled as an attribute:
- Table: ____________________
- Column name: ____________________
- Data type: ____________________
- Constraints (allowed values, uniqueness, nullability): ____________________
If modeled as an entity:
- Table name: ____________________
- Primary key: ____________________
- Foreign keys (what it references): ____________________
- Expected columns to store (list):
  - ____________________
  - ____________________
  - ____________________
- Cardinality with parent (1:1, 1:N, N:M): ____________________

Now answer the exercise about the content:

In a customer database, which situation most strongly indicates that “Address” should be modeled as its own entity (table) rather than as columns on Customer?

You are right! Congratulations, now go to the next page

You missed! Try again.

Multiple addresses per customer and independent references (e.g., from orders/shipments) are key cues that the concept needs its own identity and should be a separate table.