Free Ebook cover Python Data Modeling in Practice: Dataclasses, Pydantic, and Type Hints

Python Data Modeling in Practice: Dataclasses, Pydantic, and Type Hints

New course

14 pages

Value Objects, Entities, and Identity Semantics

Capítulo 5

Estimated reading time: 12 minutes

+ Exercise

Why Identity Semantics Matter in Python Models

In domain modeling, two objects can look the same and still mean different things. The difference usually comes down to identity: whether an object is defined by its attributes (a value) or by a persistent identity (an entity). Getting this distinction right affects equality checks, hashing, caching, collections behavior, persistence, and how you reason about changes over time.

In Python, identity semantics show up immediately in code: == vs is, whether an object can be used as a dictionary key, whether it should be mutable, and what it means to “update” it. This chapter focuses on practical patterns for implementing value objects and entities, and on avoiding subtle bugs caused by mixing the two.

Value Objects: Defined by Their Values

A value object is defined entirely by its attributes. If two instances have the same attributes, they represent the same conceptual value. Typical examples include money amounts, email addresses, coordinates, date ranges, and product dimensions.

Core properties of value objects

  • Equality by value: two instances with the same data compare equal.
  • Immutability (recommended): once created, the value does not change; changes are represented by creating a new instance.
  • Safe to use as keys: if immutable and hashable, they can be used in sets and dict keys.
  • No lifecycle: they do not have a “before/after” identity across time; they are just values.

Implementing a value object with dataclasses

For value objects, you typically want frozen=True so the object is immutable and hashable, and you want equality to be structural (the default for dataclasses).

from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str
    def normalized(self) -> str:
        return self.value.strip().lower()
    def __post_init__(self) -> None:
        # Keep validation minimal here; assume validation patterns were covered earlier.
        if "@" not in self.value:
            raise ValueError("Invalid email address")

Even though EmailAddress stores the original string, you may want equality to use a normalized form. That is a domain decision: do you consider John@Example.com and john@example.com the same value? If yes, you should encode that in equality and hashing.

Continue in our app.

You can listen to the audiobook with the screen off, receive a free certificate for this course, and also have access to 5,000 other free online courses.

Or continue reading below...
Download App

Download the app

Custom equality and hashing for normalized values

When equality should be based on a derived representation, override __eq__ and __hash__. With frozen=True, hashing is safe as long as the derived representation is stable.

@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str
    def normalized(self) -> str:
        return self.value.strip().lower()
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, EmailAddress):
            return NotImplemented
        return self.normalized() == other.normalized()
    def __hash__(self) -> int:
        return hash(self.normalized())

This is a powerful technique, but use it deliberately: custom equality can surprise readers if it differs from field-by-field comparison. Document the intent in code comments and tests.

Step-by-step: modeling Money as a value object

Money is a classic value object. The key modeling choice is that money is not “just a float”: it has currency, rounding rules, and arithmetic constraints.

Step 1: choose a representation. Use Decimal for amounts and a currency code string (or an enum).

from dataclasses import dataclass
from decimal import Decimal
@dataclass(frozen=True, slots=True)
class Money:
    amount: Decimal
    currency: str

Step 2: define value semantics. Two monies are equal if both amount and currency match.

Step 3: encode domain operations. Provide methods that return new instances rather than mutating.

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("Currency mismatch")
        return Money(self.amount + other.amount, self.currency)
    def multiply(self, factor: Decimal) -> "Money":
        return Money(self.amount * factor, self.currency)

Step 4: ensure hashability. With frozen=True, the dataclass is hashable if all fields are hashable (Decimal and str are hashable). That allows:

prices = {Money(Decimal("9.99"), "USD"): "standard"}

Step 5: avoid leaking primitives. Prefer passing Money through your domain rather than passing Decimal and currency separately. This reduces parameter lists and prevents mixing currencies accidentally.

Entities: Defined by Identity Over Time

An entity is defined by a stable identity that persists across changes to its attributes. Two entities can have the same attributes and still be different entities if they have different identities. Conversely, the same entity can change attributes over time while remaining “the same thing.”

Core properties of entities

  • Equality by identity: two instances represent the same entity if their identifiers match.
  • Lifecycle: entities are created, modified, and possibly deleted; they have history.
  • Often mutable: attributes can change while identity remains stable.
  • Careful hashing: mutable entities should generally not be hashable, or hashing must be based only on immutable identity.

Entity IDs as explicit types

In Python, it is tempting to use raw strings or integers for IDs. A practical improvement is to wrap IDs in small value objects so you cannot accidentally pass a CustomerId where an OrderId is expected. This is especially helpful in larger codebases.

from dataclasses import dataclass
from uuid import UUID
@dataclass(frozen=True, slots=True)
class CustomerId:
    value: UUID
@dataclass(frozen=True, slots=True)
class OrderId:
    value: UUID

These ID types are value objects: immutable, hashable, and comparable by value.

Implementing an entity with identity-based equality

For entities, you typically want equality to be based on the ID only. With dataclasses, you can set eq=False and implement __eq__ yourself. You also need to decide whether the entity should be hashable. If the entity is mutable, the safest default is to make it unhashable.

from dataclasses import dataclass, field
@dataclass(slots=True, eq=False)
class Customer:
    id: CustomerId
    email: EmailAddress
    name: str
    is_active: bool = True
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Customer):
            return NotImplemented
        return self.id == other.id

Now, two Customer objects with the same CustomerId compare equal even if their other attributes differ. That matches the entity concept: they are two references to the same underlying entity.

Hashing entities: three practical options

Hashing determines whether an object can be used as a key in a dict or stored in a set. For entities, choose one of these options explicitly.

  • Option A: unhashable (recommended for mutable entities). Do nothing; Python will make it unhashable if you define __eq__ without __hash__.
  • Option B: hash by identity only. Implement __hash__ using the ID. This is safe only if the ID never changes.
  • Option C: avoid using entities as keys. Use the ID value object as the key instead, which is often the cleanest approach.

Option C tends to produce fewer surprises. For example:

customers_by_id: dict[CustomerId, Customer] = {}
customers_by_id[customer.id] = customer

This keeps hashing on an immutable value object and avoids issues if the entity’s other fields change.

Identity Semantics in Collections and Caches

Identity semantics become visible when you store objects in collections, especially sets and dicts. A common bug is to treat an entity like a value object and put it into a set, then mutate it in a way that affects equality or hashing. Another common bug is to treat a value object like an entity and compare by object identity (is) rather than by value.

Step-by-step: avoiding the “mutable key” bug

Step 1: recognize the risk. If an object’s hash can change while it is in a dict or set, lookups can break.

Step 2: make the risky object unhashable. For entities, prefer not implementing __hash__.

Step 3: use stable keys. Use immutable IDs or value objects as keys.

active_customers: set[CustomerId] = set()
active_customers.add(customer.id)

Step 4: store the entity elsewhere. Keep a mapping from ID to entity if needed.

customers: dict[CustomerId, Customer] = {customer.id: customer}

When to Choose a Value Object vs an Entity

In practice, you decide based on how the domain talks about the concept.

  • Choose a value object when the concept is interchangeable if its attributes match. If you replace one instance with another equal instance, nothing changes.
  • Choose an entity when the concept must be tracked across time, referenced from multiple places, or updated while remaining “the same thing.”

A helpful test is: if you copy all fields into a new instance, is it the same conceptual thing? For value objects, yes. For entities, no, unless the identity is also copied and considered the same.

Bridging the Two: Entities Composed of Value Objects

A common and effective pattern is to make entities out of value objects. The entity holds an ID and uses value objects for its attributes. This gives you strong local correctness (value objects are self-contained and comparable) while still modeling lifecycle and identity at the entity level.

@dataclass(slots=True, eq=False)
class Order:
    id: OrderId
    customer_id: CustomerId
    total: Money
    status: str
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Order):
            return NotImplemented
        return self.id == other.id

Here, Money is immutable and safe to compare by value, while Order is compared by identity.

Identity vs Object Identity: is Is Not Your Domain

Python’s is checks whether two references point to the same object in memory. Domain identity is usually not the same as object identity. Two separate instances loaded from a database can represent the same entity; they should compare equal by ID even though a is b is false.

Use is for singleton checks (None) and for cases where object identity is intentionally meaningful (rare in domain models). Use == for domain equality, and implement == to match your value/entity semantics.

Practical Pattern: Identity Map for Entities

When you load entities from persistence, you may want to ensure that within a unit of work you only have one in-memory instance per entity ID. This prevents conflicting updates and makes object identity align with domain identity within that scope. This is often called an identity map.

Step-by-step: a simple identity map

Step 1: create a registry keyed by ID.

class IdentityMap:
    def __init__(self) -> None:
        self._customers: dict[CustomerId, Customer] = {}
    def get_customer(self, customer_id: CustomerId) -> Customer | None:
        return self._customers.get(customer_id)
    def add_customer(self, customer: Customer) -> None:
        self._customers[customer.id] = customer

Step 2: consult the map when loading. If the entity is already present, return the existing instance; otherwise, create it and register it.

def load_customer(customer_id: CustomerId, imap: IdentityMap) -> Customer:
    existing = imap.get_customer(customer_id)
    if existing is not None:
        return existing
    row = {"id": customer_id, "email": EmailAddress("a@b.com"), "name": "A"}
    customer = Customer(id=row["id"], email=row["email"], name=row["name"])
    imap.add_customer(customer)
    return customer

This pattern is especially useful when multiple aggregates reference the same entity (for example, many orders referencing the same customer) and you want consistent in-memory updates.

Identity Semantics with Pydantic Models

Pydantic is often used at system boundaries: parsing input, validating API payloads, and serializing output. The important modeling decision is whether a Pydantic model represents a value object (value semantics) or a DTO for an entity (identity semantics).

Value-object-like Pydantic models

If a Pydantic model represents a value object, you typically want it to be immutable and comparable by value. In Pydantic v2, you can configure immutability via frozen in model_config.

from pydantic import BaseModel, ConfigDict
class MoneyModel(BaseModel):
    model_config = ConfigDict(frozen=True)
    amount: str
    currency: str

This is useful for boundary validation, but you may still want a separate domain Money value object using Decimal. A common approach is to parse into the Pydantic model, then convert into the domain type.

Entity-like Pydantic models

If a Pydantic model represents an entity payload, it often includes an id field and other mutable fields. Equality semantics are less important at the boundary; what matters is that you can reliably extract the ID and map it to your domain entity.

class CustomerPayload(BaseModel):
    id: str
    email: str
    name: str

In this case, treat it as a transport shape. Convert it into your domain entity or command object, and let the domain decide identity semantics.

Common Pitfalls and How to Avoid Them

Pitfall: using a mutable dataclass as a value object

If you model a value object as mutable, you can accidentally change it after it has been used as a key or shared across objects. Prefer frozen=True for value objects, and represent changes by creating new instances.

Pitfall: comparing entities by all fields

If you let dataclasses generate __eq__ for an entity, it will compare all fields. That can cause two references to the same entity (same ID) to compare unequal if one instance is stale. For entities, implement equality based on ID.

Pitfall: putting entities into sets

Even if you hash by ID, sets of entities can be confusing because membership is based on equality and hashing. Prefer sets of IDs, or keep entities in dicts keyed by ID.

Pitfall: treating IDs as primitives everywhere

Primitive IDs make it easy to mix up identifiers across concepts. Wrapping IDs in small value objects improves correctness and readability, and it makes function signatures self-documenting.

Checklist: Encoding Identity Semantics in Your Code

  • Value object: immutable, equality by value, hashable, safe as dict key.
  • Entity: equality by ID, usually unhashable (or hash by immutable ID only), mutable fields allowed.
  • ID type: model as a value object to avoid mixing IDs.
  • Collections: prefer dicts keyed by ID for entities; prefer sets/dicts of value objects for values.
  • Boundary models: treat Pydantic models as transport shapes; convert to domain types with explicit semantics.

Now answer the exercise about the content:

Which approach best avoids collection bugs when tracking a group of mutable entities like customers?

You are right! Congratulations, now go to the next page

You missed! Try again.

Mutable entities can change after being placed in sets or used as dict keys, which can break lookups. Using immutable ID value objects as keys and keeping entities in a dict by ID keeps hashing stable and matches identity-based equality.

Next chapter

Immutability, Freezing, and Controlled Mutation

Arrow Right Icon
Download the app to earn free Certification and listen to the courses in the background, even with the screen off.