Why Identity Semantics Matter in Python Models
In domain modeling, two objects can look the same and still mean different things. The difference usually comes down to identity: whether an object is defined by its attributes (a value) or by a persistent identity (an entity). Getting this distinction right affects equality checks, hashing, caching, collections behavior, persistence, and how you reason about changes over time.
In Python, identity semantics show up immediately in code: == vs is, whether an object can be used as a dictionary key, whether it should be mutable, and what it means to “update” it. This chapter focuses on practical patterns for implementing value objects and entities, and on avoiding subtle bugs caused by mixing the two.
Value Objects: Defined by Their Values
A value object is defined entirely by its attributes. If two instances have the same attributes, they represent the same conceptual value. Typical examples include money amounts, email addresses, coordinates, date ranges, and product dimensions.
Core properties of value objects
- Equality by value: two instances with the same data compare equal.
- Immutability (recommended): once created, the value does not change; changes are represented by creating a new instance.
- Safe to use as keys: if immutable and hashable, they can be used in sets and dict keys.
- No lifecycle: they do not have a “before/after” identity across time; they are just values.
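The properties above can be seen in a few lines. This is a minimal sketch; `Coordinates` is a hypothetical type chosen only for illustration, and `dataclasses.replace` is one possible way to express "change by creating a new instance".

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Coordinates:  # hypothetical example type
    latitude: float
    longitude: float


a = Coordinates(52.52, 13.405)
b = Coordinates(52.52, 13.405)

same_value = a == b          # equal data means equal value
same_object = a is b         # two distinct instances in memory
lookup = {a: "Berlin"}[b]    # equal values hash alike, so b finds a's entry

# A "change" produces a new value; the original is left untouched.
moved = replace(a, latitude=48.8566)
```

Here `a` and `b` are interchangeable everywhere except under `is`, which is the behavior you want from a value.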
Implementing a value object with dataclasses
For value objects, you typically want frozen=True so that instances are immutable and, together with the default eq=True, hashable. Equality is then structural: the generated __eq__ compares the fields of two instances of the same class.
```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str

    def normalized(self) -> str:
        return self.value.strip().lower()

    def __post_init__(self) -> None:
        # Keep validation minimal here; assume validation patterns were covered earlier.
        if "@" not in self.value:
            raise ValueError("Invalid email address")
```
Even though EmailAddress stores the original string, you may want equality to use a normalized form. That is a domain decision: do you consider John@Example.com and john@example.com the same value? If yes, you should encode that in equality and hashing.
Custom equality and hashing for normalized values
When equality should be based on a derived representation, override __eq__ and __hash__. With frozen=True, hashing is safe as long as the derived representation is stable.
```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class EmailAddress:
    value: str

    def normalized(self) -> str:
        return self.value.strip().lower()

    # Explicitly defined __eq__ and __hash__ are kept; the dataclass
    # decorator does not replace them with generated versions.
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, EmailAddress):
            return NotImplemented
        return self.normalized() == other.normalized()

    def __hash__(self) -> int:
        return hash(self.normalized())
```
This is a powerful technique, but use it deliberately: custom equality can surprise readers if it differs from field-by-field comparison. Document the intent in code comments and tests.
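One way to document that intent is a small executable check. The sketch below repeats the class (without slots, to keep it short) and shows what normalized equality means for comparisons and sets; the sample addresses are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmailAddress:
    value: str

    def normalized(self) -> str:
        return self.value.strip().lower()

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, EmailAddress):
            return NotImplemented
        return self.normalized() == other.normalized()

    def __hash__(self) -> int:
        return hash(self.normalized())


first = EmailAddress("John@Example.com")
second = EmailAddress(" john@example.com ")

# Both spellings collapse to one element because equality and hashing agree.
unique = {first, second}
```

Note that `first.value` still holds the original spelling; only comparison and hashing use the normalized form.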
Step-by-step: modeling Money as a value object
Money is a classic value object. The key modeling choice is that money is not “just a float”: it has currency, rounding rules, and arithmetic constraints.
Step 1: choose a representation. Use Decimal for amounts and a currency code string (or an enum).
```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True, slots=True)
class Money:
    amount: Decimal
    currency: str
```
Step 2: define value semantics. Two monies are equal if both amount and currency match.
Step 3: encode domain operations. Provide methods that return new instances rather than mutating.
```python
# Money from Step 1, extended with domain operations.
@dataclass(frozen=True, slots=True)
class Money:
    amount: Decimal
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("Currency mismatch")
        return Money(self.amount + other.amount, self.currency)

    def multiply(self, factor: Decimal) -> "Money":
        return Money(self.amount * factor, self.currency)
```
Step 4: ensure hashability. With frozen=True, the dataclass is hashable if all fields are hashable (Decimal and str are hashable). That allows:
```python
prices = {Money(Decimal("9.99"), "USD"): "standard"}
```
Step 5: avoid leaking primitives. Prefer passing Money through your domain rather than passing Decimal and currency separately. This reduces parameter lists and prevents mixing currencies accidentally.
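To illustrate Step 5, here is a minimal sketch of a function that accepts Money values instead of separate amounts and currency codes. `order_total` is a hypothetical helper, not part of the model above, and the class is repeated in trimmed form so the snippet runs on its own.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("Currency mismatch")
        return Money(self.amount + other.amount, self.currency)


def order_total(lines: list[Money]) -> Money:
    """Hypothetical helper: sum line prices that share one currency."""
    if not lines:
        raise ValueError("Cannot total an empty order")
    total = lines[0]
    for line in lines[1:]:
        total = total.add(line)  # each step returns a new Money
    return total


total = order_total([Money(Decimal("9.99"), "USD"), Money(Decimal("0.01"), "USD")])

# Mixing currencies fails inside Money.add, not somewhere downstream.
try:
    order_total([Money(Decimal("1"), "USD"), Money(Decimal("1"), "EUR")])
    mixed_rejected = False
except ValueError:
    mixed_rejected = True
```

Because the currency travels with the amount, the caller cannot forget to check it.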
Entities: Defined by Identity Over Time
An entity is defined by a stable identity that persists across changes to its attributes. Two entities can have the same attributes and still be different entities if they have different identities. Conversely, the same entity can change attributes over time while remaining “the same thing.”
Core properties of entities
- Equality by identity: two instances represent the same entity if their identifiers match.
- Lifecycle: entities are created, modified, and possibly deleted; they have history.
- Often mutable: attributes can change while identity remains stable.
- Careful hashing: mutable entities should generally not be hashable, or hashing must be based only on immutable identity.
Entity IDs as explicit types
In Python, it is tempting to use raw strings or integers for IDs. A practical improvement is to wrap IDs in small value objects so you cannot accidentally pass a CustomerId where an OrderId is expected. This is especially helpful in larger codebases.
```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True, slots=True)
class CustomerId:
    value: UUID


@dataclass(frozen=True, slots=True)
class OrderId:
    value: UUID
```
These ID types are value objects: immutable, hashable, and comparable by value.
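A short sketch of what the wrappers buy you at runtime: dataclass-generated equality only compares instances of the same class, so a CustomerId never equals an OrderId, even when both wrap the same UUID. A static type checker can additionally reject passing one where the other is expected, though that depends on your tooling.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CustomerId:
    value: UUID


@dataclass(frozen=True)
class OrderId:
    value: UUID


raw = uuid4()
customer_id = CustomerId(raw)
order_id = OrderId(raw)

ids_match = customer_id == CustomerId(raw)   # same type, same value
cross_type_match = customer_id == order_id   # different ID types, same UUID
```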
Implementing an entity with identity-based equality
For entities, you typically want equality to be based on the ID only. With dataclasses, you can set eq=False and implement __eq__ yourself. You also need to decide whether the entity should be hashable. If the entity is mutable, the safest default is to make it unhashable.
```python
from dataclasses import dataclass


@dataclass(slots=True, eq=False)
class Customer:
    id: CustomerId
    email: EmailAddress
    name: str
    is_active: bool = True

    def __eq__(self, other: object) -> bool:
        # Identity-based equality: only the ID matters.
        if not isinstance(other, Customer):
            return NotImplemented
        return self.id == other.id
```
Now, two Customer objects with the same CustomerId compare equal even if their other attributes differ. That matches the entity concept: they are two references to the same underlying entity.
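The following sketch exercises that behavior with a trimmed-down Customer (only `id` and `name`, an assumption made to keep the snippet short). It also shows that defining __eq__ without __hash__ leaves the class unhashable.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CustomerId:
    value: UUID


@dataclass(eq=False)
class Customer:
    id: CustomerId
    name: str

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Customer):
            return NotImplemented
        return self.id == other.id


shared_id = CustomerId(uuid4())
fresh = Customer(shared_id, "Ada Lovelace")
stale = Customer(shared_id, "Ada Byron")          # same entity, older name
other = Customer(CustomerId(uuid4()), "Ada Lovelace")

same_entity = fresh == stale       # same ID, different attributes
different_entity = fresh == other  # same attributes, different ID

# __eq__ without __hash__ sets __hash__ to None, so hashing raises TypeError.
try:
    hash(fresh)
    hashable = True
except TypeError:
    hashable = False
```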
Hashing entities: three practical options
Hashing determines whether an object can be used as a key in a dict or stored in a set. For entities, choose one of these options explicitly.
- Option A: unhashable (recommended for mutable entities). Do nothing; Python will make it unhashable if you define __eq__ without __hash__.
- Option B: hash by identity only. Implement __hash__ using the ID. This is safe only if the ID never changes.
- Option C: avoid using entities as keys. Use the ID value object as the key instead, which is often the cleanest approach.
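As a sketch of Option B, the entity below hashes on its ID only. It assumes by convention that `id` is never reassigned after construction; nothing in this snippet enforces that, so treat it as an illustration rather than a hardened implementation.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CustomerId:
    value: UUID


@dataclass(eq=False)
class Customer:
    id: CustomerId   # assumed never to change after creation
    name: str

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Customer):
            return NotImplemented
        return self.id == other.id

    def __hash__(self) -> int:
        # Hash only the identifier; never include mutable fields.
        return hash(self.id)


customer = Customer(CustomerId(uuid4()), "Ada")
registry = {customer}

customer.name = "Ada L."              # mutation does not touch the hash
still_found = customer in registry
```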
Option C tends to produce fewer surprises. For example:
```python
customers_by_id: dict[CustomerId, Customer] = {}
customers_by_id[customer.id] = customer
```
This keeps hashing on an immutable value object and avoids issues if the entity’s other fields change.
Identity Semantics in Collections and Caches
Identity semantics become visible when you store objects in collections, especially sets and dicts. A common bug is to treat an entity like a value object and put it into a set, then mutate it in a way that affects equality or hashing. Another common bug is to treat a value object like an entity and compare by object identity (is) rather than by value.
Step-by-step: avoiding the “mutable key” bug
Step 1: recognize the risk. If an object’s hash can change while it is in a dict or set, lookups can break.
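To make the risk concrete, here is a deliberately broken sketch. `Tag` is a hypothetical class, and `unsafe_hash=True` is used on purpose to get a mutable object whose hash depends on its fields.

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)   # mutable AND hashed by fields: risky on purpose
class Tag:  # hypothetical example type
    label: str


tag = Tag("draft")
tags = {tag}

tag.label = "final"            # the hash changes while the object sits in the set

# The set now searches under the new hash and misses the stored entry.
found_after_mutation = tag in tags
# The old value finds the right slot, but equality against the mutated object fails.
found_by_old_value = Tag("draft") in tags
# The object is still inside the set, just unreachable by lookup.
still_stored = len(tags) == 1
```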
Step 2: make the risky object unhashable. For entities, prefer not implementing __hash__.
Step 3: use stable keys. Use immutable IDs or value objects as keys.
```python
active_customers: set[CustomerId] = set()
active_customers.add(customer.id)
```
Step 4: store the entity elsewhere. Keep a mapping from ID to entity if needed.
```python
customers: dict[CustomerId, Customer] = {customer.id: customer}
```
When to Choose a Value Object vs an Entity
In practice, you decide based on how the domain talks about the concept.
- Choose a value object when the concept is interchangeable if its attributes match. If you replace one instance with another equal instance, nothing changes.
- Choose an entity when the concept must be tracked across time, referenced from multiple places, or updated while remaining “the same thing.”
A helpful test is: if you copy all fields into a new instance, is it the same conceptual thing? For value objects, yes. For entities, no, unless the identity is also copied and considered the same.
Bridging the Two: Entities Composed of Value Objects
A common and effective pattern is to make entities out of value objects. The entity holds an ID and uses value objects for its attributes. This gives you strong local correctness (value objects are self-contained and comparable) while still modeling lifecycle and identity at the entity level.
```python
@dataclass(slots=True, eq=False)
class Order:
    id: OrderId
    customer_id: CustomerId
    total: Money
    status: str

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Order):
            return NotImplemented
        return self.id == other.id
```
Here, Money is immutable and safe to compare by value, while Order is compared by identity.
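A minimal sketch of how an update looks under this pattern, using a trimmed-down Order without `customer_id` (an assumption for brevity). The entity is mutated by swapping in a new Money value; the old value is never changed.

```python
from dataclasses import dataclass
from decimal import Decimal
from uuid import UUID, uuid4


@dataclass(frozen=True)
class OrderId:
    value: UUID


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str


@dataclass(eq=False)
class Order:
    id: OrderId
    total: Money
    status: str

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Order):
            return NotImplemented
        return self.id == other.id


order = Order(OrderId(uuid4()), Money(Decimal("20.00"), "USD"), "open")
original_total = order.total
snapshot = Order(order.id, order.total, order.status)  # e.g. a copy loaded earlier

# Replace the value object instead of mutating it.
order.total = Money(order.total.amount + Decimal("5.00"), order.total.currency)
order.status = "updated"
```

After the update, `order` and `snapshot` are still the same entity, while their totals are different values.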
Identity vs Object Identity: is Is Not Your Domain
Python’s is checks whether two references point to the same object in memory. Domain identity is usually not the same as object identity. Two separate instances loaded from a database can represent the same entity; they should compare equal by ID even though a is b is false.
Use is for singleton checks (None) and for cases where object identity is intentionally meaningful (rare in domain models). Use == for domain equality, and implement == to match your value/entity semantics.
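The distinction can be sketched as follows. `find_email` and the directory contents are hypothetical; the point is only where `is` and `==` each belong.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmailAddress:
    value: str


def find_email(directory: dict[str, EmailAddress], name: str) -> Optional[EmailAddress]:
    """Hypothetical lookup that returns None when the name is unknown."""
    return directory.get(name)


directory = {"ada": EmailAddress("ada@example.com")}
found = find_email(directory, "ada")
missing = find_email(directory, "bob")
rebuilt = EmailAddress("ada@example.com")

is_missing = missing is None          # `is` fits the None singleton
equal_by_value = found == rebuilt     # `==` expresses domain equality
same_object = found is rebuilt        # separate instances; says nothing about the domain
```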
Practical Pattern: Identity Map for Entities
When you load entities from persistence, you may want to ensure that within a unit of work you only have one in-memory instance per entity ID. This prevents conflicting updates and makes object identity align with domain identity within that scope. This is often called an identity map.
Step-by-step: a simple identity map
Step 1: create a registry keyed by ID.
```python
class IdentityMap:
    def __init__(self) -> None:
        self._customers: dict[CustomerId, Customer] = {}

    def get_customer(self, customer_id: CustomerId) -> Customer | None:
        return self._customers.get(customer_id)

    def add_customer(self, customer: Customer) -> None:
        self._customers[customer.id] = customer
```
Step 2: consult the map when loading. If the entity is already present, return the existing instance; otherwise, create it and register it.
```python
def load_customer(customer_id: CustomerId, imap: IdentityMap) -> Customer:
    existing = imap.get_customer(customer_id)
    if existing is not None:
        return existing

    # Hard-coded row standing in for a real persistence lookup.
    row = {"id": customer_id, "email": EmailAddress("a@b.com"), "name": "A"}
    customer = Customer(id=row["id"], email=row["email"], name=row["name"])
    imap.add_customer(customer)
    return customer
```
This pattern is especially useful when multiple aggregates reference the same entity (for example, many orders referencing the same customer) and you want consistent in-memory updates.
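The effect is easiest to see by loading the same ID twice. This self-contained sketch uses a trimmed-down Customer and a hard-coded name in place of a database read, both assumptions made for brevity.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class CustomerId:
    value: UUID


@dataclass(eq=False)
class Customer:
    id: CustomerId
    name: str


class IdentityMap:
    def __init__(self) -> None:
        self._customers: dict[CustomerId, Customer] = {}

    def get_customer(self, customer_id: CustomerId) -> Optional[Customer]:
        return self._customers.get(customer_id)

    def add_customer(self, customer: Customer) -> None:
        self._customers[customer.id] = customer


def load_customer(customer_id: CustomerId, imap: IdentityMap) -> Customer:
    existing = imap.get_customer(customer_id)
    if existing is not None:
        return existing
    customer = Customer(id=customer_id, name="A")  # stand-in for a database read
    imap.add_customer(customer)
    return customer


imap = IdentityMap()
customer_id = CustomerId(uuid4())

first = load_customer(customer_id, imap)
second = load_customer(customer_id, imap)

# Both names refer to one object, so an update through either is seen by both.
first.name = "Renamed"
```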
Identity Semantics with Pydantic Models
Pydantic is often used at system boundaries: parsing input, validating API payloads, and serializing output. The important modeling decision is whether a Pydantic model represents a value object (value semantics) or a DTO for an entity (identity semantics).
Value-object-like Pydantic models
If a Pydantic model represents a value object, you typically want it to be immutable and comparable by value. In Pydantic v2, you can configure immutability via frozen in model_config.
```python
from pydantic import BaseModel, ConfigDict


class MoneyModel(BaseModel):
    model_config = ConfigDict(frozen=True)

    amount: str
    currency: str
```
This is useful for boundary validation, but you may still want a separate domain Money value object using Decimal. A common approach is to parse into the Pydantic model, then convert into the domain type.
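The conversion step might look like the sketch below. To keep it runnable without Pydantic installed, `to_domain` is written against any object with string `amount` and `currency` attributes, and a SimpleNamespace stands in for a validated MoneyModel. Both `to_domain` and the `MoneyFields` protocol are hypothetical names introduced here.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from types import SimpleNamespace
from typing import Protocol


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str


class MoneyFields(Protocol):
    """Shape shared by MoneyModel and any stand-in with the same attributes."""

    amount: str
    currency: str


def to_domain(model: MoneyFields) -> Money:
    """Hypothetical converter from the boundary shape to the domain value."""
    try:
        amount = Decimal(model.amount)
    except InvalidOperation as exc:
        raise ValueError(f"Invalid amount: {model.amount!r}") from exc
    return Money(amount, model.currency)


# SimpleNamespace stands in for a validated MoneyModel instance.
price = to_domain(SimpleNamespace(amount="9.99", currency="USD"))
```

Keeping the converter as a separate function leaves the domain Money free of any dependency on the boundary library.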
Entity-like Pydantic models
If a Pydantic model represents an entity payload, it often includes an id field and other mutable fields. Equality semantics are less important at the boundary; what matters is that you can reliably extract the ID and map it to your domain entity.
```python
class CustomerPayload(BaseModel):
    id: str
    email: str
    name: str
```
In this case, treat it as a transport shape. Convert it into your domain entity or command object, and let the domain decide identity semantics.
Common Pitfalls and How to Avoid Them
Pitfall: using a mutable dataclass as a value object
If you model a value object as mutable, you can accidentally change it after it has been used as a key or shared across objects. Prefer frozen=True for value objects, and represent changes by creating new instances.
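A small sketch of the sharing problem and the frozen alternative. Both dimension classes are hypothetical and exist only to contrast the two behaviors.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass
class MutableDimensions:   # hypothetical: a value object wrongly left mutable
    width_cm: int
    height_cm: int


@dataclass(frozen=True)
class Dimensions:
    width_cm: int
    height_cm: int


shared = MutableDimensions(10, 20)
box_a = {"size": shared}
box_b = {"size": shared}

box_a["size"].width_cm = 99           # silently changes box_b's size too
leaked = box_b["size"].width_cm

safe = Dimensions(10, 20)
try:
    safe.width_cm = 99                # frozen dataclasses reject assignment
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True

resized = replace(safe, width_cm=99)  # a new value; `safe` is unchanged
```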
Pitfall: comparing entities by all fields
If you let dataclasses generate __eq__ for an entity, it will compare all fields. That can cause two references to the same entity (same ID) to compare unequal if one instance is stale. For entities, implement equality based on ID.
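The stale-instance case can be reproduced in a few lines. `CustomerRecord` is a hypothetical entity that uses an integer ID for brevity and keeps the dataclass-generated __eq__.

```python
from dataclasses import dataclass


@dataclass  # generated __eq__ compares every field
class CustomerRecord:  # hypothetical entity with field-by-field equality
    id: int
    name: str


fresh = CustomerRecord(1, "Ada Lovelace")
stale = CustomerRecord(1, "Ada Byron")    # same entity, loaded before a rename

equal_by_fields = fresh == stale          # the generated comparison sees two different things
equal_by_id = fresh.id == stale.id        # the domain sees one customer
```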
Pitfall: putting entities into sets
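Even if you hash by ID, sets of entities can be confusing because membership is decided by the ID alone: a set keeps whichever instance was added first and silently ignores a later instance with the same ID, even if it carries newer field values. Prefer sets of IDs, or keep entities in dicts keyed by ID.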
Pitfall: treating IDs as primitives everywhere
Primitive IDs make it easy to mix up identifiers across concepts. Wrapping IDs in small value objects improves correctness and readability, and it makes function signatures self-documenting.
Checklist: Encoding Identity Semantics in Your Code
- Value object: immutable, equality by value, hashable, safe as dict key.
- Entity: equality by ID, usually unhashable (or hash by immutable ID only), mutable fields allowed.
- ID type: model as a value object to avoid mixing IDs.
- Collections: prefer dicts keyed by ID for entities; prefer sets/dicts of value objects for values.
- Boundary models: treat Pydantic models as transport shapes; convert to domain types with explicit semantics.